Hello Lars =)  Reply inline:

On Fri, Sep 17, 2010 at 4:22 AM, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:

> > -EINVAL
>
> iirc, it is a bug inside the in-kernel SDP connect() peer lookup,
> which EINVALs if the target address is not given as AF_INET (!),
> even if the socket itself is AF_INET_SDP.
> Or the other way around.

I see, great info.

> If you do "drbdadm -d connect $resource", you get the drbdsetup
> command that would have been issued.
> Replace the second (remote) sdp with ipv4,
> and do them manually, on both nodes.
> If that does not work, replace only the first (local) sdp with ipv4,
> but keep the second (remote) sdp.

The first suggestion seems to work:

[root at node01 ~]# drbdsetup 0 net sdp:192.168.20.1:7778 ipv4:192.168.20.2:7778 C --set-defaults --create-device --max-epoch-size=20000 --max-buffers=20000 --after-sb-2pri=disconnect --after-sb-1pri=discard-secondary --after-sb-0pri=discard-zero-changes --allow-two-primaries

[root at node02 ~]# drbdsetup 0 net sdp:192.168.20.2:7778 ipv4:192.168.20.1:7778 C --set-defaults --create-device --max-epoch-size=20000 --max-buffers=20000 --after-sb-2pri=disconnect --after-sb-1pri=discard-secondary --after-sb-0pri=discard-zero-changes --allow-two-primaries

> If that gets you connected, then it's that bug.
> I think I even patched it in kernel once,
> but don't find that right now,
> and don't remember the SDP version either.
> I think it was
> drivers/infiniband/ulp/sdp/sdp_main.c:addr_resolve_remote()
> missing an (... || ... == AF_INET_SDP)

Hmmm. It may be an MLNX_OFED-specific bug? If we can get the code, I have a support path with Mellanox, so I can probably get this pushed into their upstream OFED. I'll look at it and see if I can figure it out.

> That's all userland, and does not affect DRBD, as DRBD does all
> networking from within the kernel.

Ahhh, right.

> Share your findings on DRBD performance IPoIB vs. SDP,
> once you get the thing to work on your platform.
Well, using the drbdsetup method above, SDP is significantly slower than IPoIB. Below are some write results; the HAStorage VG is using drbd0 as a PV.

With IPoIB:

# time dd if=/dev/zero of=/dev/HAStorage/test bs=4096M count=5
0+5 records in
0+5 records out
10737397760 bytes (11 GB) copied, 17.8384 seconds, 602 MB/s

real    0m17.864s
user    0m0.000s
sys     0m10.292s

With SDP and the drbdsetup sdp/ipv4:

# time dd if=/dev/zero of=/dev/HAStorage/test bs=4096M count=5
0+5 records in
0+5 records out
10737397760 bytes (11 GB) copied, 26.2015 seconds, 410 MB/s

real    0m26.220s
user    0m0.001s
sys     0m11.891s

The underlying storage is a 16-disk RAID10 with 1.5 GB of flash-backed write cache. Local writes to the storage sustain around 900 MB/s and burst at several GB/s into the write cache.

I suspect a proper fix for the in-kernel SDP connect() may fix SDP performance here. At least I would hope so!

Thanks for the information, very helpful,

-JR