Hello,

We had DRBD 8.3 replicating over Mellanox InfiniBand cards using SDP, and it worked fine. After upgrading to 8.4, SDP replication no longer works. Plain IP over InfiniBand works, but with SDP I get the following logs on the secondary host:

Oct 17 11:36:27 s2 -bash: (4415) [root.root] |.| /etc/init.d/drbd start
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: events: mcg drbd: 2
Oct 17 11:36:27 s2 kernel: drbd: initialized. Version: 8.4.6 (api:1/proto:86-101)
Oct 17 11:36:27 s2 kernel: drbd: GIT-hash: 833d830e0152d1e457fa7856e71e11248ccf3f70 build by phil at Build64R6, 2015-04-09 14:35:00
Oct 17 11:36:27 s2 kernel: drbd: registered as block device major 147
Oct 17 11:36:27 s2 kernel: drbd infiniband: Starting worker thread (from drbdsetup-84 [21673])
Oct 17 11:36:27 s2 kernel: block drbd0: disk( Diskless -> Attaching )
Oct 17 11:36:27 s2 kernel: drbd infiniband: Method to ensure write ordering: drain
Oct 17 11:36:27 s2 kernel: block drbd0: max BIO size = 1048576
Oct 17 11:36:27 s2 kernel: block drbd0: drbd_bm_resize called with capacity == 2929267928
Oct 17 11:36:27 s2 multipathd: drbd0: add path (uevent)
Oct 17 11:36:27 s2 multipathd: drbd0: failed to get path uid
Oct 17 11:36:27 s2 multipathd: uevent trigger error
Oct 17 11:36:27 s2 kernel: block drbd0: resync bitmap: bits=366158491 words=5721227 pages=11175
Oct 17 11:36:27 s2 kernel: block drbd0: size = 1397 GB (1464633964 KB)
Oct 17 11:36:28 s2 kernel: block drbd0: recounting of set bits took additional 41 jiffies
Oct 17 11:36:28 s2 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Oct 17 11:36:28 s2 kernel: block drbd0: disk( Attaching -> UpToDate )
Oct 17 11:36:28 s2 kernel: block drbd0: attached to UUIDs 8C14BF163C91396E:9322D5CEC266CFFB:02BBF663E04A58FD:02BAF663E04A58FC
Oct 17 11:36:28 s2 kernel: drbd infiniband: conn( StandAlone -> Unconnected )
Oct 17 11:36:28 s2 kernel: drbd infiniband: Starting receiver thread (from drbd_w_infiniba [21676])
Oct 17 11:36:28 s2 kernel: drbd infiniband: receiver (re)started
Oct 17 11:36:28 s2 kernel: drbd infiniband: conn( Unconnected -> WFConnection )
Oct 17 11:36:39 s2 kernel: drbd infiniband: sock_recvmsg returned -11
Oct 17 11:36:39 s2 kernel: drbd infiniband: conn( WFConnection -> BrokenPipe )
Oct 17 11:36:39 s2 kernel: drbd infiniband: short read (expected size 8)
Oct 17 11:36:39 s2 kernel: drbd infiniband: Connection closed
Oct 17 11:36:39 s2 kernel: drbd infiniband: conn( BrokenPipe -> Unconnected )
Oct 17 11:36:40 s2 kernel: drbd infiniband: conn( Unconnected -> WFConnection )

It stays in this state until I interrupt the init script with CTRL+C and stop DRBD:

Oct 17 11:36:54 s2 -bash: (4415) [root.root] |.| /etc/init.d/drbd stop
Oct 17 11:36:54 s2 kernel: drbd infiniband: conn( WFConnection -> Disconnecting )
Oct 17 11:36:54 s2 kernel: drbd infiniband: Discarding network configuration.
Oct 17 11:36:54 s2 kernel: drbd infiniband: Connection closed
Oct 17 11:36:54 s2 kernel: drbd infiniband: conn( Disconnecting -> StandAlone )
Oct 17 11:36:54 s2 kernel: drbd infiniband: receiver terminated
Oct 17 11:36:54 s2 kernel: drbd infiniband: Terminating drbd_r_infiniba
Oct 17 11:36:54 s2 kernel: block drbd0: disk( UpToDate -> Failed )
Oct 17 11:36:54 s2 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Oct 17 11:36:54 s2 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Oct 17 11:36:54 s2 kernel: block drbd0: disk( Failed -> Diskless )
Oct 17 11:36:54 s2 multipathd: drbd0: remove path (uevent)
Oct 17 11:36:54 s2 kernel: drbd infiniband: Terminating drbd_w_infiniba
Oct 17 11:36:54 s2 kernel: drbd: module cleanup done.
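
As a reading aid for the log above (not part of the original report), here is a minimal Python sketch that pulls out only the connection-state changes, so the WFConnection -> BrokenPipe -> Unconnected -> WFConnection loop is easier to see. The function name conn_transitions and the regular expression are illustrative assumptions; they rely only on the "conn( A -> B )" wording visible in the kernel messages.

```python
import re

# Matches DRBD connection-state changes as they appear in the kernel log,
# e.g. "drbd infiniband: conn( WFConnection -> BrokenPipe )".
# Assumption: the pattern is inferred from the log excerpt, not from DRBD docs.
CONN_RE = re.compile(r"conn\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def conn_transitions(log_text):
    """Return (old_state, new_state) pairs in the order they appear."""
    return CONN_RE.findall(log_text)


# Excerpt copied from the secondary-host log in this post.
SAMPLE = """\
Oct 17 11:36:28 s2 kernel: drbd infiniband: conn( StandAlone -> Unconnected )
Oct 17 11:36:28 s2 kernel: drbd infiniband: conn( Unconnected -> WFConnection )
Oct 17 11:36:39 s2 kernel: drbd infiniband: sock_recvmsg returned -11
Oct 17 11:36:39 s2 kernel: drbd infiniband: conn( WFConnection -> BrokenPipe )
Oct 17 11:36:39 s2 kernel: drbd infiniband: conn( BrokenPipe -> Unconnected )
Oct 17 11:36:40 s2 kernel: drbd infiniband: conn( Unconnected -> WFConnection )
"""

if __name__ == "__main__":
    for old, new in conn_transitions(SAMPLE):
        print(f"{old} -> {new}")
```

Run against the full log, the same function would also pick up the final WFConnection -> Disconnecting -> StandAlone steps from the manual stop.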

I'm using the latest CentOS 6 with the 2.6.32-573.7.1.el6.x86_64 kernel and drbd84 from ELRepo. Here is the configuration of the resource:

resource infiniband {
    device /dev/drbd0;
    meta-disk internal;
    handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    }
    startup {
        become-primary-on both;
    }
    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    syncer {
        rate 1500M;
    }
    disk {
        # raid controller with battery back
        disk-flushes no;
    }
    on s1 {
        disk /dev/sdb;
        #address 192.168.11.1:7789;
        address sdp 192.168.11.1:7789;
    }
    on s2 {
        disk /dev/sdb;
        address sdp 192.168.11.2:7789;
    }
}

Additional info:

[root@s1 drbd.d]# lspci|grep -i mell
03:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

modinfo ib_sdp
filename:       /lib/modules/2.6.32-573.7.1.el6.x86_64/weak-updates/mlnx-ofa_kernel/drivers/infiniband/ulp/sdp/ib_sdp.ko
license:        Dual BSD/GPL
description:    InfiniBand SDP module
author:         Michael S. Tsirkin
srcversion:     D046FDB330053923ED58690

Thanks for any help.

Best regards,
Nuno Fernandes
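
For anyone comparing the SDP and plain-IP variants of the "address" lines in the resource above, the following is a rough Python sketch that lists the address family, IP and port of each active (non-commented) "address" statement. The helper name active_addresses is hypothetical, and reporting "ipv4" when no family keyword is given is an assumption based on the drbd.conf default, not something stated in this post.

```python
import re

# "address [family] a.b.c.d:port;" where the family keyword (e.g. "sdp")
# is optional. Only dotted IPv4 addresses are handled in this sketch.
ADDR_RE = re.compile(
    r"^\s*address\s+(?:(\w+)\s+)?(\d+(?:\.\d+){3}):(\d+)\s*;", re.M
)


def active_addresses(conf_text):
    """Return (family, ip, port) for every address line that is not commented out."""
    # Strip "#" comments first so that "#address ..." lines are ignored.
    no_comments = re.sub(r"#.*", "", conf_text)
    return [
        # Assumption: a missing family keyword means plain IPv4.
        (family or "ipv4", ip, int(port))
        for family, ip, port in ADDR_RE.findall(no_comments)
    ]


# The two "on" sections from the resource file in this post.
SAMPLE = """\
on s1 {
    disk /dev/sdb;
    #address 192.168.11.1:7789;
    address sdp 192.168.11.1:7789;
}
on s2 {
    disk /dev/sdb;
    address sdp 192.168.11.2:7789;
}
"""

if __name__ == "__main__":
    for family, ip, port in active_addresses(SAMPLE):
        print(family, ip, port)
```

With the configuration as posted, both hosts come out as sdp on port 7789; swapping the commented and uncommented lines on s1 would make that host report ipv4 instead.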