[DRBD-user] Error using SDP in DRBD8.4

Nuno Fernandes npf-mlists at eurotux.com
Sat Oct 17 12:41:22 CEST 2015



Hello,

We had DRBD 8.3 replicating over Mellanox InfiniBand cards using SDP, and it worked fine.
After upgrading to 8.4, SDP replication no longer works. Plain IP over InfiniBand works, but with SDP I get the following logs on the secondary host:

Oct 17 11:36:27 s2 -bash: (4415) [root.root] |.| /etc/init.d/drbd start
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: events: mcg drbd: 2
Oct 17 11:36:27 s2 kernel: drbd: initialized. Version: 8.4.6 (api:1/proto:86-101)
Oct 17 11:36:27 s2 kernel: drbd: GIT-hash: 833d830e0152d1e457fa7856e71e11248ccf3f70 build by phil at Build64R6, 2015-04-09 14:35:00
Oct 17 11:36:27 s2 kernel: drbd: registered as block device major 147
Oct 17 11:36:27 s2 kernel: drbd infiniband: Starting worker thread (from drbdsetup-84 [21673])
Oct 17 11:36:27 s2 kernel: block drbd0: disk( Diskless -> Attaching ) 
Oct 17 11:36:27 s2 kernel: drbd infiniband: Method to ensure write ordering: drain
Oct 17 11:36:27 s2 kernel: block drbd0: max BIO size = 1048576
Oct 17 11:36:27 s2 kernel: block drbd0: drbd_bm_resize called with capacity == 2929267928
Oct 17 11:36:27 s2 multipathd: drbd0: add path (uevent)
Oct 17 11:36:27 s2 multipathd: drbd0: failed to get path uid
Oct 17 11:36:27 s2 multipathd: uevent trigger error
Oct 17 11:36:27 s2 kernel: block drbd0: resync bitmap: bits=366158491 words=5721227 pages=11175
Oct 17 11:36:27 s2 kernel: block drbd0: size = 1397 GB (1464633964 KB)
Oct 17 11:36:28 s2 kernel: block drbd0: recounting of set bits took additional 41 jiffies
Oct 17 11:36:28 s2 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Oct 17 11:36:28 s2 kernel: block drbd0: disk( Attaching -> UpToDate ) 
Oct 17 11:36:28 s2 kernel: block drbd0: attached to UUIDs 8C14BF163C91396E:9322D5CEC266CFFB:02BBF663E04A58FD:02BAF663E04A58FC
Oct 17 11:36:28 s2 kernel: drbd infiniband: conn( StandAlone -> Unconnected ) 
Oct 17 11:36:28 s2 kernel: drbd infiniband: Starting receiver thread (from drbd_w_infiniba [21676])
Oct 17 11:36:28 s2 kernel: drbd infiniband: receiver (re)started
Oct 17 11:36:28 s2 kernel: drbd infiniband: conn( Unconnected -> WFConnection ) 
Oct 17 11:36:39 s2 kernel: drbd infiniband: sock_recvmsg returned -11
Oct 17 11:36:39 s2 kernel: drbd infiniband: conn( WFConnection -> BrokenPipe ) 
Oct 17 11:36:39 s2 kernel: drbd infiniband: short read (expected size 8)
Oct 17 11:36:39 s2 kernel: drbd infiniband: Connection closed
Oct 17 11:36:39 s2 kernel: drbd infiniband: conn( BrokenPipe -> Unconnected ) 
Oct 17 11:36:40 s2 kernel: drbd infiniband: conn( Unconnected -> WFConnection ) 

It hangs here until interrupted.

CTRL+C

Oct 17 11:36:54 s2 -bash: (4415) [root.root] |.| /etc/init.d/drbd stop
Oct 17 11:36:54 s2 kernel: drbd infiniband: conn( WFConnection -> Disconnecting ) 
Oct 17 11:36:54 s2 kernel: drbd infiniband: Discarding network configuration.
Oct 17 11:36:54 s2 kernel: drbd infiniband: Connection closed
Oct 17 11:36:54 s2 kernel: drbd infiniband: conn( Disconnecting -> StandAlone ) 
Oct 17 11:36:54 s2 kernel: drbd infiniband: receiver terminated
Oct 17 11:36:54 s2 kernel: drbd infiniband: Terminating drbd_r_infiniba
Oct 17 11:36:54 s2 kernel: block drbd0: disk( UpToDate -> Failed ) 
Oct 17 11:36:54 s2 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Oct 17 11:36:54 s2 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Oct 17 11:36:54 s2 kernel: block drbd0: disk( Failed -> Diskless ) 
Oct 17 11:36:54 s2 multipathd: drbd0: remove path (uevent)
Oct 17 11:36:54 s2 kernel: drbd infiniband: Terminating drbd_w_infiniba
Oct 17 11:36:54 s2 kernel: drbd: module cleanup done.
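
For anyone tracing the logs above, the connection state changes can be pulled out with a short script. This is only a rough sketch: the helper name conn_transitions and the regular expression are illustrative assumptions and not part of DRBD, and the sample lines are copied from the log above. For reference, the -11 returned by sock_recvmsg corresponds to -EAGAIN on Linux.

```python
import re

# Matches DRBD connection state changes such as
# "conn( WFConnection -> BrokenPipe )". The pattern is an assumption
# based on the log format shown in this message.
CONN_RE = re.compile(r"conn\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def conn_transitions(lines):
    """Return (old_state, new_state) pairs in the order they appear."""
    transitions = []
    for line in lines:
        match = CONN_RE.search(line)
        if match:
            transitions.append((match.group(1), match.group(2)))
    return transitions


# Sample lines copied from the log above.
sample = [
    "Oct 17 11:36:28 s2 kernel: drbd infiniband: conn( Unconnected -> WFConnection ) ",
    "Oct 17 11:36:39 s2 kernel: drbd infiniband: sock_recvmsg returned -11",
    "Oct 17 11:36:39 s2 kernel: drbd infiniband: conn( WFConnection -> BrokenPipe ) ",
    "Oct 17 11:36:39 s2 kernel: drbd infiniband: conn( BrokenPipe -> Unconnected ) ",
]

for old, new in conn_transitions(sample):
    print(f"{old} -> {new}")
```

Run against the full syslog, this makes the repeating WFConnection / BrokenPipe / Unconnected cycle easier to spot than scanning the raw output.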


I'm using the latest CentOS 6 with the 2.6.32-573.7.1.el6.x86_64 kernel and drbd84 from ELRepo.
Here is the configuration of the resource:

resource infiniband {
  device /dev/drbd0;
  meta-disk internal;
  handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
  }
  startup {
    become-primary-on both;
  }
  net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  syncer {
    rate 1500M;
  }
  disk {
    # RAID controller with battery backup
    disk-flushes no;
  }
  on s1 {
    disk /dev/sdb;
    #address 192.168.11.1:7789;
    address sdp 192.168.11.1:7789;
  }
  on s2 {
    disk /dev/sdb;
    address sdp 192.168.11.2:7789;
  }
}

Additional info:
[root at s1 drbd.d]# lspci|grep -i mell
03:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
[root at s1 drbd.d]# modinfo ib_sdp
filename:       /lib/modules/2.6.32-573.7.1.el6.x86_64/weak-updates/mlnx-ofa_kernel/drivers/infiniband/ulp/sdp/ib_sdp.ko
license:        Dual BSD/GPL
description:    InfiniBand SDP module
author:         Michael S. Tsirkin
srcversion:     D046FDB330053923ED58690

Thanks for any help.
Best regards,
Nuno Fernandes
