[DRBD-user] Possible IPoIB deadlock with DRBD

Eric Blevins ericlb100 at gmail.com
Thu Jan 15 17:53:48 CET 2015


We are using Proxmox with DRBD in dual-primary mode, using IPoIB for transport.
We recently tested Proxmox's upcoming 3.10 kernel, which is based on the
RHEL 7 kernel, and started having problems with DRBD.

The kernel came with DRBD 8.4.3. I have also compiled and installed
8.4.5, and both versions show the same problem.

During times of heavy IO load (backups), DRBD will time out and split
brain. I have included some logs below.
I stumbled on a couple of LKML threads that discuss a deadlock involving
IPoIB and IO that happens over IPoIB, such as iSCSI or NFS:
https://lkml.org/lkml/2014/2/21/655
http://lkml.org/lkml/2014/4/24/543

Is it likely that DRBD could also trigger the deadlock discussed on LKML?
If not, do you have any other suggestions on how I can prevent this timeout?


Node A:
Jan  5 03:23:51 vm6 kernel: [2221944.335766] drbd drbd0: peer( Primary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
Jan  5 03:23:51 vm6 kernel: [2221944.335782] drbd drbd0: asender terminated
Jan  5 03:23:51 vm6 kernel: [2221944.335784] drbd drbd0: Terminating drbd_a_drbd0
Jan  5 03:23:51 vm6 kernel: [2221944.335846] block drbd0: new current UUID BD9DB97EC672F5C9:8F2DD469C771058B:925C07CF6316212D:925B07CF6316212D
Jan  5 03:23:51 vm6 kernel: [2221944.347788] drbd drbd0: Connection closed
Jan  5 03:23:51 vm6 kernel: [2221944.347834] drbd drbd0: conn( Timeout -> Unconnected )
Jan  5 03:23:51 vm6 kernel: [2221944.347836] drbd drbd0: receiver terminated


Node B:
Jan  5 03:23:51 vm5 kernel: [2223090.170391] drbd drbd0: sock was shut down by peer
Jan  5 03:23:51 vm5 kernel: [2223090.170409] drbd drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Jan  5 03:23:51 vm5 kernel: [2223090.170412] drbd drbd0: short read (expected size 16)
Jan  5 03:23:51 vm5 kernel: [2223090.170421] drbd drbd0: asender terminated
Jan  5 03:23:51 vm5 kernel: [2223090.170423] drbd drbd0: Terminating drbd_a_drbd0
Jan  5 03:23:51 vm5 kernel: [2223090.170480] block drbd0: new current UUID 2628F73F9DAE5EDF:8F2DD469C771058B:925C07CF6316212D:925B07CF6316212D
Jan  5 03:23:51 vm5 kernel: [2223090.185536] drbd drbd0: Connection closed
Jan  5 03:23:51 vm5 kernel: [2223090.185585] drbd drbd0: conn( BrokenPipe -> Unconnected )
Jan  5 03:23:51 vm5 kernel: [2223090.185587] drbd drbd0: receiver terminated
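
In case it helps anyone compare the two nodes, here is a minimal Python
sketch that pulls the state transitions (peer, conn, pdsk) out of a DRBD
kernel log line. The function name and the regular expression are my own
illustration, not part of any DRBD tool, and the sketch assumes each log
entry sits on a single line (mail wrapping undone).

```python
import re

# Matches DRBD state-change fields such as "conn( Connected -> Timeout )".
# The pattern is an assumption based on the log format shown in this post.
TRANSITION = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def transitions(line):
    """Return {field: (old_state, new_state)} for each state change in a line."""
    return {m.group(1): (m.group(2), m.group(3))
            for m in TRANSITION.finditer(line)}


# Sample taken from the Node A log above, joined onto one line.
sample = ("drbd drbd0: peer( Primary -> Unknown ) "
          "conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )")

for field, (old, new) in transitions(sample).items():
    print(f"{field}: {old} -> {new}")
```

Run against both logs, this makes it easy to see that Node A reports
conn going to Timeout while Node B reports BrokenPipe at the same second.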

Eric


