We are using Proxmox with DRBD in dual-primary mode, using IPoIB for transport. We recently tested Proxmox's upcoming 3.10 kernel, which is based on the RHEL 7 kernel, and started having problems with DRBD. The kernel ships with DRBD 8.4.3. I have also compiled and installed 8.4.5, and both versions show the same problem. During periods of heavy I/O load (backups), DRBD times out and the nodes end up split-brained. I have included some logs below.

I stumbled on a couple of LKML threads that discuss a deadlock involving IPoIB when I/O, such as iSCSI or NFS, runs over IPoIB:

https://lkml.org/lkml/2014/2/21/655
http://lkml.org/lkml/2014/4/24/543

Is it likely that DRBD could also trigger the deadlock discussed on LKML? If not, do you have any other suggestions on how I can prevent this timeout?

Node A:
Jan 5 03:23:51 vm6 kernel: [2221944.335766] drbd drbd0: peer( Primary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
Jan 5 03:23:51 vm6 kernel: [2221944.335782] drbd drbd0: asender terminated
Jan 5 03:23:51 vm6 kernel: [2221944.335784] drbd drbd0: Terminating drbd_a_drbd0
Jan 5 03:23:51 vm6 kernel: [2221944.335846] block drbd0: new current UUID BD9DB97EC672F5C9:8F2DD469C771058B:925C07CF6316212D:925B07CF6316212D
Jan 5 03:23:51 vm6 kernel: [2221944.347788] drbd drbd0: Connection closed
Jan 5 03:23:51 vm6 kernel: [2221944.347834] drbd drbd0: conn( Timeout -> Unconnected )
Jan 5 03:23:51 vm6 kernel: [2221944.347836] drbd drbd0: receiver terminated

Node B:
Jan 5 03:23:51 vm5 kernel: [2223090.170391] drbd drbd0: sock was shut down by peer
Jan 5 03:23:51 vm5 kernel: [2223090.170409] drbd drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Jan 5 03:23:51 vm5 kernel: [2223090.170412] drbd drbd0: short read (expected size 16)
Jan 5 03:23:51 vm5 kernel: [2223090.170421] drbd drbd0: asender terminated
Jan 5 03:23:51 vm5 kernel: [2223090.170423] drbd drbd0: Terminating drbd_a_drbd0
Jan 5 03:23:51 vm5 kernel: [2223090.170480] block drbd0: new current UUID 2628F73F9DAE5EDF:8F2DD469C771058B:925C07CF6316212D:925B07CF6316212D
Jan 5 03:23:51 vm5 kernel: [2223090.185536] drbd drbd0: Connection closed
Jan 5 03:23:51 vm5 kernel: [2223090.185585] drbd drbd0: conn( BrokenPipe -> Unconnected )
Jan 5 03:23:51 vm5 kernel: [2223090.185587] drbd drbd0: receiver terminated

Eric