[DRBD-user] Possible IPoIB deadlock with DRBD

Matteo Tescione matteo at RMnet.it
Fri Jan 16 11:32:22 CET 2015



Hi Eric, 

it seems that I'm hitting the same deadlock, but I don't use dual-primary mode, and split brain never occurs.

Can you post your DRBD config along with the InfiniBand HBA model and the driver version you're using?

regards,

--
matteo

----- Original Message -----
> From: "Eric Blevins" <ericlb100 at gmail.com>
> To: drbd-user at lists.linbit.com
> Sent: Thursday, January 15, 2015 17:53:48
> Subject: [DRBD-user] Possible IPoIB deadlock with DRBD
> 
> We are using Proxmox with DRBD in dual-primary mode, with IPoIB as the
> transport.
> We recently tested the upcoming Proxmox 3.10 kernel, which is based on the
> RHEL 7 kernel, and started having problems with DRBD.
> 
> The kernel came with DRBD 8.4.3; I have also compiled and installed
> 8.4.5, and both show the same problem.
> 
> During periods of heavy I/O load (backups), DRBD times out and the nodes
> go into split brain; I have included some logs below.
> I came across a couple of LKML threads that discuss a deadlock between
> IPoIB and I/O that runs over IPoIB, such as iSCSI or NFS:
> https://lkml.org/lkml/2014/2/21/655
> http://lkml.org/lkml/2014/4/24/543
> 
> Is it likely that DRBD could also trigger the deadlock discussed on LKML?
> If not, do you have any other suggestions on how I can prevent this
> timeout?
> 
> 
> Node A:
> Jan  5 03:23:51 vm6 kernel: [2221944.335766] drbd drbd0: peer( Primary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
> Jan  5 03:23:51 vm6 kernel: [2221944.335782] drbd drbd0: asender terminated
> Jan  5 03:23:51 vm6 kernel: [2221944.335784] drbd drbd0: Terminating drbd_a_drbd0
> Jan  5 03:23:51 vm6 kernel: [2221944.335846] block drbd0: new current UUID BD9DB97EC672F5C9:8F2DD469C771058B:925C07CF6316212D:925B07CF6316212D
> Jan  5 03:23:51 vm6 kernel: [2221944.347788] drbd drbd0: Connection closed
> Jan  5 03:23:51 vm6 kernel: [2221944.347834] drbd drbd0: conn( Timeout -> Unconnected )
> Jan  5 03:23:51 vm6 kernel: [2221944.347836] drbd drbd0: receiver terminated
> 
> 
> Node B:
> Jan  5 03:23:51 vm5 kernel: [2223090.170391] drbd drbd0: sock was shut down by peer
> Jan  5 03:23:51 vm5 kernel: [2223090.170409] drbd drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
> Jan  5 03:23:51 vm5 kernel: [2223090.170412] drbd drbd0: short read (expected size 16)
> Jan  5 03:23:51 vm5 kernel: [2223090.170421] drbd drbd0: asender terminated
> Jan  5 03:23:51 vm5 kernel: [2223090.170423] drbd drbd0: Terminating drbd_a_drbd0
> Jan  5 03:23:51 vm5 kernel: [2223090.170480] block drbd0: new current UUID 2628F73F9DAE5EDF:8F2DD469C771058B:925C07CF6316212D:925B07CF6316212D
> Jan  5 03:23:51 vm5 kernel: [2223090.185536] drbd drbd0: Connection closed
> Jan  5 03:23:51 vm5 kernel: [2223090.185585] drbd drbd0: conn( BrokenPipe -> Unconnected )
> Jan  5 03:23:51 vm5 kernel: [2223090.185587] drbd drbd0: receiver terminated
> 
> Eric
> 
> --
> This message has been scanned for viruses and dangerous content by
> RMnet MailScanner, and is believed to be clean.
> 
> Click here to report this message as spam.
> http://efa1.rmnet.it/cgi-bin/learn-msg.cgi?id=4C1D868B16.A88D5&token=94b3a0f1dfd9db46184ad15228603c27
> 
> 

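The two excerpts above differ mainly in the `conn` field: Node A went from Connected to Timeout, while Node B went from Connected to BrokenPipe right after "sock was shut down by peer". When comparing longer log windows from both nodes, a small helper can pull these transitions out mechanically. This is a minimal Python sketch, assuming the `field( old -> new )` format seen in the DRBD 8.4 log lines quoted above; the helper name `parse_state_changes` is hypothetical and not part of any DRBD tool.

```python
import re

# Matches state-change fragments like "conn( Connected -> Timeout )".
# Assumption: the "field( old -> new )" layout matches the DRBD 8.4 logs above.
STATE_RE = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def parse_state_changes(line):
    """Return {field: (old, new)} for every 'field( old -> new )' in one log line."""
    return {field: (old, new) for field, old, new in STATE_RE.findall(line)}


# The two state-change lines from the excerpts, one per node.
node_a = ("Jan  5 03:23:51 vm6 kernel: [2221944.335766] drbd drbd0: "
          "peer( Primary -> Unknown ) conn( Connected -> Timeout ) "
          "pdsk( UpToDate -> DUnknown )")
node_b = ("Jan  5 03:23:51 vm5 kernel: [2223090.170409] drbd drbd0: "
          "peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) "
          "pdsk( UpToDate -> DUnknown )")

if __name__ == "__main__":
    for name, line in (("Node A", node_a), ("Node B", node_b)):
        changes = parse_state_changes(line)
        # Print only the connection-state transition for each node.
        print(name, "conn:", " -> ".join(changes["conn"]))
```

Run as-is, it prints the `conn` transition for each node. Timeout on one side and BrokenPipe on the other is consistent with Node A detecting the stall first and closing the connection, though the logs alone do not show what caused the stall.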

