[DRBD-user] Possible IPoIB deadlock with DRBD

Matteo Tescione matteo at RMnet.it
Fri Jan 16 16:16:01 CET 2015



Yes, I've seen the posts you suggested.
Have you tested it?
We are using the qib driver for QLogic 7342 adapters; I don't know if anyone has ported the fix to it too.

Regards,

--
matteo

----- Original Message -----
> From: "Eric Blevins" <ericlb100 at gmail.com>
> To: "Matteo Tescione" <matteo at rmnet.it>
> Cc: drbd-user at lists.linbit.com
> Sent: Friday, 16 January 2015 15:50:17
> Subject: Re: [DRBD-user] Possible IPoIB deadlock with DRBD
> 
> The split brain would only happen in dual-primary mode.
> 
> We have Mellanox MHEA28-XTC adapters using the mthca driver.
> 
> The potential IPoIB deadlock is only fixed in the mlx4 driver so far.
> 
> 
> 
> common {
>   net {
>     connect-int 20; # default 10; unit: 1 second
>     timeout 180; # default 60; unit: 0.1 second
>     ping-int 30; # default 10; unit: 1 second
>     ping-timeout 10; # default 5; unit: 0.1 second
>     ko-count 20;
>     max-buffers 16000;
>     max-epoch-size 16000;
>     sndbuf-size 0;
>     rcvbuf-size 0;
>     unplug-watermark 16001;
>     verify-alg md5;
>   }
>   disk {
>     c-plan-ahead 10;
>     c-min-rate 30M;
>     c-max-rate 200M;
>     c-fill-target 20M;
>     al-extents 3389;
>     md-flushes no;
>     disk-barrier no;
>     disk-flushes no;
>   }
> }
> resource drbd0 {
>   device /dev/drbd0;
>   disk /dev/sdc;
>   meta-disk internal;
>   startup {
>     wfc-timeout  120;
>     degr-wfc-timeout 60;
>     outdated-wfc-timeout 60;
>     become-primary-on both;
>   }
>   disk {
>     c-max-rate 200M;
>     c-min-rate 30M;
>     c-fill-target 20M;
>     c-plan-ahead 10;
> 
>   }
>   net {
>     protocol C;
>     cram-hmac-alg sha1;
>     shared-secret "XXXXXXXXXXXXX";
>     allow-two-primaries;
>     after-sb-0pri discard-zero-changes;
>     after-sb-1pri discard-secondary;
>     after-sb-2pri disconnect;
>   }
>   on vm1 {
>     address x.x.x.1:7788;
>   }
>   on vm2 {
>     address x.x.x.2:7788;
>   }
> }
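The mixed units in the net section above are easy to misread. As documented in drbd.conf(5) for DRBD 8.4, connect-int and ping-int are in seconds, timeout and ping-timeout are in tenths of a second, and timeout must be lower than both connect-int and ping-int. The Python sketch below (the helper names are hypothetical, not part of drbd-utils) converts Eric's values to seconds and checks that constraint. It also estimates the ko-count window, assuming the drbd.conf(5) description that a peer is expelled when it fails to complete a single write within ko-count times timeout.

```python
# Hypothetical helpers, not part of drbd-utils: convert the DRBD 8.4
# net timing options to seconds and check the documented constraint.

# Units as described in drbd.conf(5) for DRBD 8.4 (assumed here).
UNIT_SECONDS = {
    "connect-int": 1.0,   # seconds
    "ping-int": 1.0,      # seconds
    "timeout": 0.1,       # tenths of a second
    "ping-timeout": 0.1,  # tenths of a second
}


def to_seconds(net):
    """Return every known timing option from `net`, converted to seconds."""
    return {
        name: round(value * UNIT_SECONDS[name], 3)
        for name, value in net.items()
        if name in UNIT_SECONDS
    }


def check_timeouts(net):
    """Return a list of problems; an empty list means the check passed."""
    secs = to_seconds(net)
    problems = []
    for other in ("connect-int", "ping-int"):
        if secs["timeout"] >= secs[other]:
            problems.append(
                f"timeout ({secs['timeout']} s) must be lower than "
                f"{other} ({secs[other]} s)"
            )
    return problems


def ko_window_seconds(net):
    """Approximate time a stalled write is tolerated: ko-count * timeout."""
    return round(net["ko-count"] * net["timeout"] * UNIT_SECONDS["timeout"], 3)


# Values copied from the net section of the config above.
eric_net = {
    "connect-int": 20,
    "timeout": 180,
    "ping-int": 30,
    "ping-timeout": 10,
    "ko-count": 20,
}

if __name__ == "__main__":
    print(to_seconds(eric_net))
    print(check_timeouts(eric_net) or "timeout constraint satisfied")
    print(f"ko-count window: about {ko_window_seconds(eric_net)} s")
```

With these values timeout works out to 18 s, below both connect-int (20 s) and ping-int (30 s), and ping-timeout is 1 s.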
> 
> On Fri, Jan 16, 2015 at 5:32 AM, Matteo Tescione <matteo at rmnet.it> wrote:
> > Hi Eric,
> >
> > It seems that I'm hitting the same deadlock, but I don't use dual
> > primary, and split brain never occurs.
> >
> > Can you post your DRBD config along with the InfiniBand HBA model
> > and version you're using?
> >
> > regards,
> >
> > --
> > matteo
> >
> > ----- Original Message -----
> >> From: "Eric Blevins" <ericlb100 at gmail.com>
> >> To: drbd-user at lists.linbit.com
> >> Sent: Thursday, 15 January 2015 17:53:48
> >> Subject: [DRBD-user] Possible IPoIB deadlock with DRBD
> >>
> >> We are using Proxmox with DRBD in dual-primary mode, with IPoIB for
> >> transport. We recently tested the upcoming Proxmox 3.10 kernel,
> >> which is based on the RHEL 7 kernel, and started having problems
> >> with DRBD.
> >>
> >> The kernel came with DRBD 8.4.3; I have also compiled and installed
> >> 8.4.5, and both experience the same problem.
> >>
> >> During times of heavy IO load (backups), DRBD will time out and
> >> split brain; I have included some logs below.
> >> I stumbled on a couple of LKML threads that discuss a deadlock
> >> involving IPoIB and IO that runs over IPoIB, such as iSCSI or NFS.
> >> https://lkml.org/lkml/2014/2/21/655
> >> http://lkml.org/lkml/2014/4/24/543
> >>
> >> Is it likely that DRBD could also trigger the deadlock discussed on
> >> LKML? If not, do you have any other suggestions on how I can
> >> prevent this timeout?
> >>
> >>
> >> Node A:
> >> Jan  5 03:23:51 vm6 kernel: [2221944.335766] drbd drbd0: peer( Primary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
> >> Jan  5 03:23:51 vm6 kernel: [2221944.335782] drbd drbd0: asender terminated
> >> Jan  5 03:23:51 vm6 kernel: [2221944.335784] drbd drbd0: Terminating drbd_a_drbd0
> >> Jan  5 03:23:51 vm6 kernel: [2221944.335846] block drbd0: new current UUID BD9DB97EC672F5C9:8F2DD469C771058B:925C07CF6316212D:925B07CF6316212D
> >> Jan  5 03:23:51 vm6 kernel: [2221944.347788] drbd drbd0: Connection closed
> >> Jan  5 03:23:51 vm6 kernel: [2221944.347834] drbd drbd0: conn( Timeout -> Unconnected )
> >> Jan  5 03:23:51 vm6 kernel: [2221944.347836] drbd drbd0: receiver terminated
> >>
> >>
> >> Node B:
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170391] drbd drbd0: sock was shut down by peer
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170409] drbd drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170412] drbd drbd0: short read (expected size 16)
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170421] drbd drbd0: asender terminated
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170423] drbd drbd0: Terminating drbd_a_drbd0
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170480] block drbd0: new current UUID 2628F73F9DAE5EDF:8F2DD469C771058B:925C07CF6316212D:925B07CF6316212D
> >> Jan  5 03:23:51 vm5 kernel: [2223090.185536] drbd drbd0: Connection closed
> >> Jan  5 03:23:51 vm5 kernel: [2223090.185585] drbd drbd0: conn( BrokenPipe -> Unconnected )
> >> Jan  5 03:23:51 vm5 kernel: [2223090.185587] drbd drbd0: receiver terminated
> >>
> >> Eric
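Comparing the two nodes' logs is easier once the state-change lines are split into fields. The Python sketch below is a hypothetical helper (not a DRBD tool) that extracts the `name( old -> new )` transitions and the four IDs from a `new current UUID` line. It assumes the DRBD 8.4 log format shown above and, for the UUID line, the usual current:bitmap:history:history ordering. Applied to these logs it shows node A's connection going to Timeout while node B's goes to BrokenPipe, and each node generating a different new current UUID on top of the same bitmap UUID while still Primary, which is consistent with the split brain reported on reconnect.

```python
import re

# "name( old -> new )" groups, as in DRBD 8.4 state-change log lines.
STATE_RE = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")
# Four colon-separated 16-digit hex IDs following "new current UUID".
UUID_RE = re.compile(r"new current UUID ((?:[0-9A-F]{16}:){3}[0-9A-F]{16})")


def state_changes(line):
    """Map each state field found in `line` to an (old, new) tuple."""
    return {name: (old, new) for name, old, new in STATE_RE.findall(line)}


def current_uuids(line):
    """Return the four IDs from a 'new current UUID' line, or None."""
    match = UUID_RE.search(line)
    return match.group(1).split(":") if match else None


# Lines copied from the logs above (state change first, UUID second).
node_a = [
    "Jan  5 03:23:51 vm6 kernel: [2221944.335766] drbd drbd0: peer( Primary -> Unknown ) "
    "conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )",
    "Jan  5 03:23:51 vm6 kernel: [2221944.335846] block drbd0: new current UUID "
    "BD9DB97EC672F5C9:8F2DD469C771058B:925C07CF6316212D:925B07CF6316212D",
]
node_b = [
    "Jan  5 03:23:51 vm5 kernel: [2223090.170409] drbd drbd0: peer( Primary -> Unknown ) "
    "conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )",
    "Jan  5 03:23:51 vm5 kernel: [2223090.170480] block drbd0: new current UUID "
    "2628F73F9DAE5EDF:8F2DD469C771058B:925C07CF6316212D:925B07CF6316212D",
]

if __name__ == "__main__":
    for label, (change_line, uuid_line) in (("A", node_a), ("B", node_b)):
        conn = state_changes(change_line)["conn"]
        current, bitmap = current_uuids(uuid_line)[:2]
        print(f"node {label}: conn {conn[0]} -> {conn[1]}, "
              f"current {current}, bitmap {bitmap}")
```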
> >> _______________________________________________
> >> drbd-user mailing list
> >> drbd-user at lists.linbit.com
> >> http://lists.linbit.com/mailman/listinfo/drbd-user
> >>
> 
> 


