[DRBD-user] DRBD serious locking due to TOE - UPDATE

Lars Ellenberg lars.ellenberg at linbit.com
Tue Jan 8 12:21:21 CET 2008

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Tue, Jan 08, 2008 at 09:06:44AM +0000, Ben Clewett wrote:
> 
> 
> Dear Lars,
> 
> I found a server at lock when I got to my desk this morning.  Not wanting to
> waste any time, these are the numbers you asked for.
> 
> Lock on 'hp-tm-02', twin with 'hp-tm-04' which is partially locked.
> 
> I use the term 'lock' to explain a server with high load and very much
> reduced throughput.

as long as there is still throughput,
it is likely not a problem in drbd.
but see below.

> hp-tm-02:  (lock)
> 
> version: 8.2.1 (api:86/proto:86-87)
> GIT-hash: 318925802fc2638479ad090b73d7af45503dd184 build by root at hp-tm-02, 
> 2007-12-19 22:25:46
>  0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate B r---
>     ns:0 nr:91896348 dw:91896340 dr:0 al:0 bm:0 lo:2 pe:0 ua:1 ap:0
                                                     ^         ^

two requests pending against local disk,
one answer still to be sent to the peer
(which will happen once the local requests complete).

on a Secondary,
if ua stays != zero, and ns,nr,dw,dr do not increase during that time,
drbd has a problem. if those ns,nr,dw,dr still increase, or ua is zero,
all is fine.
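an easy way to keep an eye on this is (a sketch, assuming watch(1)
is available on those boxes):

  # refresh /proc/drbd every second; if ua stays nonzero while
  # ns/nr/dw/dr do not move, drbd is stuck.
  watch -n1 cat /proc/drbd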

>  1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate B r---
>     ns:125994544 nr:0 dw:125994540 dr:57151581 al:477198 bm:0 lo:2 pe:0 
>     ua:0 ap:2

on a Primary,
if ap or pe stays != zero, and ns,nr,dw,dr do not increase,
drbd has a problem. if those ns,nr,dw,dr do still increase,
or pe is zero, all is fine.
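for the record, a minimal sketch of that check as a script
(assuming device minor 0; adapt DEV for other minors):

  #!/bin/sh
  # compare two samples of the ns/nr/dw/dr counters from /proc/drbd;
  # unchanged counters together with nonzero pe/ua/ap mean trouble.
  DEV=0
  sample() { grep -A1 "^ $DEV:" /proc/drbd | tr -s ' ' '\n' \
             | grep -E '^(ns|nr|dw|dr):'; }
  a=$(sample); sleep 10; b=$(sample)
  if [ "$a" = "$b" ]; then
      echo "counters unchanged for 10s, drbd may be stuck"
  else
      echo "counters are moving, all is fine"
  fi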

> > the "lo:NNN", on the Secondary, does that change?
> > I mean, does this gauge change,
> > and eventually decrease to zero, during "lock"?
> 
> When found, it was at zero.

see above.

> > do both drbd live on the same physical device?
> > (physical meaning the same io-queue in linux, i.e.
> > the same /dev/sdX eventually,
> > when propagating down all the lvm layers, if any)
> 
> Both DRBD resource partitions and both DRBD bitmaps live on the same device:
> /dev/cciss, split into four partitions.  This is a hardware RAID 5
> device made of seven + one physical SAS disks.  It has a 256MB
> battery-backed write cache and a tested write rate (bonnie++) of
> about 250MB/sec.
> 
> I do not use the LVM system, if you mean the IBM piece of dynamic
> partitioning software ported onto Linux.
> 
> > how many cpus?
> 
> Four physical = eight logical, on twin dual-core Xeons.
> 
> > how many pdflush threads (ps ax | grep pdflush)
> > during "lock", are one or more of those in "D" state?
> > if so, does it stay in "D" state?
> 
> On hp-tm-02 (locked)
> 
> # ps axl | grep pdflush
>  1  0  196  15  15  0  0  0  pdflus  S  ?  0:15  [pdflush]
>  1  0  197  15  15  0  0  0  sync_b  D  ?  1:34  [pdflush]
> 
> Pid 197 seems to move between S and D, mostly S.

as long as it is mostly S, that's good.
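if you want a rough picture of how much time it spends in D,
sample its state for a while (a sketch; pid 197 taken from your
output above):

  # sample the pdflush state once a second for a minute;
  # occasional D during heavy writeback is normal, a thread
  # that is permanently D would be suspicious.
  for i in $(seq 60); do
      ps -o state= -p 197
      sleep 1
  done | sort | uniq -c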

> > during lock, do (on both nodes)
> >  ps -eo pid,state,wchan:40,comm | grep -Ee " D |drbd"
> > that should give some more information.
> 
> hp-tm-02:  (locked)
> 
>  3788 S -                                        drbd0_worker
>  3796 D drbd_md_sync_page_io                     drbd1_worker
>  3817 S -                                        drbd0_receiver
>  3825 S -                                        drbd1_receiver
>  4394 S -                                        drbd0_asender
>  4395 S -                                        drbd1_asender
>  2996 D sync_buffer                              find
> 
> And:
> 
>   197 D sync_buffer                              pdflush
>   959 D -                                        reiserfs/3
>  3788 S -                                        drbd0_worker
>  3796 S -                                        drbd1_worker
>  3817 D drbd_wait_ee_list_empty                  drbd0_receiver
>  3825 S -                                        drbd1_receiver
>  4394 S -                                        drbd0_asender
>  4395 S -                                        drbd1_asender
> 
> But mostly:
> 
>  3788 S -                                        drbd0_worker
>  3796 S -                                        drbd1_worker
>  3817 S -                                        drbd0_receiver
>  3825 S -                                        drbd1_receiver
>  4394 S -                                        drbd0_asender
>  4395 S -                                        drbd1_asender

that is fine, no indication of misbehaviour.

> hp-tm-04:  (partially locked)
> 
> 14188 S -                                        drbd0_worker
> 14194 S -                                        drbd1_worker
> 14214 S -                                        drbd0_receiver
> 14216 S -                                        drbd0_asender
> 14223 S -                                        drbd1_receiver
> 14225 S -                                        drbd1_asender

just fine.

> > during lock, does it help if you
> > drbdadm disconnect $resource ; sleep 3; drbdadm adjust $resource
> > (on one or the other node)
> 
> I am sorry I can't disconnect these resources.
> 
> > how frequently do you run into these locks?
> 
> Depending on loading.  I don't have much quantitative data.  It seems
> to hit after about 4 days of runtime, at least the last few times.  Once
> the locking has started it will continue until some time after the load
> drops, say 30 minutes.  But once hit, it will return frequently at lower
> load, and continue on and off until I either (i) restart DRBD or (ii)
> restart the server.  I am not sure at this stage which.
> 
> 
> > during lock,
> > what does "netstat -tnp" say (always on both nodes)?
> > (preferably grep for the drbd connection,
> >  so something like
> >  netstat -tnp | grep ':778[89] '
> >  if your drbd ports are configured to be 7788 and 7789.
> 
> hp-tm-02:
> 
>  tcp  0  0  192.168.95.5:7788   192.168.95.6:45579  ESTABLISHED  -
>  tcp  0  0  192.168.95.5:7789   192.168.95.6:50365  ESTABLISHED  -
>  tcp  0  0  192.168.95.5:51501  192.168.95.6:7789   ESTABLISHED  -
>  tcp  0  0  192.168.95.5:54029  192.168.95.6:7788   ESTABLISHED  -
>
>  hp-tm-04:
>
>  tcp  0  0  192.168.95.6:7788   192.168.95.5:54029  ESTABLISHED  -
>  tcp  0  0  192.168.95.6:7789   192.168.95.5:51501  ESTABLISHED  -
>  tcp  0  0  192.168.95.6:50365  192.168.95.5:7789   ESTABLISHED  -
>  tcp  0  0  192.168.95.6:45579  192.168.95.5:7788   ESTABLISHED  -
>  tcp  0  0  192.168.95.6:45579  192.168.95.5:7788   ESTABLISHED  -

the last line is a duplicate.  a bug in netstat, probably.

all as it should be, no queuing in the tcp buffers.
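the Recv-Q and Send-Q columns are the ones to watch; something like
this (a sketch, assuming ports 7788/7789 as above) prints only
connections with data stuck in the buffers:

  # show drbd connections whose tcp Recv-Q or Send-Q is nonzero;
  # no output means the buffers are draining normally.
  netstat -tn | grep -E ':778[89] ' | awk '$2 > 0 || $3 > 0'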

> Now that's odd: why should there be five?  But a repeat of the test
> shows just four entries.

drbd appears to be just healthy and happy.

another thought: what file system, and which mount options?
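something like this would tell us (assuming the file systems sit
directly on the /dev/drbd* devices):

  # show file system type and mount options of all drbd-backed mounts
  grep drbd /proc/mounts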

-- 
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.


