[DRBD-user] DRBD serious locking due to TOE - UPDATE

Lars Ellenberg lars.ellenberg at linbit.com
Mon Jan 7 19:40:20 CET 2008


On Mon, Jan 07, 2008 at 03:16:04PM +0000, Ben Clewett wrote:
> 
> 
> Hi Lars,
> 
> Thanks for taking an interest, I hope you find something.  I will give 
> you all the information I have:
> 
> Kernel:	2.6.18.0
> DRBD:	8.2.1
> 
> 
> /proc/drbd before lock:
> 
> version: 8.2.1 (api:86/proto:86-87)
> GIT-hash: 318925802fc2638479ad090b73d7af45503dd184 build by 
> root at hp-tm-02, 2007-12-10 22:21:14
>  0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate B r---
>     ns:1053120 nr:28862256 dw:28864144 dr:1065701 al:46 bm:392 lo:1 pe:0 ua:1 ap:0
>         resync: used:0/31 hits:65607 misses:217 starving:0 dirty:0 changed:217
>         act_log: used:1/257 hits:427 misses:46 starving:0 dirty:0 changed:46
>  1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate B r---
>     ns:41783720 nr:1053224 dw:42836948 dr:7104597 al:144980 bm:422 lo:0 pe:0 ua:0 ap:0
>         resync: used:0/31 hits:65595 misses:213 starving:0 dirty:0 changed:213
>         act_log: used:1/257 hits:10300951 misses:145315 starving:0 dirty:335 changed:144980
> 
> /proc/drbd during lock:
> 
> GIT-hash: 318925802fc2638479ad090b73d7af45503dd184 build by 
> root at hp-tm-02, 2007-12-10 22:21:14
>  0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate B r---
>     ns:1053120 nr:78899808 dw:78901652 dr:1065701 al:46 bm:392 lo:12 pe:0 ua:1 ap:0
>         resync: used:0/31 hits:65607 misses:217 starving:0 dirty:0 changed:217
>         act_log: used:1/257 hits:427 misses:46 starving:0 dirty:0 changed:46
>  1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate B r---
>     ns:169235760 nr:1053224 dw:170288980 dr:26714233 al:655079 bm:422 lo:2 pe:0 ua:0 ap:2
>         resync: used:0/31 hits:65595 misses:213 starving:0 dirty:0 changed:213
>         act_log: used:3/257 hits:41653862 misses:656540 starving:0 dirty:1461 changed:655079
> 
> ** No kernel messages. **

try to avoid line breaks :)

and the other node, during "lock"?
(always collect this information on both nodes)

the "lo:NNN", on the Secondary, does that change?
I mean, does this gauge change,
and eventually decrease to zero, during "lock"?
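
To make the "lo:" gauge easier to watch, here is a minimal sketch that pulls it out of /proc/drbd-style text. The function name drbd_lo is made up for illustration, and the parsing assumes the 8.2 layout quoted above (a " N: cs:..." line followed by a counters line).

```shell
# Print "<minor> lo:<value>" for each device found in /proc/drbd-style text
# read from stdin. Assumes the layout quoted in this thread: a " N: cs:..."
# line, then a counters line that contains "lo:NNN".
drbd_lo() {
    awk '
        # Remember the minor number from the " N: cs:..." line.
        /^ *[0-9]+: cs:/ { minor = $1; sub(/:$/, "", minor) }
        # Print the lo: counter from whichever line carries it.
        match($0, /lo:[0-9]+/) { print minor, substr($0, RSTART, RLENGTH) }
    '
}
```

During a "lock" something like
 while sleep 1; do date +%T; drbd_lo < /proc/drbd; done
on the Secondary should show whether the value moves at all.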

do both drbd devices live on the same physical device?
(physical meaning the same io-queue in linux, i.e.
 the same /dev/sdX in the end,
 once you follow it down through all the lvm layers, if any)

how many cpus?
how many pdflush threads (ps ax | grep pdflush)?
during "lock", are one or more of those in "D" state?
if so, do they stay in "D" state?
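
The pdflush questions can be answered in one go with a small sketch like this. pdflush_summary is a hypothetical helper name; it only counts what ps reports at that instant, so run it a few times to see whether the "D" state persists.

```shell
# Summarise pdflush threads from "ps -eo state,comm" output on stdin:
# prints the total number and how many are in uninterruptible sleep (D).
pdflush_summary() {
    awk '$2 == "pdflush" { total++; if ($1 ~ /^D/) blocked++ }
         END { printf "pdflush total:%d D-state:%d\n", total, blocked }'
}
```

Usage would be along the lines of
 ps -eo state,comm | pdflush_summary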

> It is unfortunate but I cannot take my servers off line for good testing.
> 
> The only figure I can consistently measure is the 'iowait' reported by 
> 'sar'.   When I have a lock this will report ~ 30%, and ~ 1% when not 
> locking.
> 
> The iowait is listed in the 'sar' man page as 'Percentage of time that 
> the CPU or CPUs were idle during which the system had an outstanding 
> disk I/O request.'  I guess this is a wait from the /dev/drbd0 device?
> 
> If you know how I could break down 'iowait' I would be interested.

during lock, do (on both nodes)
 ps -eo pid,state,wchan:40,comm | grep -Ee " D |drbd"
that should give some more information.
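
Since a single snapshot says little about whether something stays blocked, a sketch like the following takes several timestamped samples of the same ps listing. The name sample_blocked and its two arguments (sample count, interval in seconds) are assumptions for illustration only.

```shell
# Take COUNT samples (default 5), INTERVAL seconds apart (default 1), of the
# ps listing above, each prefixed with a timestamp so that the output of
# both nodes can be lined up afterwards.
sample_blocked() {
    count=${1:-5}
    interval=${2:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "== $(date '+%F %T') =="
        # grep exits non-zero when nothing matches; that is not an error here.
        ps -eo pid,state,wchan:40,comm | grep -Ee " D |drbd" || true
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

For example
 sample_blocked 10 2 > /tmp/blocked.$(hostname).log
run on both nodes during a "lock".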

> During the lock the TCP IO drops considerably.  This is the output from 
> iftop before and during a lock:
> 
> Before:
> 192.168.95.5          => 192.168.95.6           107Mb  80.0Mb  76.2Mb
>                       <=                       4.98Mb  4.65Mb  4.61Mb
> 
> During:
> 192.168.95.5          => 192.168.95.6           214Kb  1.30Mb   963Kb
>                       <=                       8.74Mb  20.6Mb  10.9Mb

you can break that down further by port number
(in iftop, press capital S and D to show source and destination ports).
if you tcpdump limiting to the drbd port(s),
what is still transmitted?
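
For the tcpdump part, a small sketch that builds the capture filter from a list of ports. drbd_pcap_filter is a made-up helper name, and 7788/7789 and eth1 below are only examples; use the ports and interface from your own setup.

```shell
# Build a tcpdump filter expression matching TCP traffic on any of the
# given ports, e.g. "tcp port 7788 or tcp port 7789".
drbd_pcap_filter() {
    filter=""
    for port in "$@"; do
        # Join the individual terms with " or ".
        filter="${filter:+$filter or }tcp port $port"
    done
    printf '%s\n' "$filter"
}
```

It could then be used as
 tcpdump -ni eth1 "$(drbd_pcap_filter 7788 7789)"
on the replication interface.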

during lock, does it help if you run
 drbdadm disconnect $resource ; sleep 3; drbdadm adjust $resource
(on one or the other node)?

how frequently do you run into these locks? 

during lock,
what does "netstat -tnp" say (always on both nodes)?
(preferably grep for the drbd connection,
 so something like
  netstat -tnp | grep ':778[89] '
 if your drbd ports are configured to be 7788 and 7789)
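
When comparing the netstat output of both nodes, the Recv-Q and Send-Q columns are probably the interesting part. A minimal sketch that pulls them out, assuming the usual Linux "netstat -tn" column order and the example ports 7788/7789 (drbd_queues is a hypothetical name):

```shell
# From "netstat -tn" output on stdin, print the queue sizes, endpoints and
# state of every TCP connection that uses port 7788 or 7789 on either end.
# Assumed columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
drbd_queues() {
    awk '$1 ~ /^tcp/ && ($4 ~ /:778[89]$/ || $5 ~ /:778[89]$/) {
             printf "recvq:%s sendq:%s %s -> %s %s\n", $2, $3, $4, $5, $6
         }'
}
```

Usage:
 netstat -tn | drbd_queues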

> I hope there is something useful in this.
> 
> Ben
> 
> PS, is there a way of working DRBD using UDP?  This would also fix this 
> specific problem with the Broadcom network cards, where the problem is 
> documented as specific to TCP.

No.
And don't expect that too soon.  for write ordering we would need to
reimplement the packet ordering logic of tcp for the udp connection.
why should we do that?

that said, maybe you could run openvpn over udp,
then point drbd at some ip within that openvpn net.
drbd then makes its tcp connection through the tun device of openvpn,
which encrypts/compresses/authenticates/whatever each packet,
and sends it as udp over to the other node.  happy hacking :)

-- 
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.


