[DRBD-user] DRBD serious locking due to TOE

Ben Clewett ben at roadrunner.uk.com
Fri Dec 14 15:49:00 CET 2007



Florian,

Thanks for the reply and the information.  I have never used this tool 
before; it looks like it has some interesting options.

# ethtool --show-offload eth3

Offload parameters for eth3:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off

# ethtool -K eth3 tx off rx off
# ethtool --show-offload eth3

Offload parameters for eth3:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
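
Comparing the two listings above by eye shows that more changed than was asked for: scatter-gather and tcp segmentation offload went off along with the two checksumming settings. A small helper can make that comparison explicit. This is only a sketch: the function name offload_diff and the file paths are made up for illustration, and it assumes the "name: state" layout printed in this post (newer ethtool versions may format the listing differently).

```shell
#!/bin/sh
# offload_diff BEFORE AFTER   (hypothetical helper, not part of ethtool)
# Prints one line per offload feature whose state differs between two
# saved "ethtool --show-offload" listings.
offload_diff() {
    awk -F': ' '
        NF != 2   { next }                      # skip header and blank lines
        NR == FNR { before[$1] = $2; next }     # first file: remember each state
        ($1 in before) && before[$1] != $2 {    # second file: report changes
            printf "%s: %s -> %s\n", $1, before[$1], $2
        }
    ' "$1" "$2"
}

# Usage (needs root and a real interface, so left commented out):
#   ethtool --show-offload eth3 > /tmp/offload.before
#   ethtool -K eth3 tx off rx off
#   ethtool --show-offload eth3 > /tmp/offload.after
#   offload_diff /tmp/offload.before /tmp/offload.after
```

Fed the two listings from this message, it should report four changed features rather than two, which is worth knowing before attributing any later improvement to checksum offloading alone.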

As luck would have it, I am experiencing a DRBD lock at the moment, so I 
will watch and see what happens.

I ran this about 10 minutes ago.  So far it has had no effect, but 
that could be because I am in a lock as I type.  It will be interesting 
to see if another lock occurs later.

Just for reference, and in case it is any help, this is what DRBD looks 
like in and out of a lock.  Note the increase in load that no process 
accounts for (load average about 2 with almost no CPU in use), and the 
vast decrease in transmit rate during the lock.
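
A single /proc/drbd snapshot only shows cumulative counters, so spotting a lock from DRBD's side means sampling twice and taking the difference. The sketch below does that for the ns (network send) counter, which to my understanding of the DRBD documentation is counted in KiB. The function names drbd_ns and watch_ns are hypothetical, and the parsing assumes the 8.2-style layout shown in this post.

```shell
#!/bin/sh
# drbd_ns MINOR [FILE]   (hypothetical helper)
# Prints the ns counter of one DRBD device from /proc/drbd-style text.
drbd_ns() {
    awk -v minor="$1" '
        # A device section starts with a line such as " 1: cs:Connected ..."
        $1 ~ /^[0-9]+:$/ { cur = $1; sub(/:$/, "", cur) }
        cur == minor {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^ns:/) { sub(/^ns:/, "", $i); print $i; exit }
        }
    ' "${2:-/proc/drbd}"
}

# watch_ns MINOR SECONDS   (hypothetical helper)
# Prints the average send rate once per interval until interrupted.
watch_ns() {
    prev=$(drbd_ns "$1")
    [ -n "$prev" ] || { echo "no ns counter for minor $1" >&2; return 1; }
    while sleep "$2"; do
        now=$(drbd_ns "$1")
        echo "$(date +%T) minor $1: $(( (${now:-0} - prev) / $2 )) KiB/s sent"
        prev=${now:-0}
    done
}

# Usage, on a node with DRBD loaded (left commented out):
#   watch_ns 1 10
```

Logging this alongside the iftop figures would show whether the drop in rate during a lock is visible in DRBD's own counters as well as on the wire.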



NO LOCK
=======

# top
top - 09:26:03 up 1 day, 10:37,  1 user,  load average: 1.18, 1.13, 0.84
Tasks: 122 total,   2 running, 120 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.4%us,  4.4%sy,  0.0%ni, 78.0%id, 11.1%wa,  0.1%hi,  3.0%si,  0.0%
Mem:  10235944k total, 10176028k used,    59916k free,   140952k buffers
Swap: 10490436k total,       56k used, 10490380k free,  5786628k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  4634 mysql     15   0 4167m 3.6g 5632 S   22 36.5  66:28.03 mysqld
19964 root      -3   0     0    0    0 R    4  0.0   2:15.44 drbd1_asender
  3753 root      16   0     0    0    0 S    3  0.0   1:22.36 drbd1_worker
  3789 root      15   0     0    0    0 S    2  0.0   1:37.00 drbd1_receiver

# cat /proc/drbd
version: 8.2.1 (api:86/proto:86-87)
GIT-hash: 318925802fc2638479ad090b73d7af45503dd184 build by root at hp-tm-02, 2007-12-10 22:21:14
  0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate B r---
     ns:1053120 nr:78899808 dw:78901652 dr:1065701 al:46 bm:392 lo:12 pe:0 ua:1 ap:0
         resync: used:0/31 hits:65607 misses:217 starving:0 dirty:0 changed:217
         act_log: used:1/257 hits:427 misses:46 starving:0 dirty:0 changed:46
  1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate B r---
     ns:169235760 nr:1053224 dw:170288980 dr:26714233 al:655079 bm:422 lo:2 pe:0 ua:0 ap:2
         resync: used:0/31 hits:65595 misses:213 starving:0 dirty:0 changed:213
         act_log: used:3/257 hits:41653862 misses:656540 starving:0 dirty:1461 changed:655079

# iftop -i eth3  (eth3 Rate)
192.168.95.5  => 192.168.95.6      107Mb  80.0Mb  76.2Mb
               <=                  4.98Mb  4.65Mb  4.61Mb



LOCK
====

# top
top - 14:42:03 up 3 days, 15:53,  1 user,  load average: 2.04, 2.10, 2.07
Tasks: 122 total,   1 running, 121 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.9%us,  0.5%sy,  0.0%ni, 72.6%id, 24.6%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  10235944k total, 10176576k used,    59368k free,   145100k buffers
Swap: 10490436k total,       56k used, 10490380k free,  5719180k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  4634 mysql     15   0 4180m 3.6g 5652 S    6 37.2 233:53.06 mysqld
  3781 root      15   0     0    0    0 D    1  0.0   4:33.62 drbd0_receiver


# cat /proc/drbd
version: 8.2.1 (api:86/proto:86-87)
GIT-hash: 318925802fc2638479ad090b73d7af45503dd184 build by root at hp-tm-02, 2007-12-10 22:21:14
  0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate B r---
     ns:1053120 nr:78899808 dw:78901652 dr:1065701 al:46 bm:392 lo:12 pe:0 ua:1 ap:0
         resync: used:0/31 hits:65607 misses:217 starving:0 dirty:0 changed:217
         act_log: used:1/257 hits:427 misses:46 starving:0 dirty:0 changed:46
  1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate B r---
     ns:169235760 nr:1053224 dw:170288980 dr:26714233 al:655079 bm:422 lo:2 pe:0 ua:0 ap:2
         resync: used:0/31 hits:65595 misses:213 starving:0 dirty:0 changed:213
         act_log: used:3/257 hits:41653862 misses:656540 starving:0 dirty:1461 changed:655079


# iftop -i eth3
hp-tm-02.road-runner  => 192.168.95.6           214Kb  1.30Mb   963Kb
                       <=                       8.74Mb  20.6Mb  10.9Mb


Regards,

Ben Clewett.



Florian Haas wrote:
 > ethtool -K eth<num> tx off
 > ethtool -K eth<num> rx off
 >
 > This is based on the assumption that your issue is not one with TOE (TCP
 > Offload Engine, a term that I've only seen applied to iSCSI HBAs and
 > 10GbE cards thus far), but with TCP checksum offloading (a feature
 > present on virtually all contemporary Ethernet NICs).
 >
 > And, this is supported by the bnx2 driver.
 >
 > Florian
 >
 > On Friday 14 December 2007 11:17:34 Ben Clewett wrote:
 >> Dear DRBD,
 >>
 >> I have a repeatable problem with DRBD 8.2.1 where it locks up, and the
 >> replication ability falls by several orders of magnitude.  This is the
 >> same as the problem reported by Ben Lavender on 2007-08-29.
 >>
 >> Ben identified the problem as due to the TOE protocol on his DELL
 >> network card.  Our HP network cards (NetXtreme II BCM5708 1000Base-SX)
 >> use the same Broadcom chipset, but unlike the DELL card, the HP card
 >> provides no mechanism to disable TOE.  Or at least no published
 >> mechanism in the BIOS or available to Linux, and no jumpers on the PCB.
 >>
 >> The problem occurs under heavy loading.  The NIC's ability to handle TCP
 >> packets falls to about a tenth of its normal rate, which is normally
 >> 100MB/sec on our set-up.  This renders DRBD and our MySQL
 >> database unusable for a few minutes.
 >>
 >> I would like to ask if there is anything that can be done in DRBD to get
 >> round this problem, like for instance using UDP instead of TCP, or some
 >> bug-fix for TOE which any member may know about?
 >>
 >> If this is not the case we will have to replace our NICs, which is
 >> really not something we want to do, since all HP NICs for HP servers
 >> seem to have the same chipset.
 >>
 >> Any advice would be extremely welcome!
 >>
 >> Regards,
 >>
 >> Ben Clewett.
 >




