[DRBD-user] sock_sendmsg and pending_cnt <0

Thu Feb 5 01:49:54 CET 2004

/ 2004-02-04 16:16:06 +0100
\ matthias zeichmann:
> hello!
> 
> yesterday i upgraded disks in our drbd cluster wich involved a full sync
> between the nodes (thanks again lars for providing support).
> 
> i used the opportunity to upgrade drbd to latest cvs and to switch to
> protocol C.
          ^^^ 

> Feb  3 20:49:29 atem kernel: drbd5: Connection established. size=96358
> KB / blksize=1024 B
                   ^^^
 ??

> during fullsync some logmessages showed up i want to share
> (please excuse the lengthy mail).

> now during fullsync and afterwards i got a lot [1] of messages like:
> ----------->8------------------------------------------------------

this:
> Feb  3 21:01:56 kaelte kernel: drbd2: [drbd_syncer_2/27957] sock_sendmsg
> returned -32
which is EPIPE ("Broken pipe") and
corresponds to this one on the peer:
> Feb  3 21:01:56 atem kernel: drbd2: unknown packet type!
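If you want to decode such negative return values yourself, here is a minimal Python sketch. The helper name describe_kernel_ret is made up for illustration, and errno numbers are platform specific (EPIPE is 32 on Linux).

```python
import errno
import os


def describe_kernel_ret(ret: int) -> str:
    """Decode a kernel-style return value such as the -32 from sock_sendmsg.

    Hypothetical helper: non-negative values are byte counts,
    negative values are -errno.
    """
    if ret >= 0:
        return f"{ret} bytes sent"
    code = -ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    print(describe_kernel_ret(-32))  # on Linux: EPIPE: Broken pipe
```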

So how did you get the "unknown packet type"?
TCP checksum is ok, DRBD_MAGIC is ok, and there still is
an unknown packet type?

How is this possible?
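To make the puzzle concrete, here is a minimal Python sketch of the receive-side header checks in the order they have to pass. The header layout (32-bit magic, 16-bit command, 16-bit length, network byte order), the magic value and the range of known commands are assumptions for illustration, not copied from the drbd source, so check drbd.h of your version. It shows the point: once the magic matches, only the command field itself can produce "unknown packet type".

```python
import struct

# Assumed header layout: magic (u32), command (u16), length (u16), big endian.
HEADER = struct.Struct("!IHH")
DRBD_MAGIC = 0x83740267          # assumed value, verify against your drbd.h
KNOWN_COMMANDS = range(0, 16)    # hypothetical range of valid packet types


def classify_header(buf: bytes) -> str:
    """Return which receive-side check a packet header fails, or "ok"."""
    if len(buf) < HEADER.size:
        return "short read"
    magic, command, _length = HEADER.unpack_from(buf)
    if magic != DRBD_MAGIC:
        return "bad magic"
    if command not in KNOWN_COMMANDS:
        # Magic matched, but the command is outside the known range.
        return "unknown packet type"
    return "ok"
```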

This does not feel good, but I doubt that drbd is at fault.
Better transfer huge amounts of data
and verify MD5, SHA1, or whatever fingerprints!!
Don't come running in three months' time
telling us "drbd" corrupts your data...
check for it NOW.
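One way to do that check, as a minimal Python sketch using hashlib: record a SHA1 fingerprint of every file on the primary, repeat on the other node after a switchover, and diff the two maps. The helper names are made up for this sketch; md5sum or sha1sum over the same file list does the same job.

```python
import hashlib
from pathlib import Path


def fingerprint(path, algo="sha1", chunk_size=1 << 20):
    """Hash one file in 1 MiB chunks so large files need not fit in RAM."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def fingerprint_tree(root, algo="sha1"):
    """Map each regular file below root (as a relative path) to its digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): fingerprint(p, algo)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_fingerprints(before, after):
    """Return relative paths that are missing on one side or whose digests differ."""
    return sorted(k for k in before.keys() | after.keys() if before.get(k) != after.get(k))
```

Run fingerprint_tree on the mounted drbd device on each node and feed both results to diff_fingerprints; an empty list means the files match.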

> Feb  3 21:02:31 kaelte kernel: drbd4: [drbd_syncer_4/27961] sock_sendmsg
> time expired, ko = 4294967295

No problem, it gets retried until ko hits zero ...
ko decrements with every ping interval,
but is reset with each transferred block.
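A minimal Python model of that rule, for illustration only (class and method names are hypothetical, not drbd's code). Side note, plain arithmetic: the ko = 4294967295 in the log line above is 2^32 - 1, which is how -1 prints as an unsigned 32-bit number.

```python
class KoCounter:
    """Hypothetical model of the ko counter: one step down per ping
    interval, back to the configured start value on each transferred block."""

    def __init__(self, ko_count: int):
        self.ko_count = ko_count
        self.ko = ko_count

    def block_transferred(self) -> None:
        # Progress: the peer is alive, start counting from the top again.
        self.ko = self.ko_count

    def ping_interval_elapsed(self) -> bool:
        """Count down once; return True when ko has hit zero (give up)."""
        self.ko -= 1
        return self.ko <= 0


if __name__ == "__main__":
    k = KoCounter(3)
    print([k.ping_interval_elapsed() for _ in range(2)])  # [False, False]
    k.block_transferred()                                 # back to 3
    print([k.ping_interval_elapsed() for _ in range(3)])  # [False, False, True]
    print((-1) & 0xFFFFFFFF)                              # 4294967295
```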

> Feb  3 21:18:50 kaelte kernel: drbd2: [drbd_syncer_2/2439] send timed
> out!!

again:
> Feb  3 21:18:50 atem kernel: drbd2: unknown packet type!

> i also stumbled across
> http://thread.gmane.org/gmane.comp.linux.drbd/5571
> and i understand i have to tune the net sections [2] with my drbd
> devices. maybe sync-nice=0 is too harsh? i'm just not sure how, cos i am
> not really satisfied with the sync speed i get.

For the range of the sync-nice value, see "man nice".
Normal interactive processes run with priority 0.
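In short: lower nice values mean higher scheduling priority, and on Linux the range is -20 (highest) to 19 (lowest). Assuming sync-nice maps directly onto the nice value of the syncer, sync-nice=0 puts it on par with ordinary processes, not above them. A small Python sketch that spells this out (describe_nice is a made-up helper):

```python
import os

NICE_MIN, NICE_MAX = -20, 19  # Linux nice range, see nice(1) and setpriority(2)


def describe_nice(value: int) -> str:
    """Explain where a nice value sits relative to normal processes (nice 0)."""
    if not NICE_MIN <= value <= NICE_MAX:
        raise ValueError(f"nice value {value} outside [{NICE_MIN}, {NICE_MAX}]")
    if value < 0:
        return "higher priority than normal processes"
    if value > 0:
        return "lower priority than normal processes"
    return "same priority as normal interactive processes"


if __name__ == "__main__":
    print(describe_nice(0))
    # Niceness of this very process; normally 0 when started from a shell.
    print(os.getpriority(os.PRIO_PROCESS, 0))
```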

> i use intel gigabit nics on 2.4.24 with rx polling support (mtu 1500)
> for drbd.

increase mtu to 5000, if you like.
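Rough arithmetic for why a larger MTU can help: fewer frames, and so less per-packet overhead, for the same amount of data. A Python sketch, assuming plain IPv4 and TCP headers of 40 bytes in total with no TCP options:

```python
IPV4_TCP_HEADERS = 40  # assumption: 20 bytes IPv4 + 20 bytes TCP, no options


def segments_needed(payload_bytes: int, mtu: int) -> int:
    """Number of full-size TCP segments needed to carry payload_bytes."""
    mss = mtu - IPV4_TCP_HEADERS
    return -(-payload_bytes // mss)  # ceiling division


if __name__ == "__main__":
    payload = 400 * 2**20  # 400 MB, the size of the write test later in this mail
    for mtu in (1500, 5000):
        print(mtu, segments_needed(payload, mtu))
```

For 400 MB this comes to 287282 segments at MTU 1500 versus 84563 at MTU 5000.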

> the two nodes are connected via crossover cable and sync speed
> maxes out at approx. 13MB/s while i know that the disks do better [3]
> and i know that the network interfaces can do better.

probably network latency. try sndbuf-size = 65534 
but if that breaks something, don't bug me ;)
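Why latency and buffer size interact: if at most W bytes can be in flight unacknowledged and a round trip takes RTT seconds, a single stream cannot go faster than W / RTT. A Python sketch of that bound. It is a simplified model that ignores everything else in the stack, and the RTT values are examples, not measurements from your setup.

```python
def max_throughput(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound in bytes/s when at most window_bytes may be unacknowledged."""
    return window_bytes / rtt_seconds


if __name__ == "__main__":
    for rtt_ms in (0.5, 1.0, 5.0):  # example round-trip times
        mb_s = max_throughput(65534, rtt_ms / 1000) / 1e6
        print(f"sndbuf 65534, RTT {rtt_ms} ms -> at most {mb_s:.1f} MB/s")
```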

> another note on synching: during fullsync pe was >> 0 on the sending
> side and ua >> 0 on the receiving side; is this normal? after sync both
> coloumns went back to zero.

absolutely.
"pending" and "unacknowledged" just mean there is data on the fly.
if they are high, it is a lot of data ;)
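A toy Python model of the two columns, just to show the bookkeeping. The names are hypothetical, and in reality sender and receiver live on different nodes. Here a negative pending count is raised as an error, in the spirit of drbd's "pending_cnt <0 !!!" message, where the real driver logs a warning.

```python
from dataclasses import dataclass


@dataclass
class Sender:
    pending: int = 0  # "pe": sent to the peer, ack not yet received

    def send_block(self) -> None:
        self.pending += 1

    def got_ack(self) -> None:
        self.pending -= 1
        if self.pending < 0:
            # More acks than sends: the accounting is broken.
            raise RuntimeError("pending_cnt <0 !!!")


@dataclass
class Receiver:
    unacked: int = 0  # "ua": received from the peer, ack not yet sent back

    def got_block(self) -> None:
        self.unacked += 1

    def send_ack(self) -> None:
        self.unacked -= 1


if __name__ == "__main__":
    s, r = Sender(), Receiver()
    for _ in range(100):          # a burst of sync blocks on the wire
        s.send_block()
        r.got_block()
    print(s.pending, r.unacked)   # 100 100: lots of data on the fly
    for _ in range(100):          # receiver acks everything, sender sees the acks
        r.send_ack()
        s.got_ack()
    print(s.pending, r.unacked)   # 0 0: back to zero after the sync
```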

> Feb  4 03:00:05 kaelte kernel: drbd2: pending_cnt <0 !!!
> 
> didn't see that for quite some time; last time i saw it it came along
> with a deadlock on primary :-/

hm...
?

> [3] disk performance local vs. drbd
> all filesystems are ext3 in ordered data mode, additionally drbd devices
> are mounted with "noatime"
> disks are two scsi uw 10k rpm 73GB in a logical RAID 1 array on each
> node with HP netraid 1M hardware raid controller
> ----------->8--------------------------------------------------------
> kaelte:root# # local filesystem
400 MB in 17 seconds

> # drbd fs
400 MB in 41 seconds
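Expressed as throughput, the two runs work out as follows. This is plain arithmetic on the numbers above, with nothing measured anew.

```python
def mb_per_s(megabytes: float, seconds: float) -> float:
    """Average throughput of a single timed write."""
    return megabytes / seconds


local = mb_per_s(400, 17)  # local filesystem
drbd = mb_per_s(400, 41)   # drbd filesystem
print(f"local {local:.1f} MB/s, drbd {drbd:.1f} MB/s, drbd takes {41 / 17:.1f}x as long")
# prints: local 23.5 MB/s, drbd 9.8 MB/s, drbd takes 2.4x as long
```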

> [4] net throughput
> sent:       805M,   34737.8K/s total,   40908.3K/s current
> recv'd:     800M,   34484.7K/s total,   40712.1K/s current
obviously not your bottleneck.

check the latency of differently sized packets, though.
does anyone want to recommend a benchmarking tool for this purpose?
ping?  bing?

	Lars Ellenberg