[DRBD-user] sock_sendmsg and pending_cnt <0

matthias zeichmann matthias at volltext.net
Wed Feb 4 16:16:06 CET 2004

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


hello!

yesterday i upgraded disks in our drbd cluster wich involved a full sync
between the nodes (thanks again lars for providing support).

i used the opportunity to upgrade drbd to latest cvs and to switch to
protocol C. during fullsync some logmessages showed up i want to share
(please excuse the lengthy mail).

kaelte was primary on all devices and holding all services of the
cluster. timeline of what i did on kaelte [0]

now during fullsync and afterwards i got a lot [1] of messages like:
----------->8------------------------------------------------------
Feb  3 21:01:56 kaelte kernel: drbd2: [drbd_syncer_2/27957] sock_sendmsg
returned -32
Feb  3 21:01:56 kaelte kernel: drbd2: Syncer send failed.
Feb  3 21:01:56 kaelte kernel: drbd2: Connection lost.
Feb  3 21:01:56 kaelte kernel: drbd2: Connection established.
size=19535008 KB / blksize=4096 B
Feb  3 21:01:56 kaelte kernel: drbd2: Synchronisation started blks=15
Feb  3 21:02:31 kaelte kernel: drbd4: [drbd_syncer_4/27961] sock_sendmsg
time expired, ko = 4294967295
Feb  3 21:02:36 kaelte kernel: drbd0: [drbd_syncer_0/27953] sock_sendmsg
time expired, ko = 4294967295
Feb  3 21:02:47 kaelte kernel: drbd4: [drbd_syncer_4/27961] sock_sendmsg
time expired, ko = 4294967295
Feb  3 21:03:06 kaelte kernel: drbd2: [drbd_syncer_2/30738] sock_sendmsg
time expired, ko = 4294967295
Feb  3 21:03:23 kaelte kernel: drbd4: Synchronisation done.
Feb  3 21:08:10 kaelte kernel: drbd0: [drbd_syncer_0/27953] sock_sendmsg
returned -32
Feb  3 21:08:10 kaelte kernel: drbd0: Syncer send failed.
Feb  3 21:08:10 kaelte kernel: drbd0: Connection lost.
Feb  3 21:08:11 kaelte kernel: drbd0: Connection established.
size=9767488 KB / blksize=4096 B
Feb  3 21:08:11 kaelte kernel: drbd0: Synchronisation started blks=15
Feb  3 21:09:20 kaelte kernel: drbd2: [drbd_syncer_2/30738] sock_sendmsg
time expired, ko = 4294967295
Feb  3 21:17:14 kaelte kernel: drbd2: [drbd_syncer_2/30738] sock_sendmsg
returned -32
Feb  3 21:17:14 kaelte kernel: drbd2: Syncer send failed.
Feb  3 21:17:14 kaelte kernel: drbd2: Connection lost.
Feb  3 21:17:14 kaelte kernel: drbd2: Connection established.
size=19535008 KB / blksize=4096 B
Feb  3 21:17:14 kaelte kernel: drbd2: Synchronisation started blks=15
Feb  3 21:18:50 kaelte kernel: drbd2: [drbd_syncer_2/2439] send timed
out!!
Feb  3 21:18:50 kaelte kernel: drbd2: Syncer send failed.
Feb  3 21:18:50 kaelte kernel: drbd2: Connection lost.
Feb  3 21:18:50 kaelte kernel: drbd2: Connection established.
size=19535008 KB / blksize=4096 B
----------->8--------------------------------------------------------
Feb  3 20:49:27 atem kernel: drbd5: Creating state file
Feb  3 20:49:27 atem kernel: "/var/lib/drbd/drbd5"
Feb  3 20:49:29 atem kernel: drbd5: Connection established. size=96358
KB / blksize=1024 B
Feb  3 21:01:56 atem kernel: drbd2: unknown packet type!
Feb  3 21:01:56 atem kernel: drbd2: Connection lost.
Feb  3 21:01:56 atem kernel: drbd2: Connection established.
size=19535008 KB / blksize=4096 B
Feb  3 21:08:10 atem kernel: drbd0: unknown packet type!
Feb  3 21:08:10 atem kernel: drbd0: Connection lost.
Feb  3 21:08:11 atem kernel: drbd0: Connection established. size=9767488
KB / blksize=4096 B
Feb  3 21:17:13 atem kernel: drbd2: unknown packet type!
Feb  3 21:17:14 atem kernel: drbd2: Connection lost.
Feb  3 21:17:14 atem kernel: drbd2: Connection established.
size=19535008 KB / blksize=4096 B
Feb  3 21:18:50 atem kernel: drbd2: unknown packet type!
Feb  3 21:18:50 atem kernel: drbd2: Connection lost.
Feb  3 21:18:50 atem kernel: drbd2: Connection established.
size=19535008 KB / blksize=4096 B
----------->8--------------------------------------------------------

i read the other thread about "sock_sendmsg time expired, ko = xxxxx"
but i didnt run into problems of lockups on neither primary nor
secondary like ward did. 

i also stumbled across
http://thread.gmane.org/gmane.comp.linux.drbd/5571
and i understand i have to tune the net sections [2] with my drbd
devices. maybe sync-nice=0 is too harsh? i'm just not sure how, cos i am
not really satisfied with the sync speed i get.

i use intel gigabit nics on 2.4.24 with rx polling support (mtu 1500)
for drbd. the two nodes are connected via crossover cable and sync speed
maxes out at approx. 13MB/s while i know that the disks do better [3]
and i know that the network interfaces can do better.

i tested net troughput with netstrain and got nice figures [4]

another note on synching: during fullsync pe was >> 0 on the sending
side and ua >> 0 on the receiving side; is this normal? after sync both
coloumns went back to zero.

another oddity that showed up in the logs was:
----------->8--------------------------------------------------------
Feb  4 03:00:05 kaelte kernel: drbd2: pending_cnt <0 !!!
----------->8--------------------------------------------------------

didn't see that for quite some time; last time i saw it it came along
with a deadlock on primary :-/

regards m

[0] drbd disk-size upgrade
----------->8--------------------------------------------------------
# drop all connections to drbd services
iptables -A INPUT -i eth0 -p tcp --destination-port ! 22 -j DROP

cd /etc/init.d
for i in postfix spamd courier-* httpd mysqld proftpd slapd
do
  ./$i stop
done

ps ax # verify that services are all stopped

for i in $(seq 0 5)
do
  umount /dev/nb$i
done

ssh atem "drbd stop"

cp /etc/drbd.conf.new /etc/drbd.conf

/etc/init.d/drbd restart

for i in 0 2 3
do
  e2fsck -f /dev/nb$i 
  resize2fs /dev/nb$i
done

mount /dev/nb0
df               # verify disksize
umount /dev/nb0

# disksize ok? allright then:
ssh atem "cp /etc/drbd.conf.new /etc/drbd.conf
          rm /var/lib/drbd/*
          /etc/init.d/drbd start"
----------->8--------------------------------------------------------

[1]
----------->8--------------------------------------------------------
kaelte:root# grep -c "sock_sendmsg time expired" \
                     /var/log/syslog.0 /var/log/syslog   
/var/log/syslog.0:374
/var/log/syslog:30
kaelte:root# 
----------->8--------------------------------------------------------

[2] drbd.conf excerpt
----------->8--------------------------------------------------------
  net {
    sync-rate=60000
    sync-nice=0
    tl-size=5000
    timeout=10
    connect-int=10
    ping-int=10
  }
----------->8--------------------------------------------------------

[3] disk performance local vs. drbd
all filesystems are ext3 in ordered data mode, additionally drbd devices
are mounted with "noatime"
disks are two scsi uw 10k rpm 73GB in a logical RAID 1 array on each
node with HP netraid 1M hardware raid controller
----------->8--------------------------------------------------------
kaelte:root# # local filesystem
kaelte:root# sync; \
    time dd if=/dev/zero of=/opt/400M bs=4k count=100000; \
    time sync
100000+0 Records ein
100000+0 Records aus

real    0m4.391s
user    0m0.100s
sys     0m3.750s

real    0m12.580s
user    0m0.000s
sys     0m0.170s
kaelte:root# 

# drbd fs
kaelte:root# sync; \
  time dd if=/dev/zero of=/var/spool/courier/400M bs=4k count=100000; \
  time sync
100000+0 Records ein
100000+0 Records aus

real    0m4.470s
user    0m0.060s
sys     0m4.050s

real    0m36.050s
user    0m0.000s
sys     0m2.520s
kaelte:root#
----------->8--------------------------------------------------------

[4] net throughput
----------->8--------------------------------------------------------
atem:bin# netstrain kaelte_priv 3333 send
NetStrain 3.0  (c) 2002 Christoph Pfisterer <cp at chrisp.de>
Looking up hostname kaelte_priv...
Connecting to 192.168.0.3 port 3333 using IPv4...
Connected
sent:      2796M,   97707.8K/s total,  109141.1K/s current
recv'd:       0B,         0B/s total,         0B/s current

atem:bin# # wow!
atem:bin# netstrain kaelte_priv 3333 both
NetStrain 3.0  (c) 2002 Christoph Pfisterer <cp at chrisp.de>
Looking up hostname kaelte_priv...
Connecting to 192.168.0.3 port 3333 using IPv4...
Connected
sent:       805M,   34737.8K/s total,   40908.3K/s current
recv'd:     800M,   34484.7K/s total,   40712.1K/s current

atem:bin# # not bad...
----------->8--------------------------------------------------------
-- 
vi vi vi    - editor of the beast 
vim vim vim - editor of the *ALL* *NEW* improved beast




More information about the drbd-user mailing list