[DRBD-user] Congratulations and problems...

Felix Ide felix.ide-drbd at educators.de
Thu Feb 12 11:35:09 CET 2004


Hi list, hi Philipp R.,

first of all, congratulations on the article about your company in the 
German "magazine" Computerpartner ;-)

But we do have a problem here with DRBD: whenever I run bonnie++ tests on the 
DRBD device, I get errors like these:

Feb  9 18:53:52 drbd1-bfd kernel: drbd0: [kupdated/7] sock_sendmsg returned -32
Feb  9 18:53:57 drbd1-bfd kernel: drbd0: Connection lost.
Feb  9 18:53:57 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb  9 18:53:57 drbd1-bfd kernel: drbd0: Synchronisation started blks=15
Feb  9 18:53:59 drbd1-bfd kernel: drbd0: [drbd_syncer_0/23587] sock_sendmsg returned -104
Feb  9 18:53:59 drbd1-bfd kernel: drbd0: Syncer send failed.
Feb  9 18:54:05 drbd1-bfd kernel: drbd0: Connection lost.
Feb  9 18:54:05 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb  9 18:54:05 drbd1-bfd kernel: drbd0: Synchronisation started blks=15
-----
Feb  9 21:57:45 drbd1-bfd kernel: drbd0: [bdflush/6] send timed out!!
Feb  9 21:57:45 drbd1-bfd kernel: drbd0: Syncer send failed.
Feb  9 21:57:50 drbd1-bfd kernel: drbd0: Connection lost.
Feb  9 21:57:50 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb  9 21:57:50 drbd1-bfd kernel: drbd0: Synchronisation started blks=15
-----
Feb 10 00:09:51 drbd1-bfd kernel: drbd0: [bdflush/6] sock_sendmsg returned -32
Feb 10 00:09:55 drbd1-bfd kernel: drbd0: Connection lost.
Feb 10 00:09:55 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb 10 00:09:55 drbd1-bfd kernel: drbd0: Synchronisation started blks=15
Feb 10 00:09:57 drbd1-bfd kernel: drbd0: [bdflush/6] send timed out!!
Feb 10 00:10:28 drbd1-bfd kernel: drbd0: Syncer aborted.
Feb 10 00:10:32 drbd1-bfd kernel: drbd0: Connection lost.
Feb 10 00:10:32 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb 10 00:10:32 drbd1-bfd kernel: drbd0: Synchronisation started blks=15
-----
Feb 11 10:54:14 drbd1-bfd kernel: drbd0: [kupdated/7] sock_sendmsg returned -32
Feb 11 10:54:20 drbd1-bfd kernel: drbd0: Connection lost.
Feb 11 10:54:20 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb 11 10:54:20 drbd1-bfd kernel: drbd0: Synchronisation started blks=15
Feb 11 10:54:22 drbd1-bfd kernel: drbd0: [bonnie++/32678] send timed out!!
Feb 11 10:54:53 drbd1-bfd kernel: drbd0: Syncer aborted.
Feb 11 10:54:57 drbd1-bfd kernel: drbd0: Connection lost.
Feb 11 10:54:57 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb 11 10:54:57 drbd1-bfd kernel: drbd0: Synchronisation started blks=15
Feb 11 10:59:01 drbd1-bfd /USR/SBIN/CRON[4858]: (root) CMD ( rm -f /var/spool/cron/lastrun/cron.hourly)
Feb 11 11:11:03 drbd1-bfd -- MARK --
Feb 11 11:31:03 drbd1-bfd -- MARK --
Feb 11 11:48:13 drbd1-bfd kernel: drbd0: [bdflush/6] sock_sendmsg returned -32
Feb 11 11:48:13 drbd1-bfd kernel: drbd0: Syncer send failed.
Feb 11 11:48:20 drbd1-bfd kernel: drbd0: Connection lost.
Feb 11 11:48:20 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb 11 11:48:20 drbd1-bfd kernel: drbd0: Synchronisation started blks=15
-----
Feb 11 19:12:42 drbd1-bfd kernel: drbd0: [drbd_asender_0/5400] sock_sendmsg returned 0
Feb 11 19:12:46 drbd1-bfd kernel: drbd0: Connection lost.
Feb 11 19:12:46 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb 11 19:12:46 drbd1-bfd kernel: drbd0: Synchronisation started blks=15
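
The negative values after "sock_sendmsg returned" are kernel errno codes; on 
Linux, 32 is EPIPE (broken pipe) and 104 is ECONNRESET (connection reset by 
peer). A minimal Python sketch for tallying them per kernel thread follows. It 
assumes the syslog layout shown above; the function name and the small errno 
table are illustrative only and are not part of DRBD.

```python
import re
from collections import Counter

# Linux errno values seen in the log above, hard-coded so the sketch
# gives the same names on any platform.
LINUX_ERRNO = {32: "EPIPE", 104: "ECONNRESET"}

# Matches e.g. "drbd0: [kupdated/7] sock_sendmsg returned -32".
# \s+ also tolerates a mail-client line wrap inside the message.
SENDMSG_RE = re.compile(
    r"drbd\d+: \[([^/\]]+)/\d+\] sock_sendmsg\s+returned\s+(-?\d+)"
)

def summarize_send_errors(log_text):
    """Count sock_sendmsg results per (thread name, result label)."""
    counts = Counter()
    for thread, rc in SENDMSG_RE.findall(log_text):
        code = -int(rc)
        if code == 0:
            label = "0"  # a return value of 0, not an errno
        else:
            label = LINUX_ERRNO.get(code, f"errno {code}")
        counts[(thread, label)] += 1
    return counts
```

Feeding the whole log excerpt to summarize_send_errors() groups the failures by 
the thread that hit them (kupdated, bdflush, drbd_syncer_0, drbd_asender_0), 
which makes it easier to see whether one code dominates.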

The two systems are IBM eServer x345 machines: Xeon 2.8 GHz HT, dual Intel 
Gbit NIC, 1 GB RAM, 2 x 36 GB U320 SCSI with software RAID 1 (the onboard 
hardware RAID 1 with the LSI1030 had performance problems), SuSE 8.2 with 
vanilla kernel 2.4.24, DRBD 0.6.10+cvs and heartbeat 1.0.4.

eth1 on both servers is directly connected, and netio shows 117327 k bytes/sec; 
the MTU is 5000, so the NIC should not degrade performance...
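
As a rough sanity check on the netio figure, the sketch below converts it to 
Mbit/s. Whether netio's "k" means 1000 or 1024 bytes is an assumption here, so 
both readings are computed; either way the result is above 93% of the nominal 
1000 Mbit/s line rate.

```python
def kbytes_per_sec_to_mbit(kbytes_per_sec, bytes_per_k):
    """Convert a KB/s reading to Mbit/s (1 Mbit = 10**6 bits)."""
    return kbytes_per_sec * bytes_per_k * 8 / 1e6

NETIO_READING = 117327    # k bytes/sec, from the netio run
LINE_RATE_MBIT = 1000.0   # nominal gigabit Ethernet

for bytes_per_k in (1000, 1024):  # unit interpretation is an assumption
    mbit = kbytes_per_sec_to_mbit(NETIO_READING, bytes_per_k)
    share = mbit / LINE_RATE_MBIT
    print(f"k={bytes_per_k}: {mbit:.1f} Mbit/s ({share:.1%} of line rate)")
```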

We are running 5 DRBD systems altogether for our customers in different 
configurations, but these errors occur only on this one (it's the newest and 
fastest of these systems...)

Running bonnie++ without DRBD shows no problems, even when running 10 bonnies 
in parallel. But with DRBD underneath, the error sometimes occurs even with a 
single bonnie process (and it is reproducible when running 3-4 bonnies at 
once).

drbd1-bfd:~ # drbdsetup /dev/nb0 show
Lower device: 09:01   (/dev/md1)
Disk options:
 do-panic
Local address: 10.20.30.1:7788
Remote address: 10.20.30.2:7788
Wire protocol: C
Net options:
 timeout = 19.0 sec
 ko-count = 20
 tl-size = 10000
 connect-int = 20 sec
 ping-int = 20 sec
 sndbuf-size = 131070
 sync-min = 500 KB/sec
 sync-max = 204800 KB/sec

drbd1-bfd:~ # cat /proc/drbd 
version: 0.6.10+cvs (api:64/proto:62)

0: cs:Connected st:Primary/Secondary ns:1339387608 nr:0 dw:1590662124 
dr:2003415012 pe:0 ua:0
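
For comparing this status line across several machines, a small Python sketch 
can split it into key/value pairs. The helper name is hypothetical, the field 
names are taken verbatim from the output above, and their interpretation is 
left to the DRBD documentation.

```python
import re

# Two-letter key, colon, non-space value, e.g. "cs:Connected" or "pe:0".
FIELD_RE = re.compile(r"\b([a-z]{2}):(\S+)")

def parse_drbd_status(text):
    """Turn a /proc/drbd device line (possibly wrapped) into a dict."""
    fields = {}
    for key, value in FIELD_RE.findall(text):
        # Counters become ints; states such as "Connected" stay strings.
        fields[key] = int(value) if value.isdigit() else value
    return fields

status = parse_drbd_status(
    "0: cs:Connected st:Primary/Secondary ns:1339387608 nr:0 dw:1590662124 \n"
    "dr:2003415012 pe:0 ua:0"
)
print(status["cs"], status["st"], status["pe"], status["ua"])
```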

Thanks for your help,

Felix




