Hi list, hi Philipp R.,

first of all, my congratulations on the article about your company in the German "magazine" Computerpartner ;-)

But we do have a problem here with DRBD: whenever I run bonnie++ tests on the DRBD device, I get errors like these:

Feb 9 18:53:52 drbd1-bfd kernel: drbd0: [kupdated/7] sock_sendmsg returned -32
Feb 9 18:53:57 drbd1-bfd kernel: drbd0: Connection lost.
Feb 9 18:53:57 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb 9 18:53:57 drbd1-bfd kernel: drbd0: Synchronisation started blks=15
Feb 9 18:53:59 drbd1-bfd kernel: drbd0: [drbd_syncer_0/23587] sock_sendmsg returned -104
Feb 9 18:53:59 drbd1-bfd kernel: drbd0: Syncer send failed.
Feb 9 18:54:05 drbd1-bfd kernel: drbd0: Connection lost.
Feb 9 18:54:05 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb 9 18:54:05 drbd1-bfd kernel: drbd0: Synchronisation started blks=15
-----
Feb 9 21:57:45 drbd1-bfd kernel: drbd0: [bdflush/6] send timed out!!
Feb 9 21:57:45 drbd1-bfd kernel: drbd0: Syncer send failed.
Feb 9 21:57:50 drbd1-bfd kernel: drbd0: Connection lost.
Feb 9 21:57:50 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb 9 21:57:50 drbd1-bfd kernel: drbd0: Synchronisation started blks=15
-----
Feb 10 00:09:51 drbd1-bfd kernel: drbd0: [bdflush/6] sock_sendmsg returned -32
Feb 10 00:09:55 drbd1-bfd kernel: drbd0: Connection lost.
Feb 10 00:09:55 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb 10 00:09:55 drbd1-bfd kernel: drbd0: Synchronisation started blks=15
Feb 10 00:09:57 drbd1-bfd kernel: drbd0: [bdflush/6] send timed out!!
Feb 10 00:10:28 drbd1-bfd kernel: drbd0: Syncer aborted.
Feb 10 00:10:32 drbd1-bfd kernel: drbd0: Connection lost.
Feb 10 00:10:32 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb 10 00:10:32 drbd1-bfd kernel: drbd0: Synchronisation started blks=15
-----
Feb 11 10:54:14 drbd1-bfd kernel: drbd0: [kupdated/7] sock_sendmsg returned -32
Feb 11 10:54:20 drbd1-bfd kernel: drbd0: Connection lost.
Feb 11 10:54:20 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb 11 10:54:20 drbd1-bfd kernel: drbd0: Synchronisation started blks=15
Feb 11 10:54:22 drbd1-bfd kernel: drbd0: [bonnie++/32678] send timed out!!
Feb 11 10:54:53 drbd1-bfd kernel: drbd0: Syncer aborted.
Feb 11 10:54:57 drbd1-bfd kernel: drbd0: Connection lost.
Feb 11 10:54:57 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb 11 10:54:57 drbd1-bfd kernel: drbd0: Synchronisation started blks=15
Feb 11 10:59:01 drbd1-bfd /USR/SBIN/CRON[4858]: (root) CMD ( rm -f /var/spool/cron/lastrun/cron.hourly)
Feb 11 11:11:03 drbd1-bfd -- MARK --
Feb 11 11:31:03 drbd1-bfd -- MARK --
Feb 11 11:48:13 drbd1-bfd kernel: drbd0: [bdflush/6] sock_sendmsg returned -32
Feb 11 11:48:13 drbd1-bfd kernel: drbd0: Syncer send failed.
Feb 11 11:48:20 drbd1-bfd kernel: drbd0: Connection lost.
Feb 11 11:48:20 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb 11 11:48:20 drbd1-bfd kernel: drbd0: Synchronisation started blks=15
-----
Feb 11 19:12:42 drbd1-bfd kernel: drbd0: [drbd_asender_0/5400] sock_sendmsg returned 0
Feb 11 19:12:46 drbd1-bfd kernel: drbd0: Connection lost.
Feb 11 19:12:46 drbd1-bfd kernel: drbd0: Connection established. size=26097472 KB / blksize=4096 B
Feb 11 19:12:46 drbd1-bfd kernel: drbd0: Synchronisation started blks=15

The two systems are IBM eServer x345 (Xeon 2.8 GHz HT, dual Intel GBit NIC, 1 GB RAM, 2 x 36 GB U320 SCSI with SW-RAID 1; the onboard HW-RAID 1 with the LSI1030 had performance problems), running SuSE 8.2 with vanilla kernel 2.4.24, DRBD 0.6.10+cvs and heartbeat 1.0.4.

eth1 on both servers is directly connected, and netio shows 117327 KBytes/sec. The MTU is set to 5000, so the NIC should not degrade performance...

Altogether we are running five DRBD systems for our customers, in different configurations, but these errors occur only on this one (it's the newest and fastest of them...).

Running bonnie++ without DRBD shows no problems, even with 10 bonnies in parallel. But with DRBD underneath, the error sometimes occurs even with a single bonnie process (and it is reproducible when running 3-4 bonnies at once).

drbd1-bfd:~ # drbdsetup /dev/nb0 show
Lower device: 09:01 (/dev/md1)
Disk options: do-panic
Local address: 10.20.30.1:7788
Remote address: 10.20.30.2:7788
Wire protocol: C
Net options:
  timeout = 19.0 sec
  ko-count = 20
  tl-size = 10000
  connect-int = 20 sec
  ping-int = 20 sec
  sndbuf-size = 131070
  sync-min = 500 KB/sec
  sync-max = 204800 KB/sec

drbd1-bfd:~ # cat /proc/drbd
version: 0.6.10+cvs (api:64/proto:62)
0: cs:Connected st:Primary/Secondary ns:1339387608 nr:0 dw:1590662124 dr:2003415012 pe:0 ua:0

Thanks for your help,
Felix
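The numbers in the "sock_sendmsg returned ..." lines in the log above appear to follow the usual Linux kernel convention: a non-negative value is the number of bytes sent, and a negative value is a negated errno. On Linux, 32 is EPIPE and 104 is ECONNRESET, so the failures point to the TCP connection to the peer being closed or reset rather than to a local disk error. A small Python sketch, assuming the kernel messages are available as syslog-style text, can tally these failures per kernel thread. The errno table is hardcoded for Linux because the numbers differ on other systems, and the helper names `decode` and `summarize` are hypothetical, made up for illustration only.

```python
import re
from collections import Counter

# Linux errno numbers (as in the kernel's asm-generic errno headers).
# Hardcoded on purpose: other operating systems use different values.
LINUX_ERRNO = {
    32: "EPIPE (broken pipe)",
    104: "ECONNRESET (connection reset by peer)",
    110: "ETIMEDOUT (connection timed out)",
}

# Matches e.g. "[kupdated/7] sock_sendmsg returned -32" and captures
# the task name ("kupdated") and the return code ("-32").
SENDMSG_RE = re.compile(
    r"\[(?P<task>[^/\]]+)/\d+\] sock_sendmsg returned (?P<rc>-?\d+)"
)


def decode(rc: int) -> str:
    """Interpret a sock_sendmsg return value: bytes sent (>= 0) or -errno (< 0)."""
    if rc >= 0:
        return f"{rc} bytes sent (not an errno)"
    return LINUX_ERRNO.get(-rc, f"errno {-rc} (not in this table)")


def summarize(log_text: str) -> Counter:
    """Count (task, decoded return value) pairs over all sock_sendmsg lines."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = SENDMSG_RE.search(line)
        if match:
            key = (match.group("task"), decode(int(match.group("rc"))))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the log in the post; in practice you would
    # read the syslog file instead.
    sample = """\
Feb 9 18:53:52 drbd1-bfd kernel: drbd0: [kupdated/7] sock_sendmsg returned -32
Feb 9 18:53:59 drbd1-bfd kernel: drbd0: [drbd_syncer_0/23587] sock_sendmsg returned -104
Feb 10 00:09:51 drbd1-bfd kernel: drbd0: [bdflush/6] sock_sendmsg returned -32
Feb 11 19:12:42 drbd1-bfd kernel: drbd0: [drbd_asender_0/5400] sock_sendmsg returned 0
"""
    for (task, meaning), n in sorted(summarize(sample).items()):
        print(f"{n}x {task}: {meaning}")
```

Grouping by task makes it easy to see that the failures come from several writers (kupdated, bdflush, the syncer and the asender), which fits a problem on the network path rather than in a single code path.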