On Tue, 2004-03-23 at 12:28, Lars Ellenberg wrote:
> > Please could someone knowledgeable tell me why this may happen?
> > Under load (local massive quota update by a script plus cpio of a big
> > tree from an NFS client) synchronization is repeatedly restarted after
> > ~1% completion:
[...]
> 0.6.12 should help for some weird behaviour related to the above problem.
> Whether it fixes the actual problem I cannot say. Please report back.

Behavior did not change with 0.6.12. When the secondary is booted, it
synchronizes, works well for some time, and then starts to disconnect
and reconnect every few minutes. At this time, load average on the
master grows two or three times higher than it was during "normal"
operation.

I must also note that heartbeat (it uses the same interface) begins
to produce these messages:

Mar 23 19:34:49 nfsa1.mail.back heartbeat[639]: WARN: Late heartbeat: Node 10.0.0.247: interval 14970 ms
Mar 23 19:35:27 nfsa2.mail.back heartbeat[601]: WARN: Late heartbeat: Node nfsa1: interval 12410 ms

Ethernet is 10/100/1000 baseT (tg3) on Dell 1750's.

When I kill the script that updates quota, the problem goes away:
SyncAll completes successfully and it continues to work in Connected
mode.

This is an excerpt from drbd.conf:

  protocol=C
  [...]
  net {
    sync-max=128M
    sync-min=32M
    tl-size=512
    sync-nice=0
  }

I am including a log starting from the moment when the secondary is
first connected. It's rather long, sorry. Note that from 15:10 to
16:42 the system worked normally.

Mar 23 13:57:34 nfsa2.mail.back drbd: ===> drbd start <===
Mar 23 13:57:34 nfsa2.mail.back drbd: modprobe -s drbd minor_count=1
Mar 23 13:57:34 nfsa2.mail.back kernel: drbd: initialised. Version: 0.6.12 (api:64/proto:62)
Mar 23 13:57:35 nfsa2.mail.back drbd: drbdsetup /dev/nb0 disk /dev/sda1 --do-panic --disk-size=214291003k
Mar 23 13:57:35 nfsa2.mail.back kernel: drbd0: Creating state file
Mar 23 13:57:35 nfsa2.mail.back kernel: "/var/lib/drbd/drbd0"
Mar 23 13:57:35 nfsa2.mail.back drbd: drbdsetup /dev/nb0 net 192.168.1.2:7789 192.168.1.1:7789 C --sync-max=128M --sync-min=32M --tl-size=512 --sync-nice=0
Mar 23 13:57:35 nfsa2.mail.back drbd: drbdsetup /dev/nb0 wait_connect -t 120
Mar 23 13:57:35 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 13:57:35 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 13:57:35 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 13:57:35 nfsa2.mail.back drbd: 'drbd0' SyncingAll, waiting for this to finish
Mar 23 13:57:35 nfsa2.mail.back drbd: drbdsetup /dev/nb0 wait_sync
Mar 23 15:10:47 nfsa1.mail.back kernel: drbd0: Synchronisation done.
Mar 23 15:10:51 nfsa2.mail.back drbd: 'drbd0' SyncingAll finished, issue drbdsetup /dev/nb0 secondary_remote
Mar 23 16:27:48 nfsa1.mail.back kernel: drbd0: [kjournald/885] sock_sendmsg time expired, ko = 4294967295
Mar 23 16:50:40 nfsa1.mail.back kernel: drbd0: [kjournald/885] sock_sendmsg time expired, ko = 4294967295
Mar 23 16:50:40 nfsa1.mail.back last message repeated 1 time
Mar 23 17:05:06 nfsa1.mail.back kernel: drbd0: [kjournald/885] sock_sendmsg time expired, ko = 4294967295
Mar 23 17:42:25 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 17:42:26 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 17:42:26 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 17:42:26 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 17:42:26 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 17:43:08 nfsa1.mail.back kernel: drbd0: Synchronisation done.
Mar 23 17:43:39 nfsa1.mail.back kernel: drbd0: [kjournald/885] sock_sendmsg time expired, ko = 4294967295
Mar 23 17:44:13 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 17:44:13 nfsa1.mail.back kernel: drbd0: [kjournald/885] sock_sendmsg returned -32
Mar 23 17:44:13 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 17:44:13 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 17:44:13 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 17:44:13 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 17:45:00 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 17:45:00 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/11349] sock_sendmsg returned -32
Mar 23 17:45:00 nfsa1.mail.back kernel: drbd0: Syncer send failed.
Mar 23 17:45:00 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 17:45:00 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 17:45:00 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 17:45:00 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 17:47:11 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 17:56:40 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 17:56:40 nfsa1.mail.back last message repeated 1 time
Mar 23 17:58:49 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 18:00:54 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:00:54 nfsa1.mail.back last message repeated 1 time
Mar 23 18:20:17 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:20:17 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 18:20:48 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:20:48 nfsa1.mail.back last message repeated 1 time
Mar 23 18:25:28 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 18:32:34 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:32:34 nfsa1.mail.back last message repeated 1 time
Mar 23 18:37:16 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 18:37:51 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:37:51 nfsa1.mail.back last message repeated 1 time
Mar 23 18:39:02 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 18:39:36 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:39:36 nfsa1.mail.back last message repeated 1 time
Mar 23 18:43:12 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 18:45:38 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:45:38 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 18:49:15 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:49:15 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 18:51:41 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:51:41 nfsa1.mail.back last message repeated 1 time
Mar 23 18:53:31 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 19:08:40 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 19:08:40 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 19:11:49 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 19:11:49 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 19:13:07 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 19:13:07 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 19:13:43 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 19:13:43 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 19:15:00 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 19:15:00 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 19:16:13 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 19:16:13 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 19:16:50 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 19:16:50 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 19:18:03 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 19:18:44 nfsa1.mail.back kernel: drbd0: Synchronisation done.
Mar 23 19:19:19 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 19:19:20 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 19:19:20 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:19:20 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:19:20 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 19:20:06 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 19:20:07 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/25303] sock_sendmsg returned -32
Mar 23 19:20:07 nfsa1.mail.back kernel: drbd0: Syncer send failed.
Mar 23 19:20:07 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 19:20:07 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:20:07 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:20:07 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 19:24:47 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 19:24:48 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/32327] sock_sendmsg returned -32
Mar 23 19:24:48 nfsa1.mail.back kernel: drbd0: Syncer send failed.
Mar 23 19:24:48 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 19:24:48 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:24:48 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:24:48 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 19:26:47 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 19:26:48 nfsa1.mail.back kernel: drbd0: Syncer send failed.
Mar 23 19:26:48 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 19:26:48 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:26:48 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:26:48 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 19:30:06 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 19:30:06 nfsa1.mail.back kernel: drbd0: Syncer aborted.
Mar 23 19:30:06 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 19:30:07 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:30:07 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:30:07 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 19:34:05 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 19:34:06 nfsa1.mail.back kernel: drbd0: Syncer aborted.
Mar 23 19:34:06 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 19:34:06 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:34:06 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:34:06 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 19:34:47 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 19:34:48 nfsa1.mail.back kernel: drbd0: Syncer aborted.
Mar 23 19:34:48 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 19:34:48 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:34:48 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:34:48 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20040323/66d480e1/attachment.pgp>
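
For anyone reading the log above, a small sketch may help decode two of the recurring values. It rests on assumptions the thread does not confirm: that `ko` is a signed 32-bit counter printed as unsigned, and that the `-32` returned by sock_sendmsg is a negated Linux errno. The helper names `to_signed32` and `errno_name` are illustrative only and are not part of DRBD.

```python
import errno


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer.

    Assumption: the "ko" field in the log is a signed counter that was
    printed with an unsigned format.
    """
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def errno_name(ret: int) -> str:
    """Map a negative kernel-style return code to its errno symbol.

    Assumption: the logged return value is -errno, as is conventional
    for Linux kernel functions.
    """
    return errno.errorcode.get(-ret, "unknown")


if __name__ == "__main__":
    # The two "ko" values that appear in the log.
    for ko in (4294967295, 4294967294):
        print(f"ko = {ko} -> {to_signed32(ko)}")
    # The return code logged just before "Connection lost".
    print(f"sock_sendmsg returned -32 -> {errno_name(-32)}")
```

Under those assumptions the two `ko` values read as -1 and -2, and -32 corresponds to EPIPE (broken pipe), which would be consistent with the peer having already closed the connection when the send was attempted.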