[DRBD-user] Syncer aborted. Connection lost.

Eugene Crosser crosser at rol.ru
Tue Mar 23 18:10:01 CET 2004


On Tue, 2004-03-23 at 12:28, Lars Ellenberg wrote:

> > Please could someone knowledgeable tell me why this may happen?
> > Under load (local massive quota update by a script plus cpio of a big
> > tree from an NFS client) synchronization is repeatedly restarted after
> > ~1% completion:
[...]
> 0.6.12 should help for some weird behaviour related to the above problem.
> Whether it fixes the actual problem I cannot say. Please report back.

Behavior did not change with 0.6.12.
When the secondary is booted, it synchronizes, works well for some time,
and then starts to disconnect and reconnect every few minutes.  At that
point, the load average on the master rises to two or three times what
it is during normal operation.

I must also note that heartbeat (it uses the same interface) begins to
produce these messages:

Mar 23 19:34:49 nfsa1.mail.back heartbeat[639]: WARN: Late heartbeat: Node 10.0.0.247: interval 14970 ms
Mar 23 19:35:27 nfsa2.mail.back heartbeat[601]: WARN: Late heartbeat: Node nfsa1: interval 12410 ms

Ethernet is 10/100/1000 baseT (tg3) on Dell 1750s.
When I kill the script that updates quota, the problem goes away: SyncAll
completes successfully and the devices keep working in Connected mode.

This is an excerpt from drbd.conf:

  protocol=C
[...]
  net {
    sync-max=128M
    sync-min=32M
    tl-size=512
    sync-nice=0
  }

I am including a log starting from the moment when the secondary is
first connected.  It's rather long, sorry.  Note that from 15:10 to 16:42
the system worked normally.

Mar 23 13:57:34 nfsa2.mail.back drbd: ===> drbd start <===
Mar 23 13:57:34 nfsa2.mail.back drbd: modprobe -s drbd minor_count=1
Mar 23 13:57:34 nfsa2.mail.back kernel: drbd: initialised. Version: 0.6.12 (api:64/proto:62)
Mar 23 13:57:35 nfsa2.mail.back drbd: drbdsetup /dev/nb0 disk /dev/sda1 --do-panic --disk-size=214291003k
Mar 23 13:57:35 nfsa2.mail.back kernel: drbd0: Creating state file
Mar 23 13:57:35 nfsa2.mail.back kernel: "/var/lib/drbd/drbd0"
Mar 23 13:57:35 nfsa2.mail.back drbd: drbdsetup /dev/nb0 net 192.168.1.2:7789 192.168.1.1:7789 C --sync-max=128M --sync-min=32M --tl-size=512 --sync-nice=0
Mar 23 13:57:35 nfsa2.mail.back drbd: drbdsetup /dev/nb0 wait_connect -t 120
Mar 23 13:57:35 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 13:57:35 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 13:57:35 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 13:57:35 nfsa2.mail.back drbd: 'drbd0' SyncingAll, waiting for this to finish
Mar 23 13:57:35 nfsa2.mail.back drbd: drbdsetup /dev/nb0 wait_sync
Mar 23 15:10:47 nfsa1.mail.back kernel: drbd0: Synchronisation done.
Mar 23 15:10:51 nfsa2.mail.back drbd: 'drbd0' SyncingAll finished, issue drbdsetup /dev/nb0 secondary_remote
Mar 23 16:27:48 nfsa1.mail.back kernel: drbd0: [kjournald/885] sock_sendmsg time expired, ko = 4294967295
Mar 23 16:50:40 nfsa1.mail.back kernel: drbd0: [kjournald/885] sock_sendmsg time expired, ko = 4294967295
Mar 23 16:50:40 nfsa1.mail.back last message repeated 1 time
Mar 23 17:05:06 nfsa1.mail.back kernel: drbd0: [kjournald/885] sock_sendmsg time expired, ko = 4294967295
Mar 23 17:42:25 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 17:42:26 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 17:42:26 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 17:42:26 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 17:42:26 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 17:43:08 nfsa1.mail.back kernel: drbd0: Synchronisation done.
Mar 23 17:43:39 nfsa1.mail.back kernel: drbd0: [kjournald/885] sock_sendmsg time expired, ko = 4294967295
Mar 23 17:44:13 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 17:44:13 nfsa1.mail.back kernel: drbd0: [kjournald/885] sock_sendmsg returned -32
Mar 23 17:44:13 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 17:44:13 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 17:44:13 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 17:44:13 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 17:45:00 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 17:45:00 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/11349] sock_sendmsg returned -32
Mar 23 17:45:00 nfsa1.mail.back kernel: drbd0: Syncer send failed.
Mar 23 17:45:00 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 17:45:00 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 17:45:00 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 17:45:00 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 17:47:11 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 17:56:40 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 17:56:40 nfsa1.mail.back last message repeated 1 time
Mar 23 17:58:49 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 18:00:54 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:00:54 nfsa1.mail.back last message repeated 1 time
Mar 23 18:20:17 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:20:17 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 18:20:48 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:20:48 nfsa1.mail.back last message repeated 1 time
Mar 23 18:25:28 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 18:32:34 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:32:34 nfsa1.mail.back last message repeated 1 time
Mar 23 18:37:16 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 18:37:51 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:37:51 nfsa1.mail.back last message repeated 1 time
Mar 23 18:39:02 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 18:39:36 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:39:36 nfsa1.mail.back last message repeated 1 time
Mar 23 18:43:12 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 18:45:38 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:45:38 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 18:49:15 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:49:15 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 18:51:41 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 18:51:41 nfsa1.mail.back last message repeated 1 time
Mar 23 18:53:31 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 19:08:40 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 19:08:40 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 19:11:49 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 19:11:49 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 19:13:07 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 19:13:07 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 19:13:43 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 19:13:43 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 19:15:00 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 19:15:00 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 19:16:13 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 19:16:13 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 19:16:50 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 19:16:50 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
Mar 23 19:18:03 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967295
Mar 23 19:18:44 nfsa1.mail.back kernel: drbd0: Synchronisation done.
Mar 23 19:19:19 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 19:19:20 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 19:19:20 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:19:20 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:19:20 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 19:20:06 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 19:20:07 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/25303] sock_sendmsg returned -32
Mar 23 19:20:07 nfsa1.mail.back kernel: drbd0: Syncer send failed.
Mar 23 19:20:07 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 19:20:07 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:20:07 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:20:07 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 19:24:47 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 19:24:48 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/32327] sock_sendmsg returned -32
Mar 23 19:24:48 nfsa1.mail.back kernel: drbd0: Syncer send failed.
Mar 23 19:24:48 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 19:24:48 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:24:48 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:24:48 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 19:26:47 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 19:26:48 nfsa1.mail.back kernel: drbd0: Syncer send failed.
Mar 23 19:26:48 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 19:26:48 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:26:48 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:26:48 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 19:30:06 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 19:30:06 nfsa1.mail.back kernel: drbd0: Syncer aborted.
Mar 23 19:30:06 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 19:30:07 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:30:07 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:30:07 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 19:34:05 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 19:34:06 nfsa1.mail.back kernel: drbd0: Syncer aborted.
Mar 23 19:34:06 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 19:34:06 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:34:06 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:34:06 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 19:34:47 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 19:34:48 nfsa1.mail.back kernel: drbd0: Syncer aborted.
Mar 23 19:34:48 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 19:34:48 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:34:48 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
Mar 23 19:34:48 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
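
To get an overview of a log like the one above, the events can be tallied
with a short script.  This is a minimal Python sketch, not part of DRBD:
the regular expressions, the function names and the embedded sample (five
lines copied from the log) are assumptions made for illustration.  It
counts "Connection lost." messages per host, reinterprets the printed ko
values as signed 32-bit numbers (4294967295 is 2^32 - 1, so it may simply
be -1 printed as unsigned; that reading is a guess), and maps
"sock_sendmsg returned -N" to an errno name on the assumption that N is a
Linux errno.

```python
import errno
import re
from collections import Counter

# A few lines copied from the log above; in practice, read them from syslog.
SAMPLE = """\
Mar 23 17:42:25 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 17:42:26 nfsa1.mail.back kernel: drbd0: Connection lost.
Mar 23 17:43:39 nfsa1.mail.back kernel: drbd0: [kjournald/885] sock_sendmsg time expired, ko = 4294967295
Mar 23 17:44:13 nfsa1.mail.back kernel: drbd0: [kjournald/885] sock_sendmsg returned -32
Mar 23 17:58:49 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/18366] sock_sendmsg time expired, ko = 4294967294
"""

# Hypothetical patterns, written to match the syslog format shown above.
LINE_RE = re.compile(r"^(\w{3} +\d+ [\d:]{8}) (\S+) kernel: drbd\d+: (.*)$")
KO_RE = re.compile(r"ko = (\d+)")
ERR_RE = re.compile(r"sock_sendmsg returned -(\d+)")


def as_signed32(value):
    """Reinterpret an unsigned 32-bit number as signed two's complement."""
    return value - (1 << 32) if value >= (1 << 31) else value


def summarize(text):
    """Return (lost per host, signed ko values, errno names) as Counters."""
    lost = Counter()
    ko_values = Counter()
    errors = Counter()
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # not a drbd kernel message
        _, host, msg = m.groups()
        if msg == "Connection lost.":
            lost[host] += 1
        ko = KO_RE.search(msg)
        if ko:
            ko_values[as_signed32(int(ko.group(1)))] += 1
        err = ERR_RE.search(msg)
        if err:
            # Assumption: the number is a negated errno value.
            errors[errno.errorcode.get(int(err.group(1)), "unknown")] += 1
    return lost, ko_values, errors


if __name__ == "__main__":
    lost, ko_values, errors = summarize(SAMPLE)
    print("connection lost per host:", dict(lost))
    print("ko values (signed):", dict(ko_values))
    print("send errors:", dict(errors))
```

On Linux, errno 32 is EPIPE, which would fit a send on a connection that
the peer has already dropped; in the log the secondary reports
"Connection lost." just before the primary does.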
