/ 2004-03-23 11:49:14 +0300
\ Eugene Crosser:
> Please could someone knowledgeable tell me why this may happen?
> Under load (local massive quota update by a script plus cpio of a big
> tree from an NFS client) synchronization is repeatedly restarted after
> ~1% completion:
>
> Mar 23 11:21:11 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
> Mar 23 11:21:11 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
> Mar 23 11:22:58 nfsa2.mail.back kernel: drbd0: Connection lost.
> Mar 23 11:22:59 nfsa1.mail.back kernel: drbd0: Syncer aborted.
> Mar 23 11:22:59 nfsa1.mail.back kernel: drbd0: Connection lost.
> Mar 23 11:22:59 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
> Mar 23 11:22:59 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
> Mar 23 11:22:59 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
> Mar 23 11:23:50 nfsa2.mail.back kernel: drbd0: Connection lost.
> Mar 23 11:23:57 nfsa1.mail.back kernel: drbd0: Syncer aborted.
> Mar 23 11:23:57 nfsa1.mail.back kernel: drbd0: Connection lost.
> Mar 23 11:23:57 nfsa2.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
> Mar 23 11:23:57 nfsa1.mail.back kernel: drbd0: Connection established. size=214291003 KB / blksize=4096 B
> Mar 23 11:23:57 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
> Mar 23 11:24:54 nfsa2.mail.back kernel: drbd0: Connection lost.
> Mar 23 11:25:02 nfsa1.mail.back kernel: drbd0: Syncer aborted.
> Mar 23 11:25:02 nfsa1.mail.back kernel: drbd0: Connection lost.
>
> Am I right in assuming that when load average is high on the master,
> and "application" write speed is higher than sync-min, syncer running at
> low priority cannot keep pace and disconnects?

No. The syncer never disconnects.
DRBD notices that its "DrbdPing" packets are no longer answered in time,
and disconnects/reconnects.
Unfortunately, a disrupted SyncAll is still always restarted from the
very beginning.

> Also, I was getting a significant number of messages like these:
>
> Mar 22 16:29:27 nfsa1.mail.back kernel: drbd0: [kjournald/819] sock_sendmsg timeout count down: ko=4294967295
> Mar 22 16:34:37 nfsa1.mail.back kernel: drbd0: pending_cnt <0 !!!
> Mar 22 16:44:02 nfsa1.mail.back kernel: drbd0: [kjournald/819] sock_sendmsg timeout count down: ko=4294967295
> Mar 22 16:58:34 nfsa1.mail.back kernel: drbd0: [kjournald/819] sock_sendmsg timeout count down: ko=4294967295
> Mar 22 17:08:26 nfsa1.mail.back kernel: drbd0: [kjournald/819] sock_sendmsg timeout count down: ko=4294967295
> ...
> Mar 22 19:35:09 nfsa1.mail.back kernel: drbd0: [drbd_syncer_0/6603] sock_sendmsg timeout count down: ko=4294967295
>
> What do they mean?
> The kernel is 2.4.25 (smp, on Xeon), drbd 0.6.10-cvs (now trying
> 0.6.12).

0.6.12 should help with some of the weird behaviour related to the above
problem. Whether it fixes the actual problem I cannot say.
Please report back.

	Lars Ellenberg
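
When reporting back, it may help to state how long each sync attempt survives before the connection drops. The following is only a rough sketch, not part of DRBD: the function name `sync_attempt_durations`, the regular expression, and the assumed year are all made up for illustration, and the sample is a shortened excerpt of the syslog lines quoted above. It pairs each "Synchronisation started" with the next "Connection lost" and prints the gap in seconds.

```python
import re
from datetime import datetime

# Shortened excerpt of the syslog lines quoted in this thread.
LOG = """\
Mar 23 11:21:11 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 11:22:58 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 11:22:59 nfsa1.mail.back kernel: drbd0: Syncer aborted.
Mar 23 11:22:59 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 11:23:50 nfsa2.mail.back kernel: drbd0: Connection lost.
Mar 23 11:23:57 nfsa1.mail.back kernel: drbd0: Synchronisation started blks=15
Mar 23 11:24:54 nfsa2.mail.back kernel: drbd0: Connection lost.
"""

# Hypothetical pattern: syslog timestamp, host, then the drbd message text.
LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: drbd\d+: (.*)$")


def sync_attempt_durations(text, year=2004):
    """Return seconds from each sync start to the next connection loss.

    Syslog timestamps carry no year, so one is assumed via `year`.
    """
    durations = []
    started = None
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        msg = m.group(2)
        if msg.startswith("Synchronisation started"):
            started = ts
        elif msg.startswith("Connection lost") and started is not None:
            durations.append(int((ts - started).total_seconds()))
            started = None  # ignore further "lost" lines until the next start
    return durations


if __name__ == "__main__":
    print(sync_attempt_durations(LOG))
```

On the excerpt above, each attempt lasts roughly one to two minutes before the connection is lost. The timestamps come from two different hosts, so clock skew between nfsa1 and nfsa2 would distort these numbers.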