Hello, and sorry about the length of this report.

I'm the administrator of a cluster which consists of two nodes: ldap-a and ldap-b.
Both nodes are identical and run on:
-Debian Etch (up to date)
-DRBD 8.2.6
-Heartbeat 2.1.3-5~bpo40+1 from etch-backports, patched to fix this bug:
 http://hg.linux-ha.org/dev/rev/47f60bebe7b2
-EXT3 filesystem

I have edited /etc/crontab on ldap-a (which is usually the primary for drbd0)
to check for out-of-sync errors every Sunday:
--------8<----------------------------------------------------------------------
20 05 * * 0 root /sbin/drbdadm verify all
-----------------------------------------------------------------------8<-------
('all' means drbd0 and drbd1)

This system has been in production for about three weeks.

On the first Sunday, out-of-sync sectors were detected and one node completely
crashed at the end of the check, but I found out that the system had been
running with a corrupted filesystem for a few days... So I repaired the
filesystem, restored the files from a backup, synchronized the nodes, and
verified that 'drbdadm verify all' did not complain.

On the second Sunday, nothing unusual was found.

On the third Sunday, I got this in /var/log/kern.log on ldap-a:
--------8<-----------------------------------------------------------------------
Jun 22 05:20:01 ldap-a kernel: drbd0: conn( Connected -> VerifyS )
Jun 22 05:20:01 ldap-a kernel: drbd1: conn( Connected -> VerifyS )
Jun 22 05:21:49 ldap-a kernel: drbd0: Out of sync: start=7094280, size=8 (sectors)
Jun 22 05:21:58 ldap-a kernel: drbd0: Out of sync: start=7667800, size=8 (sectors)
Jun 22 05:25:03 ldap-a kernel: drbd1: Online verify done (total 301 sec; paused 0 sec; 32448 K/sec)
Jun 22 05:25:03 ldap-a kernel: drbd1: conn( VerifyS -> Connected )
Jun 22 05:25:03 ldap-a kernel: drbd0: Online verify done (total 301 sec; paused 0 sec; 32448 K/sec)
Jun 22 05:25:03 ldap-a kernel: drbd0: Online verify found 2 4k block out of sync!
Jun 22 05:25:03 ldap-a kernel: drbd0: helper command: /sbin/drbdadm out-of-sync
Jun 22 05:25:03 ldap-a kernel: drbd0: conn( VerifyS -> Connected )
Jun 22 05:25:03 ldap-a kernel: drbd0: Writing the whole bitmap, due to failed kmalloc
Jun 22 05:25:03 ldap-a kernel: drbd0: writing of bitmap took 0 jiffies
Jun 22 05:25:03 ldap-a kernel: drbd0: 8 KB (2 bits) marked out-of-sync by on disk bit-map.
-------------------------------------------------------------------------8<-------

drbd0 is corrupted!
From reading the manpage and the forums, it seems that false positives can
occur with swap or a reiserfs filesystem. Unfortunately, drbd0 and drbd1 use
ext3, so I'm not sure what to make of this...

=> So here is what I did, and what I'd like you to comment on:

First, I make sure the sectors are indeed desynchronized:
==========================================================
ldap-a:~# dd iflag=direct bs=512 skip=7094280 count=8 if=/dev/sda7 | xxd -a > ldap-a-drbd0-7094280.dump
ldap-a:~# dd iflag=direct bs=512 skip=7667800 count=8 if=/dev/sda7 | xxd -a > ldap-a-drbd0-7667800.dump

ldap-b:~# dd iflag=direct bs=512 skip=7094280 count=8 if=/dev/sda7 | xxd -a > ldap-b-drbd0-7094280.dump
ldap-b:~# dd iflag=direct bs=512 skip=7667800 count=8 if=/dev/sda7 | xxd -a > ldap-b-drbd0-7667800.dump
ldap-b:~# scp ldap-b-drbd0-7094280.dump ldap-a:/root/
ldap-b:~# scp ldap-b-drbd0-7667800.dump ldap-a:/root/

ldap-a:~# diff ldap-a-drbd0-7094280.dump ldap-b-drbd0-7094280.dump
8c8
< 0000f10: 1c01 0000 0d36 9d02 0000 0000 0000 0000  .....6..........
---
> 0000f10: 1c01 0000 5d36 9d02 0000 0000 0000 0000  ....]6..........
ldap-a:~# diff ldap-a-drbd0-7667800.dump ldap-b-drbd0-7667800.dump
248c248
< 0000f70: 0380 0000 c01e ea13 0000 0000 259c 9902  ............%...
---
> 0000f70: 0380 0000 c01e ea13 0000 0000 3f9c 9902  ............?...
253c253
< 0000fc0: 0380 0000 55f8 9902 0400 0000 a657 ef03  ....U........W..
---
> 0000fc0: 0380 0000 6ff8 9902 0400 0000 c857 ef03  ....o........W..
256c256
< 0000ff0: 0000 0000 0000 0000 11be 9902 0000 0000  ................
---
> 0000ff0: 0000 0000 0000 0000 2bbe 9902 0000 0000  ........+.......

Then I check whether these sectors are in use or if they are unallocated:
====================================================================================
I think a sector is 512 bytes and a cluster (filesystem block) is 4096 bytes, so:
-sector 7094280 corresponds to cluster 7094280/8=886785 (or maybe 886784 or 886786?).
-sector 7667800 corresponds to cluster 7667800/8=958475 (or maybe 958474 or 958476?).

ldap-a:~# dumpe2fs /dev/sda7
dumpe2fs 1.40-WIP (14-Nov-2006)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          2c845fa3-b844-4489-b16b-0e7714fa31fd
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags:         signed directory hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              1221600
Block count:              2441872
Reserved block count:     122093
Free blocks:              2264519
Free inodes:              1220902
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      596
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16288
Inode blocks per group:   509
Filesystem created:       Sat Apr 26 13:06:24 2008
Last mount time:          Fri Jun 13 11:20:33 2008
Last write time:          Fri Jun 13 11:20:33 2008
Mount count:              2
Maximum mount count:      20
Last checked:             Fri Jun 13 10:09:54 2008
Check interval:           15552000 (6 months)
Next check after:         Wed Dec 10 09:09:54 2008
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      1983fd4e-57ac-4c6b-8112-00710cd11b6a
Journal backup:           inode blocks
Journal size:             128M
(...)
Group 27: (Blocks 884736-917503)
  Backup superblock at 884736, Group descriptors at 884737-884737
  Reserved GDT blocks at 884738-885333
  Block bitmap at 885334 (+598), Inode bitmap at 885335 (+599)
  Inode table at 885336-885844 (+600)
  18811 free blocks, 16153 free inodes, 7 directories
  Free blocks: 885900-890879, 890885-890888, 890890-890902, 890909-890911, 890999, 891002-891004, 891009, 891013-891015, 891017, 891019-891025, 891049, 891054, 896192-905215, 905224, 905229, 905231, 905239-905240, 905245, 906472-906478, 906480-906486, 906488-906494, 906496-906502, 906504-906586, 906715-907263, 907265-909311, 909314-909319, 909324-909327, 909331-909335, 909339-909343, 909346-911359, 911362-911383
  Free inodes: 439909-439912, 439916-456064
Group 28: (Blocks 917504-950271)
  Block bitmap at 917504 (+0), Inode bitmap at 917505 (+1)
  Inode table at 917506-918014 (+2)
  3649 free blocks, 16288 free inodes, 0 directories
  Free blocks: 946623-950271
  Free inodes: 456065-472352
Group 29: (Blocks 950272-983039)
  Block bitmap at 950272 (+0), Inode bitmap at 950273 (+1)
  Inode table at 950274-950782 (+2)
  32257 free blocks, 16288 free inodes, 0 directories
  Free blocks: 950783-983039
  Free inodes: 472353-488640
Group 30: (Blocks 983040-1015807)
  Block bitmap at 983040 (+0), Inode bitmap at 983041 (+1)
  Inode table at 983042-983550 (+2)
  32257 free blocks, 16288 free inodes, 0 directories
  Free blocks: 983551-1015807
  Free inodes: 488641-504928
Group 31: (Blocks 1015808-1048575)
  Block bitmap at 1015808 (+0), Inode bitmap at 1015809 (+1)
  Inode table at 1015810-1016318 (+2)
  32257 free blocks, 16288 free inodes, 0 directories
  Free blocks: 1016319-1048575
  Free inodes: 504929-521216
(...)

=> So it seems these sectors are unallocated: block 886785 falls in the free
range 885900-890879 of group 27, and block 958475 in the free range
950783-983039 of group 29.
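
The sector-to-block arithmetic and the lookup in the free-block list can also be scripted, which removes the off-by-one doubt. The following is only a minimal sketch: the function names sector_to_block and block_is_free are hypothetical, and it assumes that the sector numbers reported by DRBD map directly onto offsets in the backing device (as the dd commands above also assume) and that the free ranges are pasted from the "Free blocks" line of dumpe2fs.

```shell
#!/bin/sh
# Hypothetical helpers, not part of DRBD or e2fsprogs.

# Convert a 512-byte sector number into a filesystem block number.
# $1 = sector number, $2 = filesystem block size in bytes (default 4096).
sector_to_block() {
    sector=$1
    block_size=${2:-4096}
    # Integer division: every (block_size / 512) sectors form one block.
    echo $(( sector / (block_size / 512) ))
}

# Succeed if the block lies inside one of the given free ranges.
# $1 = block number, $2 = comma-separated list such as "885900-890879, 890999".
block_is_free() {
    block=$1
    ranges=$2
    for r in $(echo "$ranges" | tr ',' ' '); do
        start=${r%-*}   # text before the dash (whole entry if no dash)
        end=${r#*-}     # text after the dash (whole entry if no dash)
        if [ "$block" -ge "$start" ] && [ "$block" -le "$end" ]; then
            return 0
        fi
    done
    return 1
}

# Usage with the values from this report:
#   sector_to_block 7094280        # prints 886785
#   sector_to_block 7667800        # prints 958475
#   block_is_free 886785 "885900-890879, 890885-890888" && echo free
```

Since 7094280 and 7667800 are both exact multiples of 8, each 8-sector range reported by the verify covers exactly one 4096-byte block, so the neighbouring block numbers do not come into play.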
Then, a few hours later, I check again:
========================================
ldap-a:~# dumpe2fs /dev/sda7
dumpe2fs 1.40-WIP (14-Nov-2006)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          2c845fa3-b844-4489-b16b-0e7714fa31fd
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags:         signed directory hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              1221600
Block count:              2441872
Reserved block count:     122093
Free blocks:              2264519
Free inodes:              1220902
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      596
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16288
Inode blocks per group:   509
Filesystem created:       Sat Apr 26 13:06:24 2008
Last mount time:          Fri Jun 13 11:20:33 2008
Last write time:          Fri Jun 13 11:20:33 2008
Mount count:              2
Maximum mount count:      20
Last checked:             Fri Jun 13 10:09:54 2008
Check interval:           15552000 (6 months)
Next check after:         Wed Dec 10 09:09:54 2008
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      1983fd4e-57ac-4c6b-8112-00710cd11b6a
Journal backup:           inode blocks
Journal size:             128M
(...)
Group 27: (Blocks 884736-917503)
  Backup superblock at 884736, Group descriptors at 884737-884737
  Reserved GDT blocks at 884738-885333
  Block bitmap at 885334 (+598), Inode bitmap at 885335 (+599)
  Inode table at 885336-885844 (+600)
  17911 free blocks, 16153 free inodes, 7 directories
  Free blocks: 886800-890879, 890885-890888, 890890-890902, 890909-890911, 890999, 891002-891004, 891009, 891013-891015, 891017, 891019-891025, 891049, 891054, 896192-905215, 905224, 905229, 905231, 905239-905240, 905245, 906472-906478, 906480-906486, 906488-906494, 906496-906502, 906504-906586, 906715-907263, 907265-909311, 909314-909319, 909324-909327, 909331-909335, 909339-909343, 909346-911359, 911362-911383
  Free inodes: 439909-439912, 439916-456064
Group 28: (Blocks 917504-950271)
  Block bitmap at 917504 (+0), Inode bitmap at 917505 (+1)
  Inode table at 917506-918014 (+2)
  3649 free blocks, 16288 free inodes, 0 directories
  Free blocks: 946623-950271
  Free inodes: 456065-472352
Group 29: (Blocks 950272-983039)
  Block bitmap at 950272 (+0), Inode bitmap at 950273 (+1)
  Inode table at 950274-950782 (+2)
  32257 free blocks, 16288 free inodes, 0 directories
  Free blocks: 950783-983039
  Free inodes: 472353-488640
Group 30: (Blocks 983040-1015807)
  Block bitmap at 983040 (+0), Inode bitmap at 983041 (+1)
  Inode table at 983042-983550 (+2)
  32257 free blocks, 16288 free inodes, 0 directories
  Free blocks: 983551-1015807
  Free inodes: 488641-504928
Group 31: (Blocks 1015808-1048575)
  Block bitmap at 1015808 (+0), Inode bitmap at 1015809 (+1)
  Inode table at 1015810-1016318 (+2)
  32257 free blocks, 16288 free inodes, 0 directories
  Free blocks: 1016319-1048575
  Free inodes: 504929-521216
(...)

=> Now, one of the two is allocated: block 886785 (sector 7094280) is no
longer in the free list of group 27, which now starts at 886800. Block 958475
(sector 7667800) is still free.
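
A possible follow-up, which I have not run here, would be to ask debugfs which file now owns the newly allocated block. This is only a sketch: the wrapper names block_state, block_owner_inode and inode_path are hypothetical, while testb, icheck and ncheck are standard debugfs requests. It assumes debugfs opens the filesystem read-only (its default) and that on a mounted filesystem the answers may lag slightly behind what is still cached in memory.

```shell
#!/bin/sh
# Hypothetical wrappers around debugfs (e2fsprogs); run as root on the node
# where the filesystem lives. The debugfs version banner goes to stderr and
# is discarded.

# Print whether a block is marked in use in the block bitmap.
# $1 = device, $2 = block number
block_state() {
    debugfs -R "testb $2" "$1" 2>/dev/null
}

# Print the inode number owning a block, or nothing if no inode claims it.
# $1 = device, $2 = block number
block_owner_inode() {
    debugfs -R "icheck $2" "$1" 2>/dev/null |
        awk -v b="$2" '$1 == b && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Print the path name(s) that refer to an inode.
# $1 = device, $2 = inode number
inode_path() {
    debugfs -R "ncheck $2" "$1" 2>/dev/null |
        awk 'NR > 1 { sub(/^[0-9]+[ \t]+/, ""); print }'
}

# Usage with the block from this report:
#   block_state /dev/sda7 886785
#   inode=$(block_owner_inode /dev/sda7 886785)
#   [ -n "$inode" ] && inode_path /dev/sda7 "$inode"
```

Knowing the owning file would show whether the block that differed between the nodes now carries live data, or whether it was simply rewritten (and therefore resynchronized) when it was allocated.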

I run 'drbdadm verify all' again, and the logs show that sector 7094280 is no
longer reported, so it has been synchronized:

-in kern.log on ldap-a:
--------8<-----------------------------------------------------------------------
Jun 23 15:42:21 ldap-a kernel: drbd0: conn( Connected -> VerifyS )
Jun 23 15:42:21 ldap-a kernel: drbd1: conn( Connected -> VerifyS )
Jun 23 15:44:20 ldap-a kernel: drbd0: Out of sync: start=7667800, size=8 (sectors)
Jun 23 16:05:44 ldap-a kernel: klogd 1.4.1#18, log source = /proc/kmsg started.
Jun 23 16:05:44 ldap-a kernel: Linux version 2.6.18-6-686-bigmem (Debian 2.6.18.dfsg.1-18etch6) (dannf at debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Fri Jun 6 23:31:15 UTC 2008
Jun 23 16:05:44 ldap-a kernel: BIOS-provided physical RAM map:
Jun 23 16:05:44 ldap-a kernel:  BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
Jun 23 16:05:44 ldap-a kernel:  BIOS-e820: 0000000000100000 - 00000000cfb50000 (usable)
Jun 23 16:05:44 ldap-a kernel:  BIOS-e820: 00000000cfb50000 - 00000000cfb66000 (reserved)
Jun 23 16:05:44 ldap-a kernel:  BIOS-e820: 00000000cfb66000 - 00000000cfb85c00 (ACPI data)
Jun 23 16:05:44 ldap-a kernel:  BIOS-e820: 00000000cfb85c00 - 00000000d0000000 (reserved)
Jun 23 16:05:44 ldap-a kernel:  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
Jun 23 16:05:44 ldap-a kernel:  BIOS-e820: 00000000fe000000 - 0000000100000000 (reserved)
Jun 23 16:05:44 ldap-a kernel:  BIOS-e820: 0000000100000000 - 0000000130000000 (usable)
Jun 23 16:05:44 ldap-a kernel: 3968MB HIGHMEM available.
Jun 23 16:05:44 ldap-a kernel: 896MB LOWMEM available.
Jun 23 16:05:44 ldap-a kernel: found SMP MP-table at 000fe710
Jun 23 16:05:44 ldap-a kernel: NX (Execute Disable) protection: active
Jun 23 16:05:44 ldap-a kernel: On node 0 totalpages: 1245184
Jun 23 16:05:44 ldap-a kernel:   DMA zone: 4096 pages, LIFO batch:0
(...)
-------------------------------------------------------------------------8<-------

ldap-a completely froze again while checking for out-of-sync errors!!

-in kern.log on ldap-b:
--------8<-----------------------------------------------------------------------
Jun 23 15:42:21 ldap-b kernel: drbd0: conn( Connected -> VerifyT )
Jun 23 15:42:21 ldap-b kernel: drbd1: conn( Connected -> VerifyT )
Jun 23 15:44:20 ldap-b kernel: drbd0: Out of sync: start=7667800, size=8 (sectors)
Jun 23 15:47:23 ldap-b kernel: drbd0: Online verify done (total 302 sec; paused 0 sec; 32340 K/sec)
Jun 23 15:47:23 ldap-b kernel: drbd0: Online verify found 2 4k block out of sync!
Jun 23 15:47:23 ldap-b kernel: drbd0: helper command: /sbin/drbdadm out-of-sync
Jun 23 15:47:23 ldap-b kernel: drbd0: conn( VerifyT -> Connected )
Jun 23 15:47:23 ldap-b kernel: drbd0: Writing the whole bitmap, due to failed kmalloc
Jun 23 15:47:23 ldap-b kernel: drbd0: writing of bitmap took 0 jiffies
Jun 23 15:47:23 ldap-b kernel: drbd0: 8 KB (2 bits) marked out-of-sync by on disk bit-map.
Jun 23 15:47:34 ldap-b kernel: drbd1: PingAck did not arrive in time.
Jun 23 15:47:34 ldap-b kernel: drbd1: peer( Secondary -> Unknown ) conn( VerifyT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jun 23 15:47:34 ldap-b kernel: drbd1: asender terminated
Jun 23 15:47:34 ldap-b kernel: drbd1: Terminating asender thread
Jun 23 15:47:34 ldap-b kernel: drbd1: short read expecting header on sock: r=-512
Jun 23 15:47:34 ldap-b kernel: drbd1: Creating new current UUID
Jun 23 15:47:34 ldap-b kernel: drbd1: Writing meta data super block now.
Jun 23 15:47:34 ldap-b kernel: drbd1: tl_clear()
Jun 23 15:47:34 ldap-b kernel: drbd1: Connection closed
Jun 23 15:47:34 ldap-b kernel: drbd1: helper command: /sbin/drbdadm outdate-peer
Jun 23 15:47:34 ldap-b kernel: drbd0: PingAck did not arrive in time.
Jun 23 15:47:34 ldap-b kernel: drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jun 23 15:47:34 ldap-b kernel: drbd0: asender terminated
Jun 23 15:47:34 ldap-b kernel: drbd0: Terminating asender thread
Jun 23 15:47:34 ldap-b kernel: drbd0: short read expecting header on sock: r=-512
Jun 23 15:47:34 ldap-b kernel: drbd0: Writing meta data super block now.
Jun 23 15:47:34 ldap-b kernel: drbd0: tl_clear()
Jun 23 15:47:34 ldap-b kernel: drbd0: Connection closed
Jun 23 15:47:34 ldap-b kernel: drbd0: conn( NetworkFailure -> Unconnected )
Jun 23 15:47:34 ldap-b kernel: drbd0: receiver terminated
Jun 23 15:47:34 ldap-b kernel: drbd0: receiver (re)started
Jun 23 15:47:35 ldap-b kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
Jun 23 15:47:35 ldap-b kernel: drbd1: outdate-peer helper returned 5 (peer is unreachable, assumed to be dead)
Jun 23 15:47:35 ldap-b kernel: drbd1: pdsk( DUnknown -> Outdated )
Jun 23 15:47:35 ldap-b kernel: drbd1: Writing meta data super block now.
Jun 23 15:47:35 ldap-b kernel: drbd1: conn( NetworkFailure -> Unconnected )
Jun 23 15:47:35 ldap-b kernel: drbd1: receiver terminated
Jun 23 15:47:35 ldap-b kernel: drbd1: receiver (re)started
Jun 23 15:47:35 ldap-b kernel: drbd1: conn( Unconnected -> WFConnection )
Jun 23 15:47:35 ldap-b kernel: drbd0: outdate-peer helper returned 5 (peer is unreachable, assumed to be dead)
Jun 23 15:47:35 ldap-b kernel: drbd0: Creating new current UUID
Jun 23 15:47:35 ldap-b kernel: drbd0: Writing meta data super block now.
Jun 23 15:47:35 ldap-b kernel: drbd0: conn( Unconnected -> WFConnection )
Jun 23 15:47:36 ldap-b kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
Jun 23 15:47:36 ldap-b kernel: drbd0: outdate-peer helper returned 5 (peer is unreachable, assumed to be dead)
Jun 23 15:47:36 ldap-b kernel: drbd0: role( Secondary -> Primary ) pdsk( DUnknown -> Outdated )
Jun 23 15:47:36 ldap-b kernel: drbd0: Writing meta data super block now.
Jun 23 15:47:37 ldap-b kernel: kjournald starting.  Commit interval 5 seconds
Jun 23 15:47:37 ldap-b kernel: EXT3 FS on drbd0, internal journal
Jun 23 15:47:37 ldap-b kernel: EXT3-fs: recovery complete.
Jun 23 15:47:37 ldap-b kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 23 16:04:25 ldap-b kernel: bnx2: eth1 NIC Link is Down
Jun 23 16:04:26 ldap-b kernel: bnx2: eth1 NIC Link is Up, 100 Mbps full duplex, receive & transmit flow control ON
Jun 23 16:04:29 ldap-b kernel: bnx2: eth1 NIC Link is Down
Jun 23 16:04:32 ldap-b kernel: bnx2: eth1 NIC Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jun 23 16:05:44 ldap-b kernel: bnx2: eth1 NIC Link is Down
Jun 23 16:05:47 ldap-b kernel: bnx2: eth1 NIC Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jun 23 16:05:56 ldap-b kernel: drbd0: Handshake successful: Agreed network protocol version 88
Jun 23 16:05:56 ldap-b kernel: drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Jun 23 16:05:56 ldap-b kernel: drbd0: conn( WFConnection -> WFReportParams )
Jun 23 16:05:56 ldap-b kernel: drbd0: Starting asender thread (from drbd0_receiver [16396])
Jun 23 16:05:56 ldap-b kernel: drbd1: Handshake successful: Agreed network protocol version 88
Jun 23 16:05:56 ldap-b kernel: drbd1: Peer authenticated using 20 bytes of 'sha1' HMAC
Jun 23 16:05:56 ldap-b kernel: drbd1: conn( WFConnection -> WFReportParams )
Jun 23 16:05:56 ldap-b kernel: drbd1: Starting asender thread (from drbd1_receiver [16397])
Jun 23 16:05:56 ldap-b kernel: drbd0: data-integrity-alg: <not-used>
Jun 23 16:05:56 ldap-b kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
Jun 23 16:05:56 ldap-b kernel: drbd0: Writing meta data super block now.
Jun 23 16:05:56 ldap-b kernel: drbd1: data-integrity-alg: <not-used>
Jun 23 16:05:56 ldap-b kernel: drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
Jun 23 16:05:56 ldap-b kernel: drbd1: Writing meta data super block now.
Jun 23 16:05:56 ldap-b kernel: drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Outdated -> Inconsistent )
Jun 23 16:05:56 ldap-b kernel: drbd1: aftr_isp( 0 -> 1 )
Jun 23 16:05:56 ldap-b kernel: drbd0: Began resync as SyncSource (will sync 520196 KB [130049 bits set]).
Jun 23 16:05:56 ldap-b kernel: drbd0: Writing meta data super block now.
Jun 23 16:05:56 ldap-b kernel: drbd1: peer_isp( 0 -> 1 )
Jun 23 16:05:56 ldap-b kernel: drbd1: conn( WFBitMapS -> PausedSyncS ) pdsk( Outdated -> Inconsistent )
Jun 23 16:05:56 ldap-b kernel: drbd1: Began resync as PausedSyncS (will sync 68 KB [17 bits set]).
Jun 23 16:05:56 ldap-b kernel: drbd1: Writing meta data super block now.
Jun 23 16:06:05 ldap-b kernel: drbd0: role( Primary -> Secondary )
Jun 23 16:06:05 ldap-b kernel: drbd0: Writing meta data super block now.
Jun 23 16:06:05 ldap-b kernel: drbd0: peer( Secondary -> Primary )
Jun 23 16:06:12 ldap-b kernel: drbd0: Resync done (total 15 sec; paused 0 sec; 34676 K/sec)
Jun 23 16:06:12 ldap-b kernel: drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Jun 23 16:06:12 ldap-b kernel: drbd1: aftr_isp( 1 -> 0 )
Jun 23 16:06:12 ldap-b kernel: drbd0: Writing meta data super block now.
Jun 23 16:06:12 ldap-b kernel: drbd1: conn( PausedSyncS -> SyncSource ) peer_isp( 1 -> 0 )
Jun 23 16:06:12 ldap-b kernel: drbd1: Syncer continues.
Jun 23 16:06:12 ldap-b kernel: drbd1: Resync done (total 16 sec; paused 15 sec; 68 K/sec)
Jun 23 16:06:12 ldap-b kernel: drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Jun 23 16:06:12 ldap-b kernel: drbd1: Writing meta data super block now.
-------------------------------------------------------------------------8<-------

Should I worry about these out-of-sync alerts, or are they a kind of false
positive? If they are, isn't there a way to make the check more accurate?
Should I disable TCP checksum offloading?
Should I enable 'replication traffic integrity checking'?

ldap-a has crashed twice in three weeks, and both times it was while executing
'drbdadm verify all'. Couldn't that be a bug in DRBD?

Thanks in advance for any comment or suggestion!

Eric