[DRBD-user] drbdadm verify all seems to produce false positives on ext3 and crash the server

Eric Marin eric.marin at utc.fr
Mon Jun 23 16:32:25 CEST 2008


Hello and sorry about the length of this report,

I'm the administrator of a cluster which consists of two nodes: ldap-a
and ldap-b.

Both nodes are identical and run:
-Debian Etch (up to date)
-DRBD 8.2.6
-Heartbeat 2.1.3-5~bpo40+1 from etch-backports, patched to fix this bug:
 http://hg.linux-ha.org/dev/rev/47f60bebe7b2
-EXT3 filesystem

I have edited /etc/crontab on ldap-a (which is usually the primary for
drbd0) to check for out-of-sync blocks every Sunday:
--------8<----------------------------------------------------------------------
20 05   * * 0   root    /sbin/drbdadm verify all
-----------------------------------------------------------------------8<-------
('all' means drbd0 and drbd1)

This system has been in production for about three weeks.



On the first Sunday, out-of-sync sectors were detected and one node
crashed completely at the end of the check. I then found out that the
system had been running with a corrupted filesystem for a few days, so
I repaired the filesystem, restored the files from a backup,
synchronized the nodes, and verified that 'drbdadm verify all' did not
complain.

On the second Sunday, nothing unusual was found.

On the third Sunday, I got this in /var/log/kern.log on ldap-a:
--------8<-----------------------------------------------------------------------
Jun 22 05:20:01 ldap-a kernel: drbd0: conn( Connected -> VerifyS )
Jun 22 05:20:01 ldap-a kernel: drbd1: conn( Connected -> VerifyS )
Jun 22 05:21:49 ldap-a kernel: drbd0: Out of sync: start=7094280, size=8 (sectors)
Jun 22 05:21:58 ldap-a kernel: drbd0: Out of sync: start=7667800, size=8 (sectors)
Jun 22 05:25:03 ldap-a kernel: drbd1: Online verify  done (total 301 sec; paused 0 sec; 32448 K/sec)
Jun 22 05:25:03 ldap-a kernel: drbd1: conn( VerifyS -> Connected )
Jun 22 05:25:03 ldap-a kernel: drbd0: Online verify  done (total 301 sec; paused 0 sec; 32448 K/sec)
Jun 22 05:25:03 ldap-a kernel: drbd0: Online verify found 2 4k block out of sync!
Jun 22 05:25:03 ldap-a kernel: drbd0: helper command: /sbin/drbdadm out-of-sync
Jun 22 05:25:03 ldap-a kernel: drbd0: conn( VerifyS -> Connected )
Jun 22 05:25:03 ldap-a kernel: drbd0: Writing the whole bitmap, due to failed kmalloc
Jun 22 05:25:03 ldap-a kernel: drbd0: writing of bitmap took 0 jiffies
Jun 22 05:25:03 ldap-a kernel: drbd0: 8 KB (2 bits) marked out-of-sync by on disk bit-map.
-------------------------------------------------------------------------8<-------
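For anyone who wants to collect these ranges automatically, here is a rough Python sketch that extracts the "Out of sync" messages from a kernel log. The regular expression is only modelled on the two messages quoted above, and the 512-byte sector size is an assumption that happens to agree with the "8 KB (2 bits)" summary line; the function names are made up for this example.

```python
import re

# Modelled on the message format seen in the log above, e.g.
# "drbd0: Out of sync: start=7094280, size=8 (sectors)"
OOS_RE = re.compile(r"(drbd\d+): Out of sync: start=(\d+), size=(\d+)")

SECTOR_SIZE = 512  # bytes per sector (assumption, consistent with the log)


def out_of_sync_ranges(log_text):
    """Return a list of (device, start_sector, sector_count) tuples."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in OOS_RE.finditer(log_text)]


def total_out_of_sync_kb(ranges):
    """Sum the sizes of all reported ranges, in KB."""
    return sum(count for _, _, count in ranges) * SECTOR_SIZE // 1024


sample = """\
Jun 22 05:21:49 ldap-a kernel: drbd0: Out of sync: start=7094280, size=8 (sectors)
Jun 22 05:21:58 ldap-a kernel: drbd0: Out of sync: start=7667800, size=8 (sectors)
"""

ranges = out_of_sync_ranges(sample)
for device, start, count in ranges:
    print(device, start, count)
print(total_out_of_sync_kb(ranges), "KB out of sync")
```

Two ranges of 8 sectors each add up to 8 KB, which matches the total DRBD prints at the end of the verify run.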

drbd0 is corrupted!

From reading the manpage and the forums, it seems that false positives
can occur with swap or a reiserfs filesystem. However, drbd0 and drbd1
both use ext3, so I'm not sure what to make of this...


=> So here is what I did, and what I'd like you to comment on:

First, I made sure the sectors really differ between the two nodes:
===================================================================
ldap-a:~# dd iflag=direct bs=512 skip=7094280 count=8 if=/dev/sda7 | xxd -a > ldap-a-drbd0-7094280.dump
ldap-a:~# dd iflag=direct bs=512 skip=7667800 count=8 if=/dev/sda7 | xxd -a > ldap-a-drbd0-7667800.dump
ldap-b:~# dd iflag=direct bs=512 skip=7094280 count=8 if=/dev/sda7 | xxd -a > ldap-b-drbd0-7094280.dump
ldap-b:~# dd iflag=direct bs=512 skip=7667800 count=8 if=/dev/sda7 | xxd -a > ldap-b-drbd0-7667800.dump
ldap-b:~# scp ldap-b-drbd0-7094280.dump ldap-a:/root/
ldap-b:~# scp ldap-b-drbd0-7667800.dump ldap-a:/root/
ldap-a:~# diff ldap-a-drbd0-7094280.dump ldap-b-drbd0-7094280.dump
8c8
< 0000f10: 1c01 0000 0d36 9d02 0000 0000 0000 0000  .....6..........
---
> 0000f10: 1c01 0000 5d36 9d02 0000 0000 0000 0000  ....]6..........

ldap-a:~# diff ldap-a-drbd0-7667800.dump ldap-b-drbd0-7667800.dump
248c248
< 0000f70: 0380 0000 c01e ea13 0000 0000 259c 9902  ............%...
---
> 0000f70: 0380 0000 c01e ea13 0000 0000 3f9c 9902  ............?...
253c253
< 0000fc0: 0380 0000 55f8 9902 0400 0000 a657 ef03  ....U........W..
---
> 0000fc0: 0380 0000 6ff8 9902 0400 0000 c857 ef03  ....o........W..
256c256
< 0000ff0: 0000 0000 0000 0000 11be 9902 0000 0000  ................
---
> 0000ff0: 0000 0000 0000 0000 2bbe 9902 0000 0000  ........+.......
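To see exactly which bytes differ inside each line, the xxd lines from the diff can be compared byte by byte. The following Python sketch assumes full lines in the default xxd layout (16 bytes shown as 8 groups of 4 hex digits before the ASCII column); the function names are invented for this example.

```python
def parse_xxd_line(line):
    """Return (offset, data) for one full line of default xxd output."""
    addr, rest = line.split(":", 1)
    # The first 8 whitespace-separated tokens are the hex groups;
    # everything after them is the ASCII column and is ignored.
    data = bytes.fromhex("".join(rest.split()[:8]))
    return int(addr, 16), data


def differing_bytes(line_a, line_b):
    """Return (offset_in_block, byte_a, byte_b) for every differing byte."""
    off_a, a = parse_xxd_line(line_a)
    off_b, b = parse_xxd_line(line_b)
    if off_a != off_b:
        raise ValueError("the two lines describe different offsets")
    return [(off_a + i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# The two versions of line 0xf10 from the first diff (ldap-a, then ldap-b).
line_a = "0000f10: 1c01 0000 0d36 9d02 0000 0000 0000 0000  .....6.........."
line_b = "0000f10: 1c01 0000 5d36 9d02 0000 0000 0000 0000  ....]6.........."

for offset, byte_a, byte_b in differing_bytes(line_a, line_b):
    print(f"offset {offset:#06x}: ldap-a={byte_a:02x} ldap-b={byte_b:02x}")
```

For the first block this reports a single differing byte at offset 0xf14 (0d on ldap-a, 5d on ldap-b).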

Then I checked whether these sectors are in use or unallocated:
===============================================================
A sector is 512 bytes and an ext3 block is 4096 bytes (see "Block size"
in the dumpe2fs output below), so 8 sectors make one block. Both start
sectors are multiples of 8, so each 8-sector range is exactly one block
(assuming the filesystem starts at sector 0 of the device):
-sector 7094280 corresponds to block 7094280/8=886785.
-sector 7667800 corresponds to block 7667800/8=958475.
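The conversion can be double-checked with a few lines of Python. This is only a sketch: it assumes 512-byte sectors, the 4096-byte block size reported by dumpe2fs, and a filesystem that starts at sector 0 of the device. The function name is made up for this example.

```python
def sector_to_block(sector, sector_size=512, block_size=4096):
    """Map a sector number to (filesystem block, byte offset inside it)."""
    return divmod(sector * sector_size, block_size)


for sector in (7094280, 7667800):
    block, offset = sector_to_block(sector)
    print(f"sector {sector} -> block {block}, offset {offset}")
```

An offset of 0 means the range starts exactly on a block boundary, so each 8-sector range covers one whole block and not parts of two neighbouring ones.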

ldap-a:~# dumpe2fs /dev/sda7
dumpe2fs 1.40-WIP (14-Nov-2006)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          2c845fa3-b844-4489-b16b-0e7714fa31fd
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal resize_inode dir_index filetype 
needs_recovery sparse_super large_file
Filesystem flags:         signed directory hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              1221600
Block count:              2441872
Reserved block count:     122093
Free blocks:              2264519
Free inodes:              1220902
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      596
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16288
Inode blocks per group:   509
Filesystem created:       Sat Apr 26 13:06:24 2008
Last mount time:          Fri Jun 13 11:20:33 2008
Last write time:          Fri Jun 13 11:20:33 2008
Mount count:              2
Maximum mount count:      20
Last checked:             Fri Jun 13 10:09:54 2008
Check interval:           15552000 (6 months)
Next check after:         Wed Dec 10 09:09:54 2008
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      1983fd4e-57ac-4c6b-8112-00710cd11b6a
Journal backup:           inode blocks
Journal size:             128M

(...)

Group 27: (Blocks 884736-917503)
   Backup superblock at 884736, Group descriptors at 884737-884737
   Reserved GDT blocks at 884738-885333
   Block bitmap at 885334 (+598), Inode bitmap at 885335 (+599)
   Inode table at 885336-885844 (+600)
   18811 free blocks, 16153 free inodes, 7 directories
   Free blocks: 885900-890879, 890885-890888, 890890-890902,
890909-890911, 890999, 891002-891004, 891009, 891013-891015, 891017,
891019-891025, 891049, 891054, 896192-905215, 905224, 905229, 905231,
905239-905240, 905245, 906472-906478, 906480-906486, 906488-906494,
906496-906502, 906504-906586, 906715-907263, 907265-909311,
909314-909319, 909324-909327, 909331-909335, 909339-909343,
909346-911359, 911362-911383
   Free inodes: 439909-439912, 439916-456064
Group 28: (Blocks 917504-950271)
   Block bitmap at 917504 (+0), Inode bitmap at 917505 (+1)
   Inode table at 917506-918014 (+2)
   3649 free blocks, 16288 free inodes, 0 directories
   Free blocks: 946623-950271
   Free inodes: 456065-472352
Group 29: (Blocks 950272-983039)
   Block bitmap at 950272 (+0), Inode bitmap at 950273 (+1)
   Inode table at 950274-950782 (+2)
   32257 free blocks, 16288 free inodes, 0 directories
   Free blocks: 950783-983039
   Free inodes: 472353-488640
Group 30: (Blocks 983040-1015807)
   Block bitmap at 983040 (+0), Inode bitmap at 983041 (+1)
   Inode table at 983042-983550 (+2)
   32257 free blocks, 16288 free inodes, 0 directories
   Free blocks: 983551-1015807
   Free inodes: 488641-504928
Group 31: (Blocks 1015808-1048575)
   Block bitmap at 1015808 (+0), Inode bitmap at 1015809 (+1)
   Inode table at 1015810-1016318 (+2)
   32257 free blocks, 16288 free inodes, 0 directories
   Free blocks: 1016319-1048575
   Free inodes: 504929-521216


(...)

=> So both blocks (886785 in group 27 and 958475 in group 29) appear to
be unallocated: each falls inside a free-block range of its group.
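Reading the free-block lists by eye is error-prone, so here is a small Python sketch that does the lookup. It assumes the list has been copied by hand from the dumpe2fs output for the relevant group; the helper names are made up for this example.

```python
def parse_ranges(text):
    """Turn '885900-890879, 890999, ...' into a list of (first, last) pairs."""
    ranges = []
    for item in text.replace("\n", " ").split(","):
        item = item.strip()
        if not item:
            continue
        first, _, last = item.partition("-")
        # A single number such as '890999' is a range of one block.
        ranges.append((int(first), int(last or first)))
    return ranges


def is_free(block, ranges):
    """True if the block lies inside one of the free ranges."""
    return any(first <= block <= last for first, last in ranges)


# Free-block lists copied from the dumpe2fs output above (groups 27 and 29).
group27_free = parse_ranges("""
885900-890879, 890885-890888, 890890-890902,
890909-890911, 890999, 891002-891004, 891009, 891013-891015, 891017,
891019-891025, 891049, 891054, 896192-905215, 905224, 905229, 905231,
905239-905240, 905245, 906472-906478, 906480-906486, 906488-906494,
906496-906502, 906504-906586, 906715-907263, 907265-909311,
909314-909319, 909324-909327, 909331-909335, 909339-909343,
909346-911359, 911362-911383
""")
group29_free = parse_ranges("950783-983039")

print("block 886785 free:", is_free(886785, group27_free))
print("block 958475 free:", is_free(958475, group29_free))
```

The same check can be repeated with a later dumpe2fs listing to see whether a block has been allocated in the meantime.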

Then, a few hours later, I checked again:
=========================================
ldap-a:~# dumpe2fs /dev/sda7
dumpe2fs 1.40-WIP (14-Nov-2006)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          2c845fa3-b844-4489-b16b-0e7714fa31fd
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal resize_inode dir_index filetype 
needs_recovery sparse_super large_file
Filesystem flags:         signed directory hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              1221600
Block count:              2441872
Reserved block count:     122093
Free blocks:              2264519
Free inodes:              1220902
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      596
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16288
Inode blocks per group:   509
Filesystem created:       Sat Apr 26 13:06:24 2008
Last mount time:          Fri Jun 13 11:20:33 2008
Last write time:          Fri Jun 13 11:20:33 2008
Mount count:              2
Maximum mount count:      20
Last checked:             Fri Jun 13 10:09:54 2008
Check interval:           15552000 (6 months)
Next check after:         Wed Dec 10 09:09:54 2008
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      1983fd4e-57ac-4c6b-8112-00710cd11b6a
Journal backup:           inode blocks
Journal size:             128M

(...)

Group 27: (Blocks 884736-917503)
   Backup superblock at 884736, Group descriptors at 884737-884737
   Reserved GDT blocks at 884738-885333
   Block bitmap at 885334 (+598), Inode bitmap at 885335 (+599)
   Inode table at 885336-885844 (+600)
   17911 free blocks, 16153 free inodes, 7 directories
   Free blocks: 886800-890879, 890885-890888, 890890-890902,
890909-890911, 890999, 891002-891004, 891009, 891013-891015, 891017,
891019-891025, 891049, 891054, 896192-905215, 905224, 905229, 905231,
905239-905240, 905245, 906472-906478, 906480-906486, 906488-906494,
906496-906502, 906504-906586, 906715-907263, 907265-909311,
909314-909319, 909324-909327, 909331-909335, 909339-909343,
909346-911359, 911362-911383
   Free inodes: 439909-439912, 439916-456064
Group 28: (Blocks 917504-950271)
   Block bitmap at 917504 (+0), Inode bitmap at 917505 (+1)
   Inode table at 917506-918014 (+2)
   3649 free blocks, 16288 free inodes, 0 directories
   Free blocks: 946623-950271
   Free inodes: 456065-472352
Group 29: (Blocks 950272-983039)
   Block bitmap at 950272 (+0), Inode bitmap at 950273 (+1)
   Inode table at 950274-950782 (+2)
   32257 free blocks, 16288 free inodes, 0 directories
   Free blocks: 950783-983039
   Free inodes: 472353-488640
Group 30: (Blocks 983040-1015807)
   Block bitmap at 983040 (+0), Inode bitmap at 983041 (+1)
   Inode table at 983042-983550 (+2)
   32257 free blocks, 16288 free inodes, 0 directories
   Free blocks: 983551-1015807
   Free inodes: 488641-504928
Group 31: (Blocks 1015808-1048575)
   Block bitmap at 1015808 (+0), Inode bitmap at 1015809 (+1)
   Inode table at 1015810-1016318 (+2)
   32257 free blocks, 16288 free inodes, 0 directories
   Free blocks: 1016319-1048575
   Free inodes: 504929-521216

(...)

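=> Now block 886785 (sector 7094280) is allocated: the first free range
of group 27 starts at 886800 instead of 885900. I ran 'drbdadm verify
all' again, and the logs show that this block is no longer reported as
out of sync; only sector 7667800 remains:
-in kern.log on ldap-a: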
--------8<-----------------------------------------------------------------------
Jun 23 15:42:21 ldap-a kernel: drbd0: conn( Connected -> VerifyS )
Jun 23 15:42:21 ldap-a kernel: drbd1: conn( Connected -> VerifyS )
Jun 23 15:44:20 ldap-a kernel: drbd0: Out of sync: start=7667800, size=8 
(sectors)
Jun 23 16:05:44 ldap-a kernel: klogd 1.4.1#18, log source = /proc/kmsg 
started.
Jun 23 16:05:44 ldap-a kernel: Linux version 2.6.18-6-686-bigmem (Debian 
2.6.18.dfsg.1-18etch6) (dannf at debian.org) (gcc version 4.1.2 20061115 
(prerelease) (Debian 4.1.1-21)) #1 SMP Fri Jun 6 23:31:15 UTC 2008
Jun 23 16:05:44 ldap-a kernel: BIOS-provided physical RAM map:
Jun 23 16:05:44 ldap-a kernel:  BIOS-e820: 0000000000000000 - 
00000000000a0000 (usable)
Jun 23 16:05:44 ldap-a kernel:  BIOS-e820: 0000000000100000 - 
00000000cfb50000 (usable)
Jun 23 16:05:44 ldap-a kernel:  BIOS-e820: 00000000cfb50000 - 
00000000cfb66000 (reserved)
Jun 23 16:05:44 ldap-a kernel:  BIOS-e820: 00000000cfb66000 - 
00000000cfb85c00 (ACPI data)
Jun 23 16:05:44 ldap-a kernel:  BIOS-e820: 00000000cfb85c00 - 
00000000d0000000 (reserved)
Jun 23 16:05:44 ldap-a kernel:  BIOS-e820: 00000000e0000000 - 
00000000f0000000 (reserved)
Jun 23 16:05:44 ldap-a kernel:  BIOS-e820: 00000000fe000000 - 
0000000100000000 (reserved)
Jun 23 16:05:44 ldap-a kernel:  BIOS-e820: 0000000100000000 - 
0000000130000000 (usable)
Jun 23 16:05:44 ldap-a kernel: 3968MB HIGHMEM available.
Jun 23 16:05:44 ldap-a kernel: 896MB LOWMEM available.
Jun 23 16:05:44 ldap-a kernel: found SMP MP-table at 000fe710
Jun 23 16:05:44 ldap-a kernel: NX (Execute Disable) protection: active
Jun 23 16:05:44 ldap-a kernel: On node 0 totalpages: 1245184
Jun 23 16:05:44 ldap-a kernel:   DMA zone: 4096 pages, LIFO batch:0
(...)
-------------------------------------------------------------------------8<-------
ldap-a froze completely again while checking for out-of-sync errors! Its
log stops at 15:44:20 and resumes with the reboot at 16:05:44.

-in kern.log on ldap-b:
--------8<-----------------------------------------------------------------------
Jun 23 15:42:21 ldap-b kernel: drbd0: conn( Connected -> VerifyT )
Jun 23 15:42:21 ldap-b kernel: drbd1: conn( Connected -> VerifyT )
Jun 23 15:44:20 ldap-b kernel: drbd0: Out of sync: start=7667800, size=8 
(sectors)
Jun 23 15:47:23 ldap-b kernel: drbd0: Online verify  done (total 302 
sec; paused 0 sec; 32340 K/sec)
Jun 23 15:47:23 ldap-b kernel: drbd0: Online verify found 2 4k block out 
of sync!
Jun 23 15:47:23 ldap-b kernel: drbd0: helper command: /sbin/drbdadm 
out-of-sync
Jun 23 15:47:23 ldap-b kernel: drbd0: conn( VerifyT -> Connected )
Jun 23 15:47:23 ldap-b kernel: drbd0: Writing the whole bitmap, due to 
failed kmalloc
Jun 23 15:47:23 ldap-b kernel: drbd0: writing of bitmap took 0 jiffies
Jun 23 15:47:23 ldap-b kernel: drbd0: 8 KB (2 bits) marked out-of-sync 
by on disk bit-map.
Jun 23 15:47:34 ldap-b kernel: drbd1: PingAck did not arrive in time.
Jun 23 15:47:34 ldap-b kernel: drbd1: peer( Secondary -> Unknown ) conn( 
VerifyT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jun 23 15:47:34 ldap-b kernel: drbd1: asender terminated
Jun 23 15:47:34 ldap-b kernel: drbd1: Terminating asender thread
Jun 23 15:47:34 ldap-b kernel: drbd1: short read expecting header on 
sock: r=-512
Jun 23 15:47:34 ldap-b kernel: drbd1: Creating new current UUID
Jun 23 15:47:34 ldap-b kernel: drbd1: Writing meta data super block now.
Jun 23 15:47:34 ldap-b kernel: drbd1: tl_clear()
Jun 23 15:47:34 ldap-b kernel: drbd1: Connection closed
Jun 23 15:47:34 ldap-b kernel: drbd1: helper command: /sbin/drbdadm 
outdate-peer
Jun 23 15:47:34 ldap-b kernel: drbd0: PingAck did not arrive in time.
Jun 23 15:47:34 ldap-b kernel: drbd0: peer( Primary -> Unknown ) conn( 
Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jun 23 15:47:34 ldap-b kernel: drbd0: asender terminated
Jun 23 15:47:34 ldap-b kernel: drbd0: Terminating asender thread
Jun 23 15:47:34 ldap-b kernel: drbd0: short read expecting header on 
sock: r=-512
Jun 23 15:47:34 ldap-b kernel: drbd0: Writing meta data super block now.
Jun 23 15:47:34 ldap-b kernel: drbd0: tl_clear()
Jun 23 15:47:34 ldap-b kernel: drbd0: Connection closed
Jun 23 15:47:34 ldap-b kernel: drbd0: conn( NetworkFailure -> Unconnected )
Jun 23 15:47:34 ldap-b kernel: drbd0: receiver terminated
Jun 23 15:47:34 ldap-b kernel: drbd0: receiver (re)started
Jun 23 15:47:35 ldap-b kernel: drbd0: helper command: /sbin/drbdadm 
outdate-peer
Jun 23 15:47:35 ldap-b kernel: drbd1: outdate-peer helper returned 5 
(peer is unreachable, assumed to be dead)
Jun 23 15:47:35 ldap-b kernel: drbd1: pdsk( DUnknown -> Outdated )
Jun 23 15:47:35 ldap-b kernel: drbd1: Writing meta data super block now.
Jun 23 15:47:35 ldap-b kernel: drbd1: conn( NetworkFailure -> Unconnected )
Jun 23 15:47:35 ldap-b kernel: drbd1: receiver terminated
Jun 23 15:47:35 ldap-b kernel: drbd1: receiver (re)started
Jun 23 15:47:35 ldap-b kernel: drbd1: conn( Unconnected -> WFConnection )
Jun 23 15:47:35 ldap-b kernel: drbd0: outdate-peer helper returned 5 
(peer is unreachable, assumed to be dead)
Jun 23 15:47:35 ldap-b kernel: drbd0: Creating new current UUID
Jun 23 15:47:35 ldap-b kernel: drbd0: Writing meta data super block now.
Jun 23 15:47:35 ldap-b kernel: drbd0: conn( Unconnected -> WFConnection )
Jun 23 15:47:36 ldap-b kernel: drbd0: helper command: /sbin/drbdadm 
outdate-peer
Jun 23 15:47:36 ldap-b kernel: drbd0: outdate-peer helper returned 5 
(peer is unreachable, assumed to be dead)
Jun 23 15:47:36 ldap-b kernel: drbd0: role( Secondary -> Primary ) pdsk( 
DUnknown -> Outdated )
Jun 23 15:47:36 ldap-b kernel: drbd0: Writing meta data super block now.
Jun 23 15:47:37 ldap-b kernel: kjournald starting.  Commit interval 5 
seconds
Jun 23 15:47:37 ldap-b kernel: EXT3 FS on drbd0, internal journal
Jun 23 15:47:37 ldap-b kernel: EXT3-fs: recovery complete.
Jun 23 15:47:37 ldap-b kernel: EXT3-fs: mounted filesystem with ordered 
data mode.
Jun 23 16:04:25 ldap-b kernel: bnx2: eth1 NIC Link is Down
Jun 23 16:04:26 ldap-b kernel: bnx2: eth1 NIC Link is Up, 100 Mbps full 
duplex, receive & transmit flow control ON
Jun 23 16:04:29 ldap-b kernel: bnx2: eth1 NIC Link is Down
Jun 23 16:04:32 ldap-b kernel: bnx2: eth1 NIC Link is Up, 1000 Mbps full 
duplex, receive & transmit flow control ON
Jun 23 16:05:44 ldap-b kernel: bnx2: eth1 NIC Link is Down
Jun 23 16:05:47 ldap-b kernel: bnx2: eth1 NIC Link is Up, 1000 Mbps full 
duplex, receive & transmit flow control ON
Jun 23 16:05:56 ldap-b kernel: drbd0: Handshake successful: Agreed 
network protocol version 88
Jun 23 16:05:56 ldap-b kernel: drbd0: Peer authenticated using 20 bytes 
of 'sha1' HMAC
Jun 23 16:05:56 ldap-b kernel: drbd0: conn( WFConnection -> WFReportParams )
Jun 23 16:05:56 ldap-b kernel: drbd0: Starting asender thread (from 
drbd0_receiver [16396])
Jun 23 16:05:56 ldap-b kernel: drbd1: Handshake successful: Agreed 
network protocol version 88
Jun 23 16:05:56 ldap-b kernel: drbd1: Peer authenticated using 20 bytes 
of 'sha1' HMAC
Jun 23 16:05:56 ldap-b kernel: drbd1: conn( WFConnection -> WFReportParams )
Jun 23 16:05:56 ldap-b kernel: drbd1: Starting asender thread (from 
drbd1_receiver [16397])
Jun 23 16:05:56 ldap-b kernel: drbd0: data-integrity-alg: <not-used>
Jun 23 16:05:56 ldap-b kernel: drbd0: peer( Unknown -> Secondary ) conn( 
WFReportParams -> WFBitMapS )
Jun 23 16:05:56 ldap-b kernel: drbd0: Writing meta data super block now.
Jun 23 16:05:56 ldap-b kernel: drbd1: data-integrity-alg: <not-used>
Jun 23 16:05:56 ldap-b kernel: drbd1: peer( Unknown -> Secondary ) conn( 
WFReportParams -> WFBitMapS )
Jun 23 16:05:56 ldap-b kernel: drbd1: Writing meta data super block now.
Jun 23 16:05:56 ldap-b kernel: drbd0: conn( WFBitMapS -> SyncSource ) 
pdsk( Outdated -> Inconsistent )
Jun 23 16:05:56 ldap-b kernel: drbd1: aftr_isp( 0 -> 1 )
Jun 23 16:05:56 ldap-b kernel: drbd0: Began resync as SyncSource (will 
sync 520196 KB [130049 bits set]).
Jun 23 16:05:56 ldap-b kernel: drbd0: Writing meta data super block now.
Jun 23 16:05:56 ldap-b kernel: drbd1: peer_isp( 0 -> 1 )
Jun 23 16:05:56 ldap-b kernel: drbd1: conn( WFBitMapS -> PausedSyncS ) 
pdsk( Outdated -> Inconsistent )
Jun 23 16:05:56 ldap-b kernel: drbd1: Began resync as PausedSyncS (will 
sync 68 KB [17 bits set]).
Jun 23 16:05:56 ldap-b kernel: drbd1: Writing meta data super block now.
Jun 23 16:06:05 ldap-b kernel: drbd0: role( Primary -> Secondary )
Jun 23 16:06:05 ldap-b kernel: drbd0: Writing meta data super block now.
Jun 23 16:06:05 ldap-b kernel: drbd0: peer( Secondary -> Primary )
Jun 23 16:06:12 ldap-b kernel: drbd0: Resync done (total 15 sec; paused 
0 sec; 34676 K/sec)
Jun 23 16:06:12 ldap-b kernel: drbd0: conn( SyncSource -> Connected ) 
pdsk( Inconsistent -> UpToDate )
Jun 23 16:06:12 ldap-b kernel: drbd1: aftr_isp( 1 -> 0 )
Jun 23 16:06:12 ldap-b kernel: drbd0: Writing meta data super block now.
Jun 23 16:06:12 ldap-b kernel: drbd1: conn( PausedSyncS -> SyncSource ) 
peer_isp( 1 -> 0 )
Jun 23 16:06:12 ldap-b kernel: drbd1: Syncer continues.
Jun 23 16:06:12 ldap-b kernel: drbd1: Resync done (total 16 sec; paused 
15 sec; 68 K/sec)
Jun 23 16:06:12 ldap-b kernel: drbd1: conn( SyncSource -> Connected ) 
pdsk( Inconsistent -> UpToDate )
Jun 23 16:06:12 ldap-b kernel: drbd1: Writing meta data super block now.
-------------------------------------------------------------------------8<-------

Should I worry about these out-of-sync alerts, or are they some kind of
false positive? If they are, is there a way to make the check more
accurate? Should I disable TCP checksum offloading? Should I enable
'replication traffic integrity checking'?

ldap-a has crashed twice in three weeks, both times while executing
'drbdadm verify all'. Could that be a bug in DRBD?


Thanks in advance for any comments or suggestions!
Eric


