[DRBD-user] drbdadm verify all seems to produce false positives on ext3 and crash the server

Eric Marin eric.marin at utc.fr
Wed Jul 9 15:12:23 CEST 2008



Hello,

Because of the problems I ran into earlier on my two-node Debian cluster, I suspected that the megaraid_sas module was causing trouble (crashes or data
corruption): Dell OpenManage Server Administrator showed the warning below, and I had read about similar crashes on the web:
Firmware version 6.0.2-0002
Driver version 00.00.03.01
Minimal required driver version 00.00.03.13
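
The comparison behind this warning can be sketched as follows. This is only an illustration: the helper name parse_version is made up here, and treating each dotted field as a plain integer is an assumption about how such version strings are meant to be compared, not something documented by OMSA.

```python
def parse_version(text):
    """Split a dotted version such as '00.00.03.01' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

installed = parse_version("00.00.03.01")  # driver version from the warning
required = parse_version("00.00.03.13")   # minimal required driver version

# Tuples compare field by field, so this is a numeric comparison,
# not a character-by-character string comparison.
driver_too_old = installed < required
print("driver too old:", driver_too_old)
```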

As a first test (I have not yet tried your other suggestions, but I will), I updated all the firmware in the servers (HDD, RAID controller,
BIOS and BMC) to the latest versions and installed the latest kernel from etch-backports, which contains an updated megaraid_sas driver:
linux-image-2.6.25-2-686-bigmem   2.6.25-6~bpo40+1

Then I rebuilt the DRBD module for the new kernel:
# m-a a-i drbd8
# depmod -a
# modprobe drbd
# reboot

This setup seems to work: services such as openldap and tomcat appear to run normally, and failover seems OK.

I then proceeded to test the dreaded 'drbdadm verify all' command, which had crashed one server several times before.

I got this in /var/log/kern.log:
---------8<------------------------------------------------------------------------
Jul  9 09:33:09 ldap-b kernel: [   31.950411] drbd: initialised. Version: 8.2.6 (api:88/proto:86-88)
Jul  9 09:33:09 ldap-b kernel: [   31.950501] drbd: GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by root at ldap-b, 2008-07-09 09:18:43
Jul  9 09:33:09 ldap-b kernel: [   31.950596] drbd: registered as block device major 147
Jul  9 09:33:09 ldap-b kernel: [   31.950666] drbd: minor_table @ 0xf64eb3c0
Jul  9 09:33:09 ldap-b kernel: [   31.955913] drbd0: disk( Diskless -> Attaching )
Jul  9 09:33:09 ldap-b kernel: [   31.955985] drbd0: Starting worker thread (from cqueue [3150])
Jul  9 09:33:09 ldap-b kernel: klogd 1.4.1, ---------- state change ----------
Jul  9 09:33:09 ldap-b kernel: [   13.387958] drbd0: Found 4 transactions (192 active extents) in activity log.
Jul  9 09:33:09 ldap-b kernel: [   13.388035] drbd0: max_segment_size ( = BIO size ) = 32768
Jul  9 09:33:09 ldap-b kernel: [   13.388107] drbd0: drbd_bm_resize called with capacity == 19534977
Jul  9 09:33:09 ldap-b kernel: [   13.388276] drbd0: resync bitmap: bits=2441873 words=76310
Jul  9 09:33:09 ldap-b kernel: [   13.388347] drbd0: size = 9539 MB (9767488 KB)
Jul  9 09:33:09 ldap-b kernel: [   31.965631] drbd0: reading of bitmap took 3 jiffies
Jul  9 09:33:09 ldap-b kernel: [   31.966164] drbd0: recounting of set bits took additional 0 jiffies
Jul  9 09:33:09 ldap-b kernel: [   31.966237] drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jul  9 09:33:09 ldap-b kernel: [   31.966313] drbd0: disk( Attaching -> Outdated )
Jul  9 09:33:09 ldap-b kernel: [   31.966385] drbd0: Writing meta data super block now.
Jul  9 09:33:09 ldap-b kernel: [   31.967545] drbd1: disk( Diskless -> Attaching )
Jul  9 09:33:09 ldap-b kernel: [   31.967616] drbd1: Starting worker thread (from cqueue [3150])
Jul  9 09:33:09 ldap-b kernel: [   21.160215] drbd1: Found 4 transactions (97 active extents) in activity log.
Jul  9 09:33:09 ldap-b kernel: [   21.160293] drbd1: max_segment_size ( = BIO size ) = 32768
Jul  9 09:33:09 ldap-b kernel: [   21.160367] drbd1: drbd_bm_resize called with capacity == 19534977
Jul  9 09:33:09 ldap-b kernel: [   21.160529] drbd1: resync bitmap: bits=2441873 words=76310
Jul  9 09:33:09 ldap-b kernel: [   21.160602] drbd1: size = 9539 MB (9767488 KB)
Jul  9 09:33:09 ldap-b kernel: [   24.637560] drbd1: reading of bitmap took 1 jiffies
Jul  9 09:33:09 ldap-b kernel: [   24.637560] drbd1: recounting of set bits took additional 0 jiffies
Jul  9 09:33:09 ldap-b kernel: [   24.637560] drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jul  9 09:33:09 ldap-b kernel: [   24.637560] drbd1: disk( Attaching -> Outdated )
Jul  9 09:33:09 ldap-b kernel: [   24.637560] drbd1: Writing meta data super block now.
Jul  9 09:33:09 ldap-b kernel: [   31.990641] padlock: VIA PadLock Hash Engine not detected.
Jul  9 09:33:09 ldap-b kernel: [   32.021132] drbd0: conn( StandAlone -> Unconnected )
Jul  9 09:33:09 ldap-b kernel: [   13.492570] drbd0: Starting receiver thread (from drbd0_worker [3162])
Jul  9 09:33:09 ldap-b kernel: [   13.492658] drbd0: receiver (re)started
Jul  9 09:33:09 ldap-b kernel: [   13.492727] drbd0: conn( Unconnected -> WFConnection )
Jul  9 09:33:09 ldap-b kernel: [   32.022195] drbd1: conn( StandAlone -> Unconnected )
Jul  9 09:33:09 ldap-b kernel: [   13.493629] drbd1: Starting receiver thread (from drbd1_worker [3166])
Jul  9 09:33:09 ldap-b kernel: [   13.493715] drbd1: receiver (re)started
Jul  9 09:33:09 ldap-b kernel: [   13.493792] drbd1: conn( Unconnected -> WFConnection )
Jul  9 09:33:10 ldap-b kernel: [   13.589830] drbd0: Handshake successful: Agreed network protocol version 88
Jul  9 09:33:10 ldap-b kernel: [   13.595809] drbd1: Handshake successful: Agreed network protocol version 88
Jul  9 09:33:10 ldap-b kernel: [   13.630653] drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Jul  9 09:33:10 ldap-b kernel: [   13.630728] drbd0: conn( WFConnection -> WFReportParams )
Jul  9 09:33:10 ldap-b kernel: [   13.630815] drbd0: Starting asender thread (from drbd0_receiver [3192])
Jul  9 09:33:10 ldap-b kernel: [   13.630923] drbd0: data-integrity-alg: <not-used>
Jul  9 09:33:10 ldap-b kernel: [   13.631462] drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Jul  9 09:33:10 ldap-b kernel: [   13.631560] drbd0: Writing meta data super block now.
Jul  9 09:33:10 ldap-b kernel: [   13.632982] drbd1: Peer authenticated using 20 bytes of 'sha1' HMAC
Jul  9 09:33:10 ldap-b kernel: [   13.633057] drbd1: conn( WFConnection -> WFReportParams )
Jul  9 09:33:10 ldap-b kernel: [   13.633138] drbd1: Starting asender thread (from drbd1_receiver [3194])
Jul  9 09:33:10 ldap-b kernel: [   13.633240] drbd1: data-integrity-alg: <not-used>
Jul  9 09:33:10 ldap-b kernel: [   13.633779] drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Jul  9 09:33:10 ldap-b kernel: [   13.633874] drbd1: Writing meta data super block now.
Jul  9 09:33:10 ldap-b kernel: [   13.683724] drbd0: conn( WFBitMapT -> WFSyncUUID )
Jul  9 09:33:10 ldap-b kernel: [   13.685116] drbd1: conn( WFBitMapT -> WFSyncUUID )
Jul  9 09:33:10 ldap-b kernel: [   13.686308] drbd0: helper command: /sbin/drbdadm before-resync-target
Jul  9 09:33:10 ldap-b kernel: [   13.686415] drbd1: peer_isp( 0 -> 1 )
Jul  9 09:33:10 ldap-b kernel: [   13.687562] drbd1: helper command: /sbin/drbdadm before-resync-target
Jul  9 09:33:10 ldap-b kernel: [   13.687644] drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
Jul  9 09:33:10 ldap-b kernel: [   13.687738] drbd1: aftr_isp( 0 -> 1 )
Jul  9 09:33:10 ldap-b kernel: [   13.687810] drbd0: Began resync as SyncTarget (will sync 151620 KB [37905 bits set]).
Jul  9 09:33:10 ldap-b kernel: [   13.687902] drbd0: Writing meta data super block now.
Jul  9 09:33:10 ldap-b kernel: [   13.687996] drbd1: Writing meta data super block now.
Jul  9 09:33:10 ldap-b kernel: [   13.688601] drbd1: conn( WFSyncUUID -> PausedSyncT ) disk( Outdated -> Inconsistent )
Jul  9 09:33:10 ldap-b kernel: [   13.688696] drbd1: Began resync as PausedSyncT (will sync 0 KB [0 bits set]).
Jul  9 09:33:10 ldap-b kernel: [   13.688774] drbd1: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Jul  9 09:33:10 ldap-b kernel: [   13.688849] drbd1: conn( PausedSyncT -> Connected ) disk( Inconsistent -> UpToDate )
Jul  9 09:33:10 ldap-b kernel: [   13.689335] drbd1: helper command: /sbin/drbdadm after-resync-target
Jul  9 09:33:10 ldap-b kernel: [   13.689415] drbd1: Writing meta data super block now.
Jul  9 09:33:14 ldap-b kernel: [   15.761543] drbd0: Resync done (total 4 sec; paused 0 sec; 37904 K/sec)
Jul  9 09:33:14 ldap-b kernel: [   15.761642] drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Jul  9 09:33:14 ldap-b kernel: [   15.762198] drbd0: helper command: /sbin/drbdadm after-resync-target
Jul  9 09:33:14 ldap-b kernel: [   15.763409] drbd1: aftr_isp( 1 -> 0 )
Jul  9 09:33:14 ldap-b kernel: [   15.763480] drbd0: Writing meta data super block now.
Jul  9 09:33:14 ldap-b kernel: [   15.767950] drbd1: peer_isp( 1 -> 0 )
Jul  9 09:33:16 ldap-b kernel: [   15.937945] drbd1: peer( Primary -> Secondary )
Jul  9 09:33:17 ldap-b kernel: [   38.436159] drbd1: role( Secondary -> Primary )
Jul  9 09:33:17 ldap-b kernel: [   16.027376] drbd1: Writing meta data super block now.
Jul  9 09:33:17 ldap-b kernel: [   24.341545] kjournald starting.  Commit interval 5 seconds
Jul  9 09:33:17 ldap-b kernel: [   16.229763] EXT3 FS on drbd1, internal journal
Jul  9 09:33:17 ldap-b kernel: [   16.229887] EXT3-fs: mounted filesystem with journal data mode.
Jul  9 09:48:46 ldap-b kernel: [  579.513151] drbd0: conn( Connected -> VerifyT )
Jul  9 09:48:46 ldap-b kernel: [  579.513151] drbd1: conn( Connected -> VerifyT )
Jul  9 09:48:46 ldap-b kernel: [  579.881174] drbd0: Out of sync: start=33408, size=8 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  579.881174] drbd0: Out of sync: start=33384, size=24 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  579.881174] drbd0: Out of sync: start=33496, size=8 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  579.881174] drbd0: Out of sync: start=33512, size=16 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  579.932420] drbd0: Out of sync: start=34216, size=8 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  579.932522] drbd0: Out of sync: start=34320, size=8 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  579.932602] drbd0: Out of sync: start=34344, size=8 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  579.932692] drbd0: Out of sync: start=34408, size=16 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  579.932784] drbd0: Out of sync: start=34504, size=8 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  579.932875] drbd0: Out of sync: start=34592, size=8 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  579.932954] drbd0: Out of sync: start=34616, size=8 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  579.933032] drbd0: Out of sync: start=34632, size=8 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  579.933156] drbd0: Out of sync: start=34840, size=32 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  580.004526] drbd0: Out of sync: start=42448, size=32 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  580.004658] drbd0: Out of sync: start=42680, size=16 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  580.004736] drbd0: Out of sync: start=42704, size=8 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  580.004837] drbd0: Out of sync: start=42824, size=8 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  580.004936] drbd0: Out of sync: start=42936, size=16 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  580.005022] drbd0: Out of sync: start=43008, size=8 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  580.005101] drbd0: Out of sync: start=43024, size=16 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  580.005213] drbd0: Out of sync: start=43144, size=8 (sectors)
Jul  9 09:48:46 ldap-b kernel: [  580.021183] drbd0: Out of sync: start=42120, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.149145] drbd0: Out of sync: start=54928, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.149208] drbd0: Out of sync: start=54992, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.149312] drbd0: Out of sync: start=55104, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.393207] drbd0: Out of sync: start=75688, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.393207] drbd0: Out of sync: start=75728, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.393207] drbd0: Out of sync: start=75784, size=16 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.393207] drbd0: Out of sync: start=75824, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.393207] drbd0: Out of sync: start=75856, size=24 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.393207] drbd0: Out of sync: start=75936, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.393207] drbd0: Out of sync: start=75960, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.537216] drbd0: Out of sync: start=93848, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.537216] drbd0: Out of sync: start=93872, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.537216] drbd0: Out of sync: start=93912, size=16 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.537961] drbd0: Out of sync: start=93928, size=40 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.541216] drbd0: Out of sync: start=93968, size=168 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.541216] drbd0: Out of sync: start=94152, size=24 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.542939] drbd0: Out of sync: start=94448, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.543017] drbd0: Out of sync: start=94472, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.597245] drbd0: Out of sync: start=94536, size=16 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.597330] drbd0: Out of sync: start=94560, size=24 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.597436] drbd0: Out of sync: start=94704, size=16 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.597522] drbd0: Out of sync: start=94776, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.597601] drbd0: Out of sync: start=94792, size=16 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.597703] drbd0: Out of sync: start=94936, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.757015] drbd0: Out of sync: start=108968, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.757105] drbd0: Out of sync: start=108992, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.757194] drbd0: Out of sync: start=109016, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.757236] drbd0: Out of sync: start=109024, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.757312] drbd0: Out of sync: start=109032, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.757389] drbd0: Out of sync: start=109040, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.757468] drbd0: Out of sync: start=109048, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.757544] drbd0: Out of sync: start=109056, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.757621] drbd0: Out of sync: start=109064, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.757697] drbd0: Out of sync: start=109072, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.757779] drbd0: Out of sync: start=109080, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.757854] drbd0: Out of sync: start=109088, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.757933] drbd0: Out of sync: start=109096, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.758008] drbd0: Out of sync: start=109104, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.758085] drbd0: Out of sync: start=109112, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.758162] drbd0: Out of sync: start=109120, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.758265] drbd0: Out of sync: start=109128, size=112 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.758347] drbd0: Out of sync: start=109256, size=24 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.758465] drbd0: Out of sync: start=109472, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.758547] drbd0: Out of sync: start=109504, size=16 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.758626] drbd0: Out of sync: start=109536, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.758706] drbd0: Out of sync: start=109560, size=16 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.758787] drbd0: Out of sync: start=109584, size=24 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.758875] drbd0: Out of sync: start=109664, size=16 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.758962] drbd0: Out of sync: start=109736, size=8 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.759041] drbd0: Out of sync: start=109752, size=16 (sectors)
Jul  9 09:48:47 ldap-b kernel: [  580.766251] drbd0: Out of sync: start=109880, size=8 (sectors)
Jul  9 09:48:48 ldap-b kernel: [  580.985244] drbd0: Out of sync: start=134728, size=16 (sectors)
Jul  9 09:48:48 ldap-b kernel: [  581.047252] drbd0: Out of sync: start=135104, size=8 (sectors)
Jul  9 09:48:48 ldap-b kernel: [  581.047361] drbd0: Out of sync: start=135224, size=16 (sectors)
Jul  9 09:48:48 ldap-b kernel: [  581.047440] drbd0: Out of sync: start=135256, size=8 (sectors)
Jul  9 09:48:48 ldap-b kernel: [  581.047525] drbd0: Out of sync: start=135288, size=16 (sectors)
Jul  9 09:48:48 ldap-b kernel: [  581.047601] drbd0: Out of sync: start=135312, size=8 (sectors)
Jul  9 09:48:48 ldap-b kernel: [  581.047697] drbd0: Out of sync: start=135416, size=8 (sectors)
Jul  9 09:48:48 ldap-b kernel: [  581.047778] drbd0: Out of sync: start=135456, size=8 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  581.712960] drbd0: Out of sync: start=195680, size=24 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  581.713053] drbd0: Out of sync: start=195744, size=24 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  581.713290] drbd0: Out of sync: start=198032, size=16 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  581.713290] drbd0: Out of sync: start=198160, size=8 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  581.715658] drbd0: Out of sync: start=195304, size=16 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  581.717572] drbd0: Out of sync: start=196048, size=8 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  581.729243] drbd0: Out of sync: start=196160, size=16 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  581.729312] drbd0: Out of sync: start=196232, size=8 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  581.729393] drbd0: Out of sync: start=196248, size=16 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  581.729492] drbd0: Out of sync: start=196376, size=8 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  581.729993] drbd0: Out of sync: start=195320, size=8 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  582.021309] drbd0: Out of sync: start=226864, size=8 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  582.021309] drbd0: Out of sync: start=226904, size=8 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  582.021309] drbd0: Out of sync: start=226920, size=16 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  582.021309] drbd0: Out of sync: start=226944, size=8 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  582.021309] drbd0: Out of sync: start=227040, size=8 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  582.021309] drbd0: Out of sync: start=227072, size=8 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  582.161087] drbd0: Out of sync: start=239624, size=8 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  582.161179] drbd0: Out of sync: start=239688, size=8 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  582.161260] drbd0: Out of sync: start=239704, size=8 (sectors)
Jul  9 09:48:49 ldap-b kernel: [  582.161472] drbd0: Out of sync: start=239808, size=8 (sectors)
Jul  9 09:48:50 ldap-b kernel: [  582.623923] drbd0: Out of sync: start=279368, size=8 (sectors)
Jul  9 09:48:50 ldap-b kernel: [  582.624079] drbd0: Out of sync: start=278968, size=8 (sectors)
Jul  9 09:48:50 ldap-b kernel: [  582.624159] drbd0: Out of sync: start=278992, size=8 (sectors)
Jul  9 09:48:50 ldap-b kernel: [  582.624246] drbd0: Out of sync: start=279032, size=24 (sectors)
Jul  9 09:48:50 ldap-b kernel: [  582.624325] drbd0: Out of sync: start=279064, size=16 (sectors)
Jul  9 09:48:50 ldap-b kernel: [  582.624409] drbd0: Out of sync: start=279112, size=16 (sectors)
Jul  9 09:48:50 ldap-b kernel: [  582.624492] drbd0: Out of sync: start=279152, size=24 (sectors)
Jul  9 09:48:50 ldap-b kernel: [  582.624592] drbd0: Out of sync: start=279272, size=24 (sectors)
Jul  9 09:48:50 ldap-b kernel: [  582.624681] drbd0: Out of sync: start=279312, size=56 (sectors)
Jul  9 09:51:01 ldap-b kernel: [  672.823561] drbd0: Out of sync: start=8222672, size=8 (sectors)
Jul  9 09:51:02 ldap-b kernel: [  673.173588] drbd0: Out of sync: start=8253296, size=8 (sectors)
Jul  9 09:51:02 ldap-b kernel: [  673.252864] drbd0: Out of sync: start=8255712, size=32 (sectors)
Jul  9 09:51:02 ldap-b kernel: [  673.268004] drbd0: Out of sync: start=8260544, size=16 (sectors)
Jul  9 09:51:02 ldap-b kernel: [  673.268089] drbd0: Out of sync: start=8260584, size=16 (sectors)
Jul  9 09:51:02 ldap-b kernel: [  673.268166] drbd0: Out of sync: start=8260608, size=8 (sectors)
Jul  9 09:51:02 ldap-b kernel: [  673.296147] drbd0: Out of sync: start=8266632, size=88 (sectors)
Jul  9 09:51:02 ldap-b kernel: [  673.299041] drbd0: Out of sync: start=8266720, size=16 (sectors)
Jul  9 09:51:02 ldap-b kernel: [  673.473608] drbd0: Out of sync: start=8280320, size=8 (sectors)
Jul  9 09:54:05 ldap-b kernel: [  800.670741] drbd1: Online verify  done (total 319 sec; paused 0 sec; 30616 K/sec)
Jul  9 09:54:05 ldap-b kernel: [  800.670841] drbd1: conn( VerifyT -> Connected )
Jul  9 09:54:06 ldap-b kernel: [  801.216397] drbd0: Online verify  done (total 320 sec; paused 0 sec; 30520 K/sec)
Jul  9 09:54:06 ldap-b kernel: [  801.216490] drbd0: Online verify found 105 4k block out of sync!
Jul  9 09:54:06 ldap-b kernel: [  801.216567] drbd0: helper command: /sbin/drbdadm out-of-sync
Jul  9 09:54:06 ldap-b kernel: [  801.224863] drbd0: conn( VerifyT -> Connected )
Jul  9 09:54:06 ldap-b kernel: [  801.224939] drbd0: Writing the whole bitmap, due to failed kmalloc
Jul  9 09:54:06 ldap-b kernel: [  801.226171] drbd0: writing of bitmap took 0 jiffies
Jul  9 09:54:06 ldap-b kernel: [  801.226244] drbd0: 420 KB (105 bits) marked out-of-sync by on disk bit-map.
-------------------------------------------------------->8-------------------------

That's an awful lot of blocks! Besides, I see no reason why the device should be out of sync at all...
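
To get a quick overview of such output, the "Out of sync" lines can be tallied per device. The following is a minimal sketch, not a DRBD tool: the function names are hypothetical, the sample lines are copied from the log above, and it assumes 512-byte sectors, so that 8 sectors make one 4k block (consistent with the "420 KB (105 bits)" line in the log).

```python
import re

# Matches e.g. "drbd0: Out of sync: start=33408, size=8 (sectors)"
OOS_RE = re.compile(r"(drbd\d+): Out of sync: start=(\d+), size=(\d+) \(sectors\)")

SECTOR_BYTES = 512   # assumption: the log counts 512-byte sectors
BLOCK_BYTES = 4096   # the "4k block" granularity mentioned in the log

def summarize_out_of_sync(lines):
    """Return {device: (number_of_ranges, total_sectors)} for the given lines."""
    totals = {}
    for line in lines:
        match = OOS_RE.search(line)
        if not match:
            continue  # not an "Out of sync" line
        device, size = match.group(1), int(match.group(3))
        ranges, sectors = totals.get(device, (0, 0))
        totals[device] = (ranges + 1, sectors + size)
    return totals

def sectors_to_blocks(sectors):
    """Convert a sector count to whole 4k blocks."""
    return sectors * SECTOR_BYTES // BLOCK_BYTES

sample = [
    "Jul  9 09:48:46 ldap-b kernel: [  579.513151] drbd0: conn( Connected -> VerifyT )",
    "Jul  9 09:48:46 ldap-b kernel: [  579.881174] drbd0: Out of sync: start=33408, size=8 (sectors)",
    "Jul  9 09:48:46 ldap-b kernel: [  579.881174] drbd0: Out of sync: start=33384, size=24 (sectors)",
    "Jul  9 09:48:46 ldap-b kernel: [  579.881174] drbd0: Out of sync: start=33496, size=8 (sectors)",
]

totals = summarize_out_of_sync(sample)
for device, (ranges, sectors) in sorted(totals.items()):
    print(device, ranges, "ranges,", sectors, "sectors,",
          sectors_to_blocks(sectors), "4k blocks")
```

In practice the lines would be read from /var/log/kern.log instead of the inline sample. The sketch only adds up what the log lines report; it makes no claim that this sum equals the block count DRBD prints at the end of the verify run.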

I moved the services to ldap-a, then disconnected and reconnected DRBD on ldap-b to trigger a resynchronization:
# drbdadm disconnect all
# drbdadm connect all
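
Whether a resynchronization has finished can be read from the state-change lines in kern.log. The sketch below is only an illustration under stated assumptions: the function name latest_states is made up, the regular expressions are inferred from the log format shown in this message, and the sample lines are copied from the first log excerpt.

```python
import re

DEVICE_RE = re.compile(r"\b(drbd\d+): ")
# Matches state changes such as "conn( SyncTarget -> Connected )"
CHANGE_RE = re.compile(r"(\w+)\( (\S+) -> (\S+) \)")

def latest_states(lines):
    """Return {device: {field: newest_value}} from DRBD state-change lines."""
    states = {}
    for line in lines:
        device = DEVICE_RE.search(line)
        if not device:
            continue
        for field, _old, new in CHANGE_RE.findall(line):
            states.setdefault(device.group(1), {})[field] = new
    return states

sample = [
    "Jul  9 09:33:10 ldap-b kernel: [   13.687644] drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )",
    "Jul  9 09:33:14 ldap-b kernel: [   15.761543] drbd0: Resync done (total 4 sec; paused 0 sec; 37904 K/sec)",
    "Jul  9 09:33:14 ldap-b kernel: [   15.761642] drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )",
]

states = latest_states(sample)
# The resync is treated as finished once the device is Connected and UpToDate.
resync_finished = (states["drbd0"].get("conn") == "Connected"
                   and states["drbd0"].get("disk") == "UpToDate")
print(states, resync_finished)
```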

When the synchronization had completed, I ran 'drbdadm verify all' again:
---------8<------------------------------------------------------------------------
Jul  9 09:54:36 ldap-b kernel: [  811.325532] drbd1: role( Primary -> Secondary )
Jul  9 09:54:36 ldap-b kernel: [  816.458995] drbd1: Writing meta data super block now.
Jul  9 09:54:37 ldap-b kernel: [  816.492114] drbd1: peer( Secondary -> Primary )
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd0: Requested state change failed by peer: Refusing to be Primary while peer is not outdated
Jul  9 09:55:07 ldap-b kernel: [  841.194640] drbd0: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown )
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd0: short read expecting header on sock: r=-512
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd0: asender terminated
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd0: Terminating asender thread
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd0: Writing meta data super block now.
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd0: tl_clear()
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd0: Connection closed
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd0: conn( Disconnecting -> StandAlone )
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd0: receiver terminated
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd0: Terminating receiver thread
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd1: Requested state change failed by peer: Refusing to be Primary while peer is not outdated
Jul  9 09:55:07 ldap-b kernel: [  841.196756] drbd1: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown )
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd1: Writing meta data super block now.
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd1: short read expecting header on sock: r=-512
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd1: asender terminated
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd1: Terminating asender thread
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd1: tl_clear()
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd1: Connection closed
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd1: conn( Disconnecting -> StandAlone )
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd1: receiver terminated
Jul  9 09:55:07 ldap-b kernel: [  819.060282] drbd1: Terminating receiver thread
Jul  9 09:55:11 ldap-b kernel: [  845.296145] drbd0: conn( StandAlone -> Unconnected )
Jul  9 09:55:11 ldap-b kernel: [  819.319669] drbd0: Starting receiver thread (from drbd0_worker [3162])
Jul  9 09:55:11 ldap-b kernel: [  819.319768] drbd0: receiver (re)started
Jul  9 09:55:11 ldap-b kernel: [  819.319839] drbd0: conn( Unconnected -> WFConnection )
Jul  9 09:55:11 ldap-b kernel: [  845.296865] drbd1: conn( StandAlone -> Unconnected )
Jul  9 09:55:11 ldap-b kernel: [  819.320387] drbd1: Starting receiver thread (from drbd1_worker [3166])
Jul  9 09:55:11 ldap-b kernel: [  819.320475] drbd1: receiver (re)started
Jul  9 09:55:11 ldap-b kernel: [  819.320545] drbd1: conn( Unconnected -> WFConnection )
Jul  9 09:55:11 ldap-b kernel: [  819.420230] drbd1: Handshake successful: Agreed network protocol version 88
Jul  9 09:55:11 ldap-b kernel: [  819.420395] drbd0: Handshake successful: Agreed network protocol version 88
Jul  9 09:55:11 ldap-b kernel: [  819.428414] drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Jul  9 09:55:11 ldap-b kernel: [  819.428488] drbd0: conn( WFConnection -> WFReportParams )
Jul  9 09:55:11 ldap-b kernel: [  819.428570] drbd0: Starting asender thread (from drbd0_receiver [5552])
Jul  9 09:55:11 ldap-b kernel: [  819.428666] drbd1: Peer authenticated using 20 bytes of 'sha1' HMAC
Jul  9 09:55:11 ldap-b kernel: [  819.428740] drbd1: conn( WFConnection -> WFReportParams )
Jul  9 09:55:11 ldap-b kernel: [  819.428822] drbd1: Starting asender thread (from drbd1_receiver [5554])
Jul  9 09:55:11 ldap-b kernel: [  819.430675] drbd0: data-integrity-alg: <not-used>
Jul  9 09:55:11 ldap-b kernel: [  819.431215] drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Jul  9 09:55:11 ldap-b kernel: [  819.431311] drbd0: Writing meta data super block now.
Jul  9 09:55:11 ldap-b kernel: [  819.431419] drbd1: data-integrity-alg: <not-used>
Jul  9 09:55:11 ldap-b kernel: [  819.431509] drbd1: Writing the whole bitmap, full sync required after drbd_sync_handshake.
Jul  9 09:55:11 ldap-b kernel: [  819.431603] drbd1: Writing meta data super block now.
Jul  9 09:55:11 ldap-b kernel: [  819.432441] drbd1: writing of bitmap took 0 jiffies
Jul  9 09:55:11 ldap-b kernel: [  819.432513] drbd1: 9539 MB (2441873 bits) marked out-of-sync by on disk bit-map.
Jul  9 09:55:11 ldap-b kernel: [  819.432602] drbd1: Writing meta data super block now.
Jul  9 09:55:11 ldap-b kernel: [  819.433165] drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Jul  9 09:55:11 ldap-b kernel: [  819.433261] drbd1: Writing meta data super block now.
Jul  9 09:55:11 ldap-b kernel: [  819.438782] drbd1: conn( WFBitMapT -> WFSyncUUID )
Jul  9 09:55:11 ldap-b kernel: [  819.440311] drbd1: helper command: /sbin/drbdadm before-resync-target
Jul  9 09:55:11 ldap-b kernel: [  819.441431] drbd1: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
Jul  9 09:55:11 ldap-b kernel: [  819.441524] drbd1: Began resync as SyncTarget (will sync 9767492 KB [2441873 bits set]).
Jul  9 09:55:11 ldap-b kernel: [  819.441617] drbd1: Writing meta data super block now.
Jul  9 09:55:11 ldap-b kernel: [  819.465453] drbd0: conn( WFBitMapT -> WFSyncUUID )
Jul  9 09:55:11 ldap-b kernel: [  819.467149] drbd0: helper command: /sbin/drbdadm before-resync-target
Jul  9 09:55:11 ldap-b kernel: [  819.468233] drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
Jul  9 09:55:11 ldap-b kernel: [  819.468308] drbd1: conn( SyncTarget -> PausedSyncT ) aftr_isp( 0 -> 1 )
Jul  9 09:55:11 ldap-b kernel: [  819.468308] drbd1: Resync suspended
Jul  9 09:55:11 ldap-b kernel: [  819.468318] drbd0: Began resync as SyncTarget (will sync 2068 KB [517 bits set]).
Jul  9 09:55:11 ldap-b kernel: [  819.468409] drbd0: Writing meta data super block now.
Jul  9 09:55:11 ldap-b kernel: [  819.478913] drbd1: peer_isp( 0 -> 1 )
Jul  9 09:55:11 ldap-b kernel: [  819.516311] drbd0: Resync done (total 1 sec; paused 0 sec; 2068 K/sec)
Jul  9 09:55:11 ldap-b kernel: [  819.516311] drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Jul  9 09:55:11 ldap-b kernel: [  819.516311] drbd0: helper command: /sbin/drbdadm after-resync-target
Jul  9 09:55:11 ldap-b kernel: [  819.516311] drbd1: peer_isp( 1 -> 0 )
Jul  9 09:55:11 ldap-b kernel: [  819.516311] drbd1: conn( PausedSyncT -> SyncTarget ) aftr_isp( 1 -> 0 )
Jul  9 09:55:11 ldap-b kernel: [  819.516311] drbd1: Syncer continues.
Jul  9 09:55:11 ldap-b kernel: [  819.516312] drbd0: Writing meta data super block now.
Jul  9 09:59:59 ldap-b kernel: [  982.455134] drbd1: Resync done (total 288 sec; paused 0 sec; 33912 K/sec)
Jul  9 09:59:59 ldap-b kernel: [  982.455219] drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Jul  9 09:59:59 ldap-b kernel: [  982.455776] drbd1: helper command: /sbin/drbdadm after-resync-target
Jul  9 09:59:59 ldap-b kernel: [  982.457052] drbd1: Writing meta data super block now.
Jul  9 12:40:41 ldap-b kernel: [ 4734.309515] drbd0: conn( Connected -> VerifyT )
Jul  9 12:40:41 ldap-b kernel: [ 4734.309515] drbd1: conn( Connected -> VerifyT )
Jul  9 12:40:42 ldap-b kernel: [ 4734.505528] drbd0: Out of sync: start=19880, size=8 (sectors)
Jul  9 12:40:42 ldap-b kernel: [ 4734.555944] drbd0: Out of sync: start=21008, size=24 (sectors)
Jul  9 12:40:42 ldap-b kernel: [ 4734.556010] drbd0: Out of sync: start=21040, size=120 (sectors)
Jul  9 12:40:42 ldap-b kernel: [ 4734.556045] drbd0: Out of sync: start=21168, size=8 (sectors)
Jul  9 12:40:42 ldap-b kernel: [ 4734.556086] drbd0: Out of sync: start=21200, size=32 (sectors)
Jul  9 12:40:42 ldap-b kernel: [ 4734.556123] drbd0: Out of sync: start=21240, size=16 (sectors)
Jul  9 12:40:42 ldap-b kernel: [ 4734.556235] drbd0: Out of sync: start=21608, size=24 (sectors)
Jul  9 12:40:42 ldap-b kernel: [ 4734.556275] drbd0: Out of sync: start=21664, size=16 (sectors)
Jul  9 12:40:42 ldap-b kernel: [ 4734.556308] drbd0: Out of sync: start=21688, size=8 (sectors)
Jul  9 12:40:42 ldap-b kernel: [ 4734.556339] drbd0: Out of sync: start=21704, size=8 (sectors)
Jul  9 12:40:42 ldap-b kernel: [ 4734.556371] drbd0: Out of sync: start=21720, size=8 (sectors)
Jul  9 12:40:42 ldap-b kernel: [ 4734.556426] drbd0: Out of sync: start=21752, size=80 (sectors)
Jul  9 12:40:42 ldap-b kernel: [ 4734.556500] drbd0: Out of sync: start=22016, size=16 (sectors)
Jul  9 12:40:42 ldap-b kernel: [ 4734.556535] drbd0: Out of sync: start=22048, size=8 (sectors)
Jul  9 12:40:42 ldap-b kernel: [ 4734.556574] drbd0: Out of sync: start=22088, size=8 (sectors)
Jul  9 12:40:42 ldap-b kernel: [ 4734.556618] drbd0: Out of sync: start=22152, size=16 (sectors)
Jul  9 12:40:42 ldap-b kernel: [ 4734.816985] drbd0: Out of sync: start=44632, size=8 (sectors)
Jul  9 12:40:42 ldap-b kernel: [ 4735.141568] drbd0: Out of sync: start=73552, size=8 (sectors)
Jul  9 12:40:43 ldap-b kernel: [ 4735.373582] drbd0: Out of sync: start=94120, size=8 (sectors)
Jul  9 12:40:43 ldap-b kernel: [ 4735.373582] drbd0: Out of sync: start=94136, size=8 (sectors)
Jul  9 12:40:43 ldap-b kernel: [ 4735.539493] drbd0: Out of sync: start=112944, size=16 (sectors)
Jul  9 12:40:43 ldap-b kernel: [ 4735.729605] drbd0: Out of sync: start=123856, size=8 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4735.948401] drbd0: Out of sync: start=141176, size=8 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4735.953654] drbd0: Out of sync: start=142448, size=8 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4735.955387] drbd0: Out of sync: start=142752, size=32 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4735.957198] drbd0: Out of sync: start=143016, size=8 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4735.957234] drbd0: Out of sync: start=143040, size=8 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4735.957268] drbd0: Out of sync: start=143056, size=8 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4735.957320] drbd0: Out of sync: start=143160, size=8 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4735.957619] drbd0: Out of sync: start=143304, size=16 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4736.193199] drbd0: Out of sync: start=167040, size=224 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4736.193243] drbd0: Out of sync: start=167288, size=24 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4736.193422] drbd0: Out of sync: start=166968, size=8 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4736.193459] drbd0: Out of sync: start=167000, size=8 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4736.197704] drbd0: Out of sync: start=167632, size=8 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4736.197745] drbd0: Out of sync: start=167664, size=16 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4736.197783] drbd0: Out of sync: start=167704, size=8 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4736.197821] drbd0: Out of sync: start=167736, size=16 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4736.197859] drbd0: Out of sync: start=167760, size=24 (sectors)
Jul  9 12:40:44 ldap-b kernel: [ 4736.197917] drbd0: Out of sync: start=167904, size=16 (sectors)
Jul  9 12:40:45 ldap-b kernel: [ 4737.192747] drbd0: Out of sync: start=250304, size=8 (sectors)
Jul  9 12:40:45 ldap-b kernel: [ 4737.277286] drbd0: Out of sync: start=261992, size=8 (sectors)
Jul  9 12:40:46 ldap-b kernel: [ 4737.357708] drbd0: Out of sync: start=271184, size=8 (sectors)
Jul  9 12:40:46 ldap-b kernel: [ 4737.357708] drbd0: Out of sync: start=271496, size=32 (sectors)
Jul  9 12:40:46 ldap-b kernel: [ 4737.357708] drbd0: Out of sync: start=271640, size=8 (sectors)
Jul  9 12:40:46 ldap-b kernel: [ 4737.357708] drbd0: Out of sync: start=271656, size=16 (sectors)
Jul  9 12:40:46 ldap-b kernel: [ 4737.357708] drbd0: Out of sync: start=271792, size=8 (sectors)
Jul  9 12:40:46 ldap-b kernel: [ 4737.357708] drbd0: Out of sync: start=271904, size=16 (sectors)
Jul  9 12:42:50 ldap-b kernel: [ 4823.344894] drbd0: Out of sync: start=7943192, size=8 (sectors)
Jul  9 12:42:55 ldap-b kernel: [ 4826.871364] drbd0: Out of sync: start=8251864, size=8 (sectors)
Jul  9 12:42:55 ldap-b kernel: [ 4826.886980] drbd0: Out of sync: start=8251848, size=8 (sectors)
Jul  9 12:45:56 ldap-b kernel: [ 4952.607170] drbd1: Online verify  done (total 314 sec; paused 0 sec; 31104 K/sec)
Jul  9 12:45:56 ldap-b kernel: [ 4952.607224] drbd1: conn( VerifyT -> Connected )
Jul  9 12:45:57 ldap-b kernel: [ 4953.155344] drbd0: Online verify  done (total 315 sec; paused 0 sec; 31004 K/sec)
Jul  9 12:45:57 ldap-b kernel: [ 4953.155344] drbd0: Online verify found 19 4k block out of sync!
Jul  9 12:45:57 ldap-b kernel: [ 4953.155344] drbd0: helper command: /sbin/drbdadm out-of-sync
Jul  9 12:45:57 ldap-b kernel: [ 4953.176074] drbd0: conn( VerifyT -> Connected )
Jul  9 12:45:57 ldap-b kernel: [ 4953.176104] drbd0: Writing the whole bitmap, due to failed kmalloc
Jul  9 12:45:57 ldap-b kernel: [ 4953.177306] drbd0: writing of bitmap took 1 jiffies
Jul  9 12:45:57 ldap-b kernel: [ 4953.177335] drbd0: 76 KB (19 bits) marked out-of-sync by on disk bit-map.
-------------------------------------------------------->8-------------------------

Any idea how that is possible? (Well, I guess the answer is: the kernel :-\)
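
For anyone who wants to cross-check the individual "Out of sync" lines against the final summary, here is a minimal Python sketch. It is not part of DRBD; the function name and the two constants are my own. It assumes the sizes in the log are 512-byte sectors and that one bitmap bit covers 4 KiB, which is consistent with the "76 KB (19 bits)" line in the log above. It simply adds up the reported sizes per device, so a range that is reported more than once is counted more than once.

```python
import re
from collections import defaultdict

# Matches kernel log lines such as:
#   drbd0: Out of sync: start=21040, size=120 (sectors)
OOS_RE = re.compile(r"(drbd\d+): Out of sync: start=(\d+), size=(\d+) \(sectors\)")

SECTOR_BYTES = 512          # assumption: log sizes are 512-byte sectors
BITMAP_BLOCK_BYTES = 4096   # assumption: one bitmap bit covers 4 KiB


def summarize_out_of_sync(lines):
    """Return {device: (ranges, sectors, blocks_4k)} for the given log lines.

    Lines that do not match the "Out of sync" pattern are ignored.
    """
    totals = defaultdict(lambda: [0, 0])  # device -> [range count, sector sum]
    for line in lines:
        match = OOS_RE.search(line)
        if match:
            device = match.group(1)
            size = int(match.group(3))
            totals[device][0] += 1
            totals[device][1] += size

    sectors_per_block = BITMAP_BLOCK_BYTES // SECTOR_BYTES  # 8 sectors per 4 KiB
    return {
        device: (count, sectors, sectors // sectors_per_block)
        for device, (count, sectors) in totals.items()
    }
```

It can be fed an open log file directly, for example summarize_out_of_sync(open("/var/log/kern.log")), and the block count per device can then be compared with the "Online verify found ... 4k block out of sync" line.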

Thanks,
Eric

Eric Marin wrote (2008.06.23 16:32) :
 > Hello and sorry about the length of this report,
 >
 > I'm the administrator of a cluster which consists of two nodes : ldap-a
 > and ldap-b.
 >
 > Both nodes are identical and run on :
 > -Debian Etch (up to date)
 > -DRBD 8.2.6
 > -Heartbeat 2.1.3-5~bpo40+1 from etch-backports, patched to fix this bug
 > : http://hg.linux-ha.org/dev/rev/47f60bebe7b2
 > -EXT3 filesystem
 >
(...)


