Hello,

Because of the problems I encountered before on my two-node Debian cluster, I suspected the megaraid_sas module of causing them (crashes or data corruption), since I had this warning in Dell OpenManage Server Administrator and had read about similar crashes on the web:

  Firmware version 6.0.2-0002
  Driver version 00.00.03.01
  Minimum required driver version 00.00.03.13

As a first test (I have not yet tested your other suggestions, but will do so), I updated all the firmware in the servers (HDD, RAID controller, BIOS and BMC) to the latest versions, and installed the latest kernel in etch-backports, which contains an updated megaraid_sas driver:

  linux-image-2.6.25-2-686-bigmem 2.6.25-6~bpo40+1

Then I rebuilt the DRBD module for the new kernel:

  # m-a a-i drbd8
  # depmod -a
  # modprobe drbd
  # reboot

This setup seems to work: services such as openldap and tomcat apparently work, and failover seems OK.

I then proceeded to test the dreaded 'drbdadm verify all' command, which had crashed one server several times before. I got this in /var/log/kern.log:

---------8<------------------------------------------------------------------------
Jul 9 09:33:09 ldap-b kernel: [ 31.950411] drbd: initialised. Version: 8.2.6 (api:88/proto:86-88)
Jul 9 09:33:09 ldap-b kernel: [ 31.950501] drbd: GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by root@ldap-b, 2008-07-09 09:18:43
Jul 9 09:33:09 ldap-b kernel: [ 31.950596] drbd: registered as block device major 147
Jul 9 09:33:09 ldap-b kernel: [ 31.950666] drbd: minor_table @ 0xf64eb3c0
Jul 9 09:33:09 ldap-b kernel: [ 31.955913] drbd0: disk( Diskless -> Attaching )
Jul 9 09:33:09 ldap-b kernel: [ 31.955985] drbd0: Starting worker thread (from cqueue [3150])
Jul 9 09:33:09 ldap-b kernel: klogd 1.4.1, ---------- state change ----------
Jul 9 09:33:09 ldap-b kernel: [ 13.387958] drbd0: Found 4 transactions (192 active extents) in activity log.
Jul 9 09:33:09 ldap-b kernel: [ 13.388035] drbd0: max_segment_size ( = BIO size ) = 32768 Jul 9 09:33:09 ldap-b kernel: [ 13.388107] drbd0: drbd_bm_resize called with capacity == 19534977 Jul 9 09:33:09 ldap-b kernel: [ 13.388276] drbd0: resync bitmap: bits=2441873 words=76310 Jul 9 09:33:09 ldap-b kernel: [ 13.388347] drbd0: size = 9539 MB (9767488 KB) Jul 9 09:33:09 ldap-b kernel: [ 31.965631] drbd0: reading of bitmap took 3 jiffies Jul 9 09:33:09 ldap-b kernel: [ 31.966164] drbd0: recounting of set bits took additional 0 jiffies Jul 9 09:33:09 ldap-b kernel: [ 31.966237] drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map. Jul 9 09:33:09 ldap-b kernel: [ 31.966313] drbd0: disk( Attaching -> Outdated ) Jul 9 09:33:09 ldap-b kernel: [ 31.966385] drbd0: Writing meta data super block now. Jul 9 09:33:09 ldap-b kernel: [ 31.967545] drbd1: disk( Diskless -> Attaching ) Jul 9 09:33:09 ldap-b kernel: [ 31.967616] drbd1: Starting worker thread (from cqueue [3150]) Jul 9 09:33:09 ldap-b kernel: [ 21.160215] drbd1: Found 4 transactions (97 active extents) in activity log. Jul 9 09:33:09 ldap-b kernel: [ 21.160293] drbd1: max_segment_size ( = BIO size ) = 32768 Jul 9 09:33:09 ldap-b kernel: [ 21.160367] drbd1: drbd_bm_resize called with capacity == 19534977 Jul 9 09:33:09 ldap-b kernel: [ 21.160529] drbd1: resync bitmap: bits=2441873 words=76310 Jul 9 09:33:09 ldap-b kernel: [ 21.160602] drbd1: size = 9539 MB (9767488 KB) Jul 9 09:33:09 ldap-b kernel: [ 24.637560] drbd1: reading of bitmap took 1 jiffies Jul 9 09:33:09 ldap-b kernel: [ 24.637560] drbd1: recounting of set bits took additional 0 jiffies Jul 9 09:33:09 ldap-b kernel: [ 24.637560] drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map. Jul 9 09:33:09 ldap-b kernel: [ 24.637560] drbd1: disk( Attaching -> Outdated ) Jul 9 09:33:09 ldap-b kernel: [ 24.637560] drbd1: Writing meta data super block now. Jul 9 09:33:09 ldap-b kernel: [ 31.990641] padlock: VIA PadLock Hash Engine not detected. 
Jul 9 09:33:09 ldap-b kernel: [ 32.021132] drbd0: conn( StandAlone -> Unconnected ) Jul 9 09:33:09 ldap-b kernel: [ 13.492570] drbd0: Starting receiver thread (from drbd0_worker [3162]) Jul 9 09:33:09 ldap-b kernel: [ 13.492658] drbd0: receiver (re)started Jul 9 09:33:09 ldap-b kernel: [ 13.492727] drbd0: conn( Unconnected -> WFConnection ) Jul 9 09:33:09 ldap-b kernel: [ 32.022195] drbd1: conn( StandAlone -> Unconnected ) Jul 9 09:33:09 ldap-b kernel: [ 13.493629] drbd1: Starting receiver thread (from drbd1_worker [3166]) Jul 9 09:33:09 ldap-b kernel: [ 13.493715] drbd1: receiver (re)started Jul 9 09:33:09 ldap-b kernel: [ 13.493792] drbd1: conn( Unconnected -> WFConnection ) Jul 9 09:33:10 ldap-b kernel: [ 13.589830] drbd0: Handshake successful: Agreed network protocol version 88 Jul 9 09:33:10 ldap-b kernel: [ 13.595809] drbd1: Handshake successful: Agreed network protocol version 88 Jul 9 09:33:10 ldap-b kernel: [ 13.630653] drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC Jul 9 09:33:10 ldap-b kernel: [ 13.630728] drbd0: conn( WFConnection -> WFReportParams ) Jul 9 09:33:10 ldap-b kernel: [ 13.630815] drbd0: Starting asender thread (from drbd0_receiver [3192]) Jul 9 09:33:10 ldap-b kernel: [ 13.630923] drbd0: data-integrity-alg: <not-used> Jul 9 09:33:10 ldap-b kernel: [ 13.631462] drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) Jul 9 09:33:10 ldap-b kernel: [ 13.631560] drbd0: Writing meta data super block now. 
Jul 9 09:33:10 ldap-b kernel: [ 13.632982] drbd1: Peer authenticated using 20 bytes of 'sha1' HMAC Jul 9 09:33:10 ldap-b kernel: [ 13.633057] drbd1: conn( WFConnection -> WFReportParams ) Jul 9 09:33:10 ldap-b kernel: [ 13.633138] drbd1: Starting asender thread (from drbd1_receiver [3194]) Jul 9 09:33:10 ldap-b kernel: [ 13.633240] drbd1: data-integrity-alg: <not-used> Jul 9 09:33:10 ldap-b kernel: [ 13.633779] drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) Jul 9 09:33:10 ldap-b kernel: [ 13.633874] drbd1: Writing meta data super block now. Jul 9 09:33:10 ldap-b kernel: [ 13.683724] drbd0: conn( WFBitMapT -> WFSyncUUID ) Jul 9 09:33:10 ldap-b kernel: [ 13.685116] drbd1: conn( WFBitMapT -> WFSyncUUID ) Jul 9 09:33:10 ldap-b kernel: [ 13.686308] drbd0: helper command: /sbin/drbdadm before-resync-target Jul 9 09:33:10 ldap-b kernel: [ 13.686415] drbd1: peer_isp( 0 -> 1 ) Jul 9 09:33:10 ldap-b kernel: [ 13.687562] drbd1: helper command: /sbin/drbdadm before-resync-target Jul 9 09:33:10 ldap-b kernel: [ 13.687644] drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent ) Jul 9 09:33:10 ldap-b kernel: [ 13.687738] drbd1: aftr_isp( 0 -> 1 ) Jul 9 09:33:10 ldap-b kernel: [ 13.687810] drbd0: Began resync as SyncTarget (will sync 151620 KB [37905 bits set]). Jul 9 09:33:10 ldap-b kernel: [ 13.687902] drbd0: Writing meta data super block now. Jul 9 09:33:10 ldap-b kernel: [ 13.687996] drbd1: Writing meta data super block now. Jul 9 09:33:10 ldap-b kernel: [ 13.688601] drbd1: conn( WFSyncUUID -> PausedSyncT ) disk( Outdated -> Inconsistent ) Jul 9 09:33:10 ldap-b kernel: [ 13.688696] drbd1: Began resync as PausedSyncT (will sync 0 KB [0 bits set]). 
Jul 9 09:33:10 ldap-b kernel: [ 13.688774] drbd1: Resync done (total 1 sec; paused 0 sec; 0 K/sec) Jul 9 09:33:10 ldap-b kernel: [ 13.688849] drbd1: conn( PausedSyncT -> Connected ) disk( Inconsistent -> UpToDate ) Jul 9 09:33:10 ldap-b kernel: [ 13.689335] drbd1: helper command: /sbin/drbdadm after-resync-target Jul 9 09:33:10 ldap-b kernel: [ 13.689415] drbd1: Writing meta data super block now. Jul 9 09:33:14 ldap-b kernel: [ 15.761543] drbd0: Resync done (total 4 sec; paused 0 sec; 37904 K/sec) Jul 9 09:33:14 ldap-b kernel: [ 15.761642] drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) Jul 9 09:33:14 ldap-b kernel: [ 15.762198] drbd0: helper command: /sbin/drbdadm after-resync-target Jul 9 09:33:14 ldap-b kernel: [ 15.763409] drbd1: aftr_isp( 1 -> 0 ) Jul 9 09:33:14 ldap-b kernel: [ 15.763480] drbd0: Writing meta data super block now. Jul 9 09:33:14 ldap-b kernel: [ 15.767950] drbd1: peer_isp( 1 -> 0 ) Jul 9 09:33:16 ldap-b kernel: [ 15.937945] drbd1: peer( Primary -> Secondary ) Jul 9 09:33:17 ldap-b kernel: [ 38.436159] drbd1: role( Secondary -> Primary ) Jul 9 09:33:17 ldap-b kernel: [ 16.027376] drbd1: Writing meta data super block now. Jul 9 09:33:17 ldap-b kernel: [ 24.341545] kjournald starting. Commit interval 5 seconds Jul 9 09:33:17 ldap-b kernel: [ 16.229763] EXT3 FS on drbd1, internal journal Jul 9 09:33:17 ldap-b kernel: [ 16.229887] EXT3-fs: mounted filesystem with journal data mode. 
Jul 9 09:48:46 ldap-b kernel: [ 579.513151] drbd0: conn( Connected -> VerifyT ) Jul 9 09:48:46 ldap-b kernel: [ 579.513151] drbd1: conn( Connected -> VerifyT ) Jul 9 09:48:46 ldap-b kernel: [ 579.881174] drbd0: Out of sync: start=33408, size=8 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 579.881174] drbd0: Out of sync: start=33384, size=24 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 579.881174] drbd0: Out of sync: start=33496, size=8 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 579.881174] drbd0: Out of sync: start=33512, size=16 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 579.932420] drbd0: Out of sync: start=34216, size=8 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 579.932522] drbd0: Out of sync: start=34320, size=8 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 579.932602] drbd0: Out of sync: start=34344, size=8 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 579.932692] drbd0: Out of sync: start=34408, size=16 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 579.932784] drbd0: Out of sync: start=34504, size=8 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 579.932875] drbd0: Out of sync: start=34592, size=8 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 579.932954] drbd0: Out of sync: start=34616, size=8 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 579.933032] drbd0: Out of sync: start=34632, size=8 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 579.933156] drbd0: Out of sync: start=34840, size=32 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 580.004526] drbd0: Out of sync: start=42448, size=32 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 580.004658] drbd0: Out of sync: start=42680, size=16 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 580.004736] drbd0: Out of sync: start=42704, size=8 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 580.004837] drbd0: Out of sync: start=42824, size=8 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 580.004936] drbd0: Out of sync: start=42936, size=16 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 580.005022] drbd0: Out of sync: start=43008, size=8 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 580.005101] 
drbd0: Out of sync: start=43024, size=16 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 580.005213] drbd0: Out of sync: start=43144, size=8 (sectors) Jul 9 09:48:46 ldap-b kernel: [ 580.021183] drbd0: Out of sync: start=42120, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.149145] drbd0: Out of sync: start=54928, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.149208] drbd0: Out of sync: start=54992, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.149312] drbd0: Out of sync: start=55104, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.393207] drbd0: Out of sync: start=75688, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.393207] drbd0: Out of sync: start=75728, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.393207] drbd0: Out of sync: start=75784, size=16 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.393207] drbd0: Out of sync: start=75824, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.393207] drbd0: Out of sync: start=75856, size=24 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.393207] drbd0: Out of sync: start=75936, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.393207] drbd0: Out of sync: start=75960, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.537216] drbd0: Out of sync: start=93848, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.537216] drbd0: Out of sync: start=93872, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.537216] drbd0: Out of sync: start=93912, size=16 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.537961] drbd0: Out of sync: start=93928, size=40 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.541216] drbd0: Out of sync: start=93968, size=168 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.541216] drbd0: Out of sync: start=94152, size=24 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.542939] drbd0: Out of sync: start=94448, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.543017] drbd0: Out of sync: start=94472, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.597245] drbd0: Out of 
sync: start=94536, size=16 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.597330] drbd0: Out of sync: start=94560, size=24 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.597436] drbd0: Out of sync: start=94704, size=16 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.597522] drbd0: Out of sync: start=94776, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.597601] drbd0: Out of sync: start=94792, size=16 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.597703] drbd0: Out of sync: start=94936, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.757015] drbd0: Out of sync: start=108968, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.757105] drbd0: Out of sync: start=108992, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.757194] drbd0: Out of sync: start=109016, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.757236] drbd0: Out of sync: start=109024, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.757312] drbd0: Out of sync: start=109032, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.757389] drbd0: Out of sync: start=109040, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.757468] drbd0: Out of sync: start=109048, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.757544] drbd0: Out of sync: start=109056, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.757621] drbd0: Out of sync: start=109064, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.757697] drbd0: Out of sync: start=109072, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.757779] drbd0: Out of sync: start=109080, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.757854] drbd0: Out of sync: start=109088, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.757933] drbd0: Out of sync: start=109096, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.758008] drbd0: Out of sync: start=109104, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.758085] drbd0: Out of sync: start=109112, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.758162] drbd0: Out of sync: 
start=109120, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.758265] drbd0: Out of sync: start=109128, size=112 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.758347] drbd0: Out of sync: start=109256, size=24 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.758465] drbd0: Out of sync: start=109472, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.758547] drbd0: Out of sync: start=109504, size=16 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.758626] drbd0: Out of sync: start=109536, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.758706] drbd0: Out of sync: start=109560, size=16 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.758787] drbd0: Out of sync: start=109584, size=24 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.758875] drbd0: Out of sync: start=109664, size=16 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.758962] drbd0: Out of sync: start=109736, size=8 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.759041] drbd0: Out of sync: start=109752, size=16 (sectors) Jul 9 09:48:47 ldap-b kernel: [ 580.766251] drbd0: Out of sync: start=109880, size=8 (sectors) Jul 9 09:48:48 ldap-b kernel: [ 580.985244] drbd0: Out of sync: start=134728, size=16 (sectors) Jul 9 09:48:48 ldap-b kernel: [ 581.047252] drbd0: Out of sync: start=135104, size=8 (sectors) Jul 9 09:48:48 ldap-b kernel: [ 581.047361] drbd0: Out of sync: start=135224, size=16 (sectors) Jul 9 09:48:48 ldap-b kernel: [ 581.047440] drbd0: Out of sync: start=135256, size=8 (sectors) Jul 9 09:48:48 ldap-b kernel: [ 581.047525] drbd0: Out of sync: start=135288, size=16 (sectors) Jul 9 09:48:48 ldap-b kernel: [ 581.047601] drbd0: Out of sync: start=135312, size=8 (sectors) Jul 9 09:48:48 ldap-b kernel: [ 581.047697] drbd0: Out of sync: start=135416, size=8 (sectors) Jul 9 09:48:48 ldap-b kernel: [ 581.047778] drbd0: Out of sync: start=135456, size=8 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 581.712960] drbd0: Out of sync: start=195680, size=24 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 581.713053] drbd0: Out 
of sync: start=195744, size=24 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 581.713290] drbd0: Out of sync: start=198032, size=16 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 581.713290] drbd0: Out of sync: start=198160, size=8 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 581.715658] drbd0: Out of sync: start=195304, size=16 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 581.717572] drbd0: Out of sync: start=196048, size=8 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 581.729243] drbd0: Out of sync: start=196160, size=16 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 581.729312] drbd0: Out of sync: start=196232, size=8 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 581.729393] drbd0: Out of sync: start=196248, size=16 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 581.729492] drbd0: Out of sync: start=196376, size=8 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 581.729993] drbd0: Out of sync: start=195320, size=8 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 582.021309] drbd0: Out of sync: start=226864, size=8 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 582.021309] drbd0: Out of sync: start=226904, size=8 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 582.021309] drbd0: Out of sync: start=226920, size=16 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 582.021309] drbd0: Out of sync: start=226944, size=8 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 582.021309] drbd0: Out of sync: start=227040, size=8 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 582.021309] drbd0: Out of sync: start=227072, size=8 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 582.161087] drbd0: Out of sync: start=239624, size=8 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 582.161179] drbd0: Out of sync: start=239688, size=8 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 582.161260] drbd0: Out of sync: start=239704, size=8 (sectors) Jul 9 09:48:49 ldap-b kernel: [ 582.161472] drbd0: Out of sync: start=239808, size=8 (sectors) Jul 9 09:48:50 ldap-b kernel: [ 582.623923] drbd0: Out of sync: start=279368, size=8 (sectors) Jul 9 09:48:50 ldap-b kernel: [ 582.624079] drbd0: 
Out of sync: start=278968, size=8 (sectors) Jul 9 09:48:50 ldap-b kernel: [ 582.624159] drbd0: Out of sync: start=278992, size=8 (sectors) Jul 9 09:48:50 ldap-b kernel: [ 582.624246] drbd0: Out of sync: start=279032, size=24 (sectors) Jul 9 09:48:50 ldap-b kernel: [ 582.624325] drbd0: Out of sync: start=279064, size=16 (sectors) Jul 9 09:48:50 ldap-b kernel: [ 582.624409] drbd0: Out of sync: start=279112, size=16 (sectors) Jul 9 09:48:50 ldap-b kernel: [ 582.624492] drbd0: Out of sync: start=279152, size=24 (sectors) Jul 9 09:48:50 ldap-b kernel: [ 582.624592] drbd0: Out of sync: start=279272, size=24 (sectors) Jul 9 09:48:50 ldap-b kernel: [ 582.624681] drbd0: Out of sync: start=279312, size=56 (sectors) Jul 9 09:51:01 ldap-b kernel: [ 672.823561] drbd0: Out of sync: start=8222672, size=8 (sectors) Jul 9 09:51:02 ldap-b kernel: [ 673.173588] drbd0: Out of sync: start=8253296, size=8 (sectors) Jul 9 09:51:02 ldap-b kernel: [ 673.252864] drbd0: Out of sync: start=8255712, size=32 (sectors) Jul 9 09:51:02 ldap-b kernel: [ 673.268004] drbd0: Out of sync: start=8260544, size=16 (sectors) Jul 9 09:51:02 ldap-b kernel: [ 673.268089] drbd0: Out of sync: start=8260584, size=16 (sectors) Jul 9 09:51:02 ldap-b kernel: [ 673.268166] drbd0: Out of sync: start=8260608, size=8 (sectors) Jul 9 09:51:02 ldap-b kernel: [ 673.296147] drbd0: Out of sync: start=8266632, size=88 (sectors) Jul 9 09:51:02 ldap-b kernel: [ 673.299041] drbd0: Out of sync: start=8266720, size=16 (sectors) Jul 9 09:51:02 ldap-b kernel: [ 673.473608] drbd0: Out of sync: start=8280320, size=8 (sectors) Jul 9 09:54:05 ldap-b kernel: [ 800.670741] drbd1: Online verify done (total 319 sec; paused 0 sec; 30616 K/sec) Jul 9 09:54:05 ldap-b kernel: [ 800.670841] drbd1: conn( VerifyT -> Connected ) Jul 9 09:54:06 ldap-b kernel: [ 801.216397] drbd0: Online verify done (total 320 sec; paused 0 sec; 30520 K/sec) Jul 9 09:54:06 ldap-b kernel: [ 801.216490] drbd0: Online verify found 105 4k block out of sync! 
Jul 9 09:54:06 ldap-b kernel: [ 801.216567] drbd0: helper command: /sbin/drbdadm out-of-sync
Jul 9 09:54:06 ldap-b kernel: [ 801.224863] drbd0: conn( VerifyT -> Connected )
Jul 9 09:54:06 ldap-b kernel: [ 801.224939] drbd0: Writing the whole bitmap, due to failed kmalloc
Jul 9 09:54:06 ldap-b kernel: [ 801.226171] drbd0: writing of bitmap took 0 jiffies
Jul 9 09:54:06 ldap-b kernel: [ 801.226244] drbd0: 420 KB (105 bits) marked out-of-sync by on disk bit-map.
-------------------------------------------------------->8-------------------------

That's an awful lot of blocks!! Besides, I see no reason why it should be out of sync at all...

I moved the services to ldap-a, then disconnected and reconnected DRBD on ldap-b to allow resynchronization:

  # drbdadm disconnect all
  # drbdadm connect all

When synchronization completed, I ran 'drbdadm verify all' again:

---------8<------------------------------------------------------------------------
Jul 9 09:54:36 ldap-b kernel: [ 811.325532] drbd1: role( Primary -> Secondary )
Jul 9 09:54:36 ldap-b kernel: [ 816.458995] drbd1: Writing meta data super block now.
Jul 9 09:54:37 ldap-b kernel: [ 816.492114] drbd1: peer( Secondary -> Primary )
Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd0: Requested state change failed by peer: Refusing to be Primary while peer is not outdated
Jul 9 09:55:07 ldap-b kernel: [ 841.194640] drbd0: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown )
Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd0: short read expecting header on sock: r=-512
Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd0: asender terminated
Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd0: Terminating asender thread
Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd0: Writing meta data super block now.
Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd0: tl_clear() Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd0: Connection closed Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd0: conn( Disconnecting -> StandAlone ) Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd0: receiver terminated Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd0: Terminating receiver thread Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd1: Requested state change failed by peer: Refusing to be Primary while peer is not outdated Jul 9 09:55:07 ldap-b kernel: [ 841.196756] drbd1: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown ) Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd1: Writing meta data super block now. Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd1: short read expecting header on sock: r=-512 Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd1: asender terminated Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd1: Terminating asender thread Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd1: tl_clear() Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd1: Connection closed Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd1: conn( Disconnecting -> StandAlone ) Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd1: receiver terminated Jul 9 09:55:07 ldap-b kernel: [ 819.060282] drbd1: Terminating receiver thread Jul 9 09:55:11 ldap-b kernel: [ 845.296145] drbd0: conn( StandAlone -> Unconnected ) Jul 9 09:55:11 ldap-b kernel: [ 819.319669] drbd0: Starting receiver thread (from drbd0_worker [3162]) Jul 9 09:55:11 ldap-b kernel: [ 819.319768] drbd0: receiver (re)started Jul 9 09:55:11 ldap-b kernel: [ 819.319839] drbd0: conn( Unconnected -> WFConnection ) Jul 9 09:55:11 ldap-b kernel: [ 845.296865] drbd1: conn( StandAlone -> Unconnected ) Jul 9 09:55:11 ldap-b kernel: [ 819.320387] drbd1: Starting receiver thread (from drbd1_worker [3166]) Jul 9 09:55:11 ldap-b kernel: [ 819.320475] drbd1: receiver (re)started Jul 9 09:55:11 
ldap-b kernel: [ 819.320545] drbd1: conn( Unconnected -> WFConnection ) Jul 9 09:55:11 ldap-b kernel: [ 819.420230] drbd1: Handshake successful: Agreed network protocol version 88 Jul 9 09:55:11 ldap-b kernel: [ 819.420395] drbd0: Handshake successful: Agreed network protocol version 88 Jul 9 09:55:11 ldap-b kernel: [ 819.428414] drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC Jul 9 09:55:11 ldap-b kernel: [ 819.428488] drbd0: conn( WFConnection -> WFReportParams ) Jul 9 09:55:11 ldap-b kernel: [ 819.428570] drbd0: Starting asender thread (from drbd0_receiver [5552]) Jul 9 09:55:11 ldap-b kernel: [ 819.428666] drbd1: Peer authenticated using 20 bytes of 'sha1' HMAC Jul 9 09:55:11 ldap-b kernel: [ 819.428740] drbd1: conn( WFConnection -> WFReportParams ) Jul 9 09:55:11 ldap-b kernel: [ 819.428822] drbd1: Starting asender thread (from drbd1_receiver [5554]) Jul 9 09:55:11 ldap-b kernel: [ 819.430675] drbd0: data-integrity-alg: <not-used> Jul 9 09:55:11 ldap-b kernel: [ 819.431215] drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) Jul 9 09:55:11 ldap-b kernel: [ 819.431311] drbd0: Writing meta data super block now. Jul 9 09:55:11 ldap-b kernel: [ 819.431419] drbd1: data-integrity-alg: <not-used> Jul 9 09:55:11 ldap-b kernel: [ 819.431509] drbd1: Writing the whole bitmap, full sync required after drbd_sync_handshake. Jul 9 09:55:11 ldap-b kernel: [ 819.431603] drbd1: Writing meta data super block now. Jul 9 09:55:11 ldap-b kernel: [ 819.432441] drbd1: writing of bitmap took 0 jiffies Jul 9 09:55:11 ldap-b kernel: [ 819.432513] drbd1: 9539 MB (2441873 bits) marked out-of-sync by on disk bit-map. Jul 9 09:55:11 ldap-b kernel: [ 819.432602] drbd1: Writing meta data super block now. Jul 9 09:55:11 ldap-b kernel: [ 819.433165] drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) Jul 9 09:55:11 ldap-b kernel: [ 819.433261] drbd1: Writing meta data super block now. 
Jul 9 09:55:11 ldap-b kernel: [ 819.438782] drbd1: conn( WFBitMapT -> WFSyncUUID ) Jul 9 09:55:11 ldap-b kernel: [ 819.440311] drbd1: helper command: /sbin/drbdadm before-resync-target Jul 9 09:55:11 ldap-b kernel: [ 819.441431] drbd1: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent ) Jul 9 09:55:11 ldap-b kernel: [ 819.441524] drbd1: Began resync as SyncTarget (will sync 9767492 KB [2441873 bits set]). Jul 9 09:55:11 ldap-b kernel: [ 819.441617] drbd1: Writing meta data super block now. Jul 9 09:55:11 ldap-b kernel: [ 819.465453] drbd0: conn( WFBitMapT -> WFSyncUUID ) Jul 9 09:55:11 ldap-b kernel: [ 819.467149] drbd0: helper command: /sbin/drbdadm before-resync-target Jul 9 09:55:11 ldap-b kernel: [ 819.468233] drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent ) Jul 9 09:55:11 ldap-b kernel: [ 819.468308] drbd1: conn( SyncTarget -> PausedSyncT ) aftr_isp( 0 -> 1 ) Jul 9 09:55:11 ldap-b kernel: [ 819.468308] drbd1: Resync suspended Jul 9 09:55:11 ldap-b kernel: [ 819.468318] drbd0: Began resync as SyncTarget (will sync 2068 KB [517 bits set]). Jul 9 09:55:11 ldap-b kernel: [ 819.468409] drbd0: Writing meta data super block now. Jul 9 09:55:11 ldap-b kernel: [ 819.478913] drbd1: peer_isp( 0 -> 1 ) Jul 9 09:55:11 ldap-b kernel: [ 819.516311] drbd0: Resync done (total 1 sec; paused 0 sec; 2068 K/sec) Jul 9 09:55:11 ldap-b kernel: [ 819.516311] drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) Jul 9 09:55:11 ldap-b kernel: [ 819.516311] drbd0: helper command: /sbin/drbdadm after-resync-target Jul 9 09:55:11 ldap-b kernel: [ 819.516311] drbd1: peer_isp( 1 -> 0 ) Jul 9 09:55:11 ldap-b kernel: [ 819.516311] drbd1: conn( PausedSyncT -> SyncTarget ) aftr_isp( 1 -> 0 ) Jul 9 09:55:11 ldap-b kernel: [ 819.516311] drbd1: Syncer continues. Jul 9 09:55:11 ldap-b kernel: [ 819.516312] drbd0: Writing meta data super block now. 
Jul 9 09:59:59 ldap-b kernel: [ 982.455134] drbd1: Resync done (total 288 sec; paused 0 sec; 33912 K/sec) Jul 9 09:59:59 ldap-b kernel: [ 982.455219] drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) Jul 9 09:59:59 ldap-b kernel: [ 982.455776] drbd1: helper command: /sbin/drbdadm after-resync-target Jul 9 09:59:59 ldap-b kernel: [ 982.457052] drbd1: Writing meta data super block now. Jul 9 12:40:41 ldap-b kernel: [ 4734.309515] drbd0: conn( Connected -> VerifyT ) Jul 9 12:40:41 ldap-b kernel: [ 4734.309515] drbd1: conn( Connected -> VerifyT ) Jul 9 12:40:42 ldap-b kernel: [ 4734.505528] drbd0: Out of sync: start=19880, size=8 (sectors) Jul 9 12:40:42 ldap-b kernel: [ 4734.555944] drbd0: Out of sync: start=21008, size=24 (sectors) Jul 9 12:40:42 ldap-b kernel: [ 4734.556010] drbd0: Out of sync: start=21040, size=120 (sectors) Jul 9 12:40:42 ldap-b kernel: [ 4734.556045] drbd0: Out of sync: start=21168, size=8 (sectors) Jul 9 12:40:42 ldap-b kernel: [ 4734.556086] drbd0: Out of sync: start=21200, size=32 (sectors) Jul 9 12:40:42 ldap-b kernel: [ 4734.556123] drbd0: Out of sync: start=21240, size=16 (sectors) Jul 9 12:40:42 ldap-b kernel: [ 4734.556235] drbd0: Out of sync: start=21608, size=24 (sectors) Jul 9 12:40:42 ldap-b kernel: [ 4734.556275] drbd0: Out of sync: start=21664, size=16 (sectors) Jul 9 12:40:42 ldap-b kernel: [ 4734.556308] drbd0: Out of sync: start=21688, size=8 (sectors) Jul 9 12:40:42 ldap-b kernel: [ 4734.556339] drbd0: Out of sync: start=21704, size=8 (sectors) Jul 9 12:40:42 ldap-b kernel: [ 4734.556371] drbd0: Out of sync: start=21720, size=8 (sectors) Jul 9 12:40:42 ldap-b kernel: [ 4734.556426] drbd0: Out of sync: start=21752, size=80 (sectors) Jul 9 12:40:42 ldap-b kernel: [ 4734.556500] drbd0: Out of sync: start=22016, size=16 (sectors) Jul 9 12:40:42 ldap-b kernel: [ 4734.556535] drbd0: Out of sync: start=22048, size=8 (sectors) Jul 9 12:40:42 ldap-b kernel: [ 4734.556574] drbd0: Out of sync: start=22088, size=8 
(sectors) Jul 9 12:40:42 ldap-b kernel: [ 4734.556618] drbd0: Out of sync: start=22152, size=16 (sectors) Jul 9 12:40:42 ldap-b kernel: [ 4734.816985] drbd0: Out of sync: start=44632, size=8 (sectors) Jul 9 12:40:42 ldap-b kernel: [ 4735.141568] drbd0: Out of sync: start=73552, size=8 (sectors) Jul 9 12:40:43 ldap-b kernel: [ 4735.373582] drbd0: Out of sync: start=94120, size=8 (sectors) Jul 9 12:40:43 ldap-b kernel: [ 4735.373582] drbd0: Out of sync: start=94136, size=8 (sectors) Jul 9 12:40:43 ldap-b kernel: [ 4735.539493] drbd0: Out of sync: start=112944, size=16 (sectors) Jul 9 12:40:43 ldap-b kernel: [ 4735.729605] drbd0: Out of sync: start=123856, size=8 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4735.948401] drbd0: Out of sync: start=141176, size=8 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4735.953654] drbd0: Out of sync: start=142448, size=8 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4735.955387] drbd0: Out of sync: start=142752, size=32 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4735.957198] drbd0: Out of sync: start=143016, size=8 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4735.957234] drbd0: Out of sync: start=143040, size=8 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4735.957268] drbd0: Out of sync: start=143056, size=8 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4735.957320] drbd0: Out of sync: start=143160, size=8 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4735.957619] drbd0: Out of sync: start=143304, size=16 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4736.193199] drbd0: Out of sync: start=167040, size=224 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4736.193243] drbd0: Out of sync: start=167288, size=24 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4736.193422] drbd0: Out of sync: start=166968, size=8 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4736.193459] drbd0: Out of sync: start=167000, size=8 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4736.197704] drbd0: Out of sync: start=167632, size=8 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4736.197745] drbd0: Out of sync: 
start=167664, size=16 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4736.197783] drbd0: Out of sync: start=167704, size=8 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4736.197821] drbd0: Out of sync: start=167736, size=16 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4736.197859] drbd0: Out of sync: start=167760, size=24 (sectors) Jul 9 12:40:44 ldap-b kernel: [ 4736.197917] drbd0: Out of sync: start=167904, size=16 (sectors) Jul 9 12:40:45 ldap-b kernel: [ 4737.192747] drbd0: Out of sync: start=250304, size=8 (sectors) Jul 9 12:40:45 ldap-b kernel: [ 4737.277286] drbd0: Out of sync: start=261992, size=8 (sectors) Jul 9 12:40:46 ldap-b kernel: [ 4737.357708] drbd0: Out of sync: start=271184, size=8 (sectors) Jul 9 12:40:46 ldap-b kernel: [ 4737.357708] drbd0: Out of sync: start=271496, size=32 (sectors) Jul 9 12:40:46 ldap-b kernel: [ 4737.357708] drbd0: Out of sync: start=271640, size=8 (sectors) Jul 9 12:40:46 ldap-b kernel: [ 4737.357708] drbd0: Out of sync: start=271656, size=16 (sectors) Jul 9 12:40:46 ldap-b kernel: [ 4737.357708] drbd0: Out of sync: start=271792, size=8 (sectors) Jul 9 12:40:46 ldap-b kernel: [ 4737.357708] drbd0: Out of sync: start=271904, size=16 (sectors) Jul 9 12:42:50 ldap-b kernel: [ 4823.344894] drbd0: Out of sync: start=7943192, size=8 (sectors) Jul 9 12:42:55 ldap-b kernel: [ 4826.871364] drbd0: Out of sync: start=8251864, size=8 (sectors) Jul 9 12:42:55 ldap-b kernel: [ 4826.886980] drbd0: Out of sync: start=8251848, size=8 (sectors) Jul 9 12:45:56 ldap-b kernel: [ 4952.607170] drbd1: Online verify done (total 314 sec; paused 0 sec; 31104 K/sec) Jul 9 12:45:56 ldap-b kernel: [ 4952.607224] drbd1: conn( VerifyT -> Connected ) Jul 9 12:45:57 ldap-b kernel: [ 4953.155344] drbd0: Online verify done (total 315 sec; paused 0 sec; 31004 K/sec) Jul 9 12:45:57 ldap-b kernel: [ 4953.155344] drbd0: Online verify found 19 4k block out of sync! 
Jul 9 12:45:57 ldap-b kernel: [ 4953.155344] drbd0: helper command: /sbin/drbdadm out-of-sync
Jul 9 12:45:57 ldap-b kernel: [ 4953.176074] drbd0: conn( VerifyT -> Connected )
Jul 9 12:45:57 ldap-b kernel: [ 4953.176104] drbd0: Writing the whole bitmap, due to failed kmalloc
Jul 9 12:45:57 ldap-b kernel: [ 4953.177306] drbd0: writing of bitmap took 1 jiffies
Jul 9 12:45:57 ldap-b kernel: [ 4953.177335] drbd0: 76 KB (19 bits) marked out-of-sync by on disk bit-map.
-------------------------------------------------------->8-------------------------

Any idea how that is possible? (Well, I guess the answer is: the kernel :-\)

Thanks,
Eric

Eric Marin wrote (2008.06.23 16:32):
> Hello and sorry about the length of this report,
>
> I'm the administrator of a cluster which consists of two nodes : ldap-a
> and ldap-b.
>
> Both nodes are identical and run on :
> -Debian Etch (up to date)
> -DRBD 8.2.6
> -Heartbeat 2.1.3-5~bpo40+1 from etch-backports, patched to fix this bug
> : http://hg.linux-ha.org/dev/rev/47f60bebe7b2
> -EXT3 filesystem
> (...)
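
For anyone who wants to cross-check the totals DRBD prints at the end of a verify run (105 bits = 420 KB, then 19 bits = 76 KB) against the individual "Out of sync" lines, here is a minimal sketch in Python. It is not part of DRBD. The function names are made up for illustration, and it assumes that sectors are 512 bytes and that one bitmap bit covers one 4 KB block, which is what the "420 KB (105 bits)" line suggests. The regular expression tolerates arbitrary whitespace, so it should also cope with log text whose lines have been re-wrapped.

```python
import re

# Matches kernel messages of the form seen in the logs above, e.g.
#   drbd0: Out of sync: start=33408, size=8 (sectors)
# \s+ is used between tokens so re-wrapped log text still matches.
OOS_RE = re.compile(
    r"(drbd\d+):\s+Out\s+of\s+sync:\s+start=(\d+),\s+size=(\d+)\s+\(sectors\)"
)

# Assumption: 512-byte sectors and one bitmap bit per 4 KB block.
SECTORS_PER_BLOCK = 8
KB_PER_BLOCK = 4


def out_of_sync_blocks(text):
    """Return {device: set of 4 KB block numbers reported out of sync}."""
    blocks = {}
    for dev, start, size in OOS_RE.findall(text):
        start, size = int(start), int(size)
        first = start // SECTORS_PER_BLOCK
        last = (start + size - 1) // SECTORS_PER_BLOCK
        # A set is used so a block reported twice is only counted once.
        blocks.setdefault(dev, set()).update(range(first, last + 1))
    return blocks


def summarize(text):
    """Return {device: (block count, size in KB)}."""
    return {
        dev: (len(found), len(found) * KB_PER_BLOCK)
        for dev, found in out_of_sync_blocks(text).items()
    }
```

To use it, pass in the kern.log text of a single verify run, for example `summarize(open("/var/log/kern.log").read())`. Feeding it several runs at once would merge their block sets, so the result would no longer be comparable with a single "Online verify found ..." line.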