[DRBD-user] drbd volume suddenly out of sync

Mon Jan 2 23:45:19 CET 2012

Our local heartbeat/DRBD cluster has been moody for the last few weeks, 
but I just noticed the drbd2 volume being marked as completely 
out-of-date, requiring a 3 TB resync. There should only have been a few 
pages out of sync.

The resource manager scripts were sending commands to DRBD, so I'd guess 
they triggered this somehow, but I don't think this should happen.

The log entries around the invalidation are:

Jan  2 23:08:49 lenny kernel: [47674.701229] block drbd2: disk( Outdated -> Inconsistent )
Jan  2 23:08:49 lenny kernel: [47674.884449] block drbd2: bitmap WRITE of 22356 pages took 1107 jiffies
Jan  2 23:08:49 lenny kernel: [47674.993632] block drbd2: 4 KB (1 bits) marked out-of-sync by on disk bit-map.
Jan  2 23:08:49 lenny kernel: [47674.993679] block drbd2: Connection closed
Jan  2 23:08:49 lenny kernel: [47674.993691] block drbd2: conn( Disconnecting -> StandAlone )
Jan  2 23:08:50 lenny kernel: [47676.161611] block drbd2: bitmap WRITE of 22356 pages took 1105 jiffies
Jan  2 23:08:50 lenny kernel: [47676.285215] block drbd2: 2794 GB (732544019 bits) marked out-of-sync by on disk bit-map.
Jan  2 23:08:50 lenny kernel: [47676.294027] block drbd2: receiver terminated
Jan  2 23:08:50 lenny kernel: [47676.294031] block drbd2: Terminating receiver thread
Jan  2 23:08:50 lenny kernel: [47676.294033] block drbd2: disk( Inconsistent -> Diskless )
Jan  2 23:08:50 lenny kernel: [47676.294154] block drbd2: drbd_bm_resize called with capacity == 0
Jan  2 23:08:50 lenny lrmd: [337634]: info: RA output: (gen_drbdprim_drbd2:0:stop:stdout)
Jan  2 23:08:50 lenny kernel: [47676.298815] block drbd2: worker terminated
Jan  2 23:08:50 lenny kernel: [47676.298822] block drbd2: Terminating worker thread
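
To put the numbers in that log in perspective, here is a rough sanity check of the bitmap figures. It assumes that one bitmap bit covers 4 KiB of storage (as the "4 KB (1 bits)" line suggests) and that a bitmap page is 4096 bytes; the function names are only for illustration and are not part of DRBD.

```python
import math

BLOCK_KIB = 4          # assumption: one bitmap bit covers 4 KiB
PAGE_BYTES = 4096      # assumption: bitmap pages are 4096 bytes

def bits_to_gib(bits: int) -> float:
    """Storage covered by `bits` bitmap bits, in GiB."""
    return bits * BLOCK_KIB / (1024 * 1024)

def bitmap_pages(bits: int) -> int:
    """Number of whole pages needed to hold `bits` bitmap bits."""
    return math.ceil(bits / (PAGE_BYTES * 8))

out_of_sync_bits = 732544019   # from the second "marked out-of-sync" line

print(f"{bits_to_gib(out_of_sync_bits):.0f} GiB out of sync")
print(f"{bitmap_pages(out_of_sync_bits)} bitmap pages")
```

Under those assumptions, 732544019 bits fill exactly the 22356 pages the bitmap write reports, which suggests that every bit in the bitmap was set rather than only the blocks that had actually changed.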

version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by phil at fat-tyre, 2011-01-28 12:17:35

kernel: openvz 2.6.32-042stab039.11

  2: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate A r-----
     ns:0 nr:52667648 dw:52662912 dr:0 al:0 bm:3213 lo:39 pe:7350 ua:41 ap:0 ep:1 wo:b oos:2877513676
         [>....................] sync'ed:  1.8% (2810068/2861500)M
         finish: 8:43:40 speed: 91,576 (90,020) want: 102,400 K/sec
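
For reference, the status above can be cross-checked with a few lines of Python. This is only a sketch: it assumes the oos counter is in KiB and the sync'ed totals are in MiB, and the variable names are made up for the example.

```python
import re

# Counter line copied from the /proc/drbd output above.
counters = ("ns:0 nr:52667648 dw:52662912 dr:0 al:0 bm:3213 lo:39 "
            "pe:7350 ua:41 ap:0 ep:1 wo:b oos:2877513676")

# Split the "key:value" pairs into a dict of strings.
fields = dict(re.findall(r"(\w+):(\S+)", counters))

oos_kib = int(fields["oos"])   # assumption: out-of-sync amount in KiB
total_mib = 2861500            # total from the sync'ed line, assumed MiB
speed_kib_s = 91576            # current speed from the finish line, K/sec

oos_mib = oos_kib / 1024
synced_pct = 100 * (1 - oos_mib / total_mib)

# Remaining time if the current speed were sustained.
eta_s = int(oos_kib / speed_kib_s)
hours, rest = divmod(eta_s, 3600)
minutes, seconds = divmod(rest, 60)

print(f"out of sync: {oos_mib:.0f} MiB, synced: {synced_pct:.1f}%")
print(f"estimated time left: {hours}:{minutes:02d}:{seconds:02d}")
```

The result comes out close to the 1.8% and the roughly 8h43m that DRBD itself reports, so the counters are at least internally consistent with a resync of nearly the whole device.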

smartctl reports no problems for this disk, an HDS723030ALA640 (Hitachi 
Deskstar 7K3000), but I'm already RMA-ing another drive of the same type 
which is 'FAILING_NOW', so I can't rule out hardware or drive issues.

Is this a known issue, and should I try to move to the latest 8.3.x 
version? Is there any other information I can provide?

Thanks in advance for any insight.

-- 

Arnold Hendriks<a.hendriks at b-lex.nl>
B-Lex IT B.V., Postbus 545, NL-7500 AM Enschede
KvK: 08174333              http://www.b-lex.nl/  
T: +31 (0)534836543        M: +31 (0)651710159