Our local heartbeat/drbd cluster has been moody for the last few weeks, but I suddenly noticed a drbd2 volume being marked as completely out-of-date, requiring a 3TB resync. There should only have been a few pages out of sync.

The resource manager scripts were sending commands to DRBD, so I'd guess they triggered this somehow, but I think this shouldn't happen?

The log entries around the invalidation are:

Jan 2 23:08:49 lenny kernel: [47674.701229] block drbd2: disk( Outdated -> Inconsistent )
Jan 2 23:08:49 lenny kernel: [47674.884449] block drbd2: bitmap WRITE of 22356 pages took 1107 jiffies
Jan 2 23:08:49 lenny kernel: [47674.993632] block drbd2: 4 KB (1 bits) marked out-of-sync by on disk bit-map.
Jan 2 23:08:49 lenny kernel: [47674.993679] block drbd2: Connection closed
Jan 2 23:08:49 lenny kernel: [47674.993691] block drbd2: conn( Disconnecting -> StandAlone )
Jan 2 23:08:50 lenny kernel: [47676.161611] block drbd2: bitmap WRITE of 22356 pages took 1105 jiffies
Jan 2 23:08:50 lenny kernel: [47676.285215] block drbd2: 2794 GB (732544019 bits) marked out-of-sync by on disk bit-map.
Jan 2 23:08:50 lenny kernel: [47676.294027] block drbd2: receiver terminated
Jan 2 23:08:50 lenny kernel: [47676.294031] block drbd2: Terminating receiver thread
Jan 2 23:08:50 lenny kernel: [47676.294033] block drbd2: disk( Inconsistent -> Diskless )
Jan 2 23:08:50 lenny kernel: [47676.294154] block drbd2: drbd_bm_resize called with capacity == 0
Jan 2 23:08:50 lenny lrmd: [337634]: info: RA output: (gen_drbdprim_drbd2:0:stop:stdout)
Jan 2 23:08:50 lenny kernel: [47676.298815] block drbd2: worker terminated
Jan 2 23:08:50 lenny kernel: [47676.298822] block drbd2: Terminating worker thread

version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by phil at fat-tyre, 2011-01-28 12:17:35
kernel: openvz 2.6.32-042stab039.11

 2: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate A r-----
    ns:0 nr:52667648 dw:52662912 dr:0 al:0 bm:3213 lo:39 pe:7350 ua:41 ap:0 ep:1 wo:b oos:2877513676
        [>....................] sync'ed: 1.8% (2810068/2861500)M
        finish: 8:43:40 speed: 91,576 (90,020) want: 102,400 K/sec

smartctl reports no problems for this disk, an HDS723030ALA640 (Hitachi Deskstar 7K3000), but I'm already RMA-ing another drive of the same type which is 'FAILING_NOW', so I can't rule out hardware or drive issues.

Is this a known issue, and should I try to move to the latest 8.3.x version? Is there any other information I can provide?

Thanks in advance for any insight.

-- 
Arnold Hendriks <a.hendriks at b-lex.nl>
B-Lex IT B.V., Postbus 545, NL-7500 AM Enschede
KvK: 08174333
http://www.b-lex.nl/
T: +31 (0)534836543
M: +31 (0)651710159
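
As a rough cross-check of the numbers in the log and status output above, the sketch below converts the out-of-sync bit count into a size and estimates the resync time. It assumes one bitmap bit covers 4 KiB (taken from the "4 KB (1 bits)" log line), that "KB", "GB" and "K/sec" are binary units, and that oos is reported in KiB; the function names are made up for illustration and are not part of DRBD.

```python
# Sanity check of the figures quoted in the post (a sketch, not a DRBD tool).
# Assumption: one bitmap bit tracks 4 KiB, per the log line "4 KB (1 bits)".
KIB_PER_BIT = 4


def bits_to_gib(bits, kib_per_bit=KIB_PER_BIT):
    """Size in GiB represented by `bits` out-of-sync bitmap bits."""
    return bits * kib_per_bit / (1024 ** 2)


def eta_seconds(oos_kib, speed_kib_per_s):
    """Remaining resync time, assuming the current speed stays constant."""
    return oos_kib / speed_kib_per_s


def fmt_hms(seconds):
    """Format a duration in seconds as H:MM:SS."""
    s = int(seconds)
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


if __name__ == "__main__":
    # 732544019 bits is the count from the second "marked out-of-sync" line.
    print(f"{bits_to_gib(732544019):.0f} GiB marked out-of-sync")
    # oos:2877513676 and speed 91,576 K/sec come from the status output.
    print("estimated finish:", fmt_hms(eta_seconds(2877513676, 91576)))
```

Under these assumptions the bit count matches the 2794 GB in the log, and the estimate lands within a few seconds of the "finish: 8:43:40" reported in the status output, so the figures are at least internally consistent with the whole volume having been marked out-of-sync.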