Our local heartbeat/drbd cluster has been moody for the last few weeks,
but I suddenly noticed a drbd2 volume being marked as completely
out-of-date, requiring a 3TB resync. There should have been only a few
pages out of sync.
The resource manager scripts were sending commands to DRBD, so I'd guess
they triggered this somehow, but I don't think this should happen?
The log entries around the invalidation are:
Jan 2 23:08:49 lenny kernel: [47674.701229] block drbd2: disk( Outdated -> Inconsistent )
Jan 2 23:08:49 lenny kernel: [47674.884449] block drbd2: bitmap WRITE of 22356 pages took 1107 jiffies
Jan 2 23:08:49 lenny kernel: [47674.993632] block drbd2: 4 KB (1 bits) marked out-of-sync by on disk bit-map.
Jan 2 23:08:49 lenny kernel: [47674.993679] block drbd2: Connection closed
Jan 2 23:08:49 lenny kernel: [47674.993691] block drbd2: conn( Disconnecting -> StandAlone )
Jan 2 23:08:50 lenny kernel: [47676.161611] block drbd2: bitmap WRITE of 22356 pages took 1105 jiffies
Jan 2 23:08:50 lenny kernel: [47676.285215] block drbd2: 2794 GB (732544019 bits) marked out-of-sync by on disk bit-map.
Jan 2 23:08:50 lenny kernel: [47676.294027] block drbd2: receiver terminated
Jan 2 23:08:50 lenny kernel: [47676.294031] block drbd2: Terminating receiver thread
Jan 2 23:08:50 lenny kernel: [47676.294033] block drbd2: disk( Inconsistent -> Diskless )
Jan 2 23:08:50 lenny kernel: [47676.294154] block drbd2: drbd_bm_resize called with capacity == 0
Jan 2 23:08:50 lenny lrmd: [337634]: info: RA output: (gen_drbdprim_drbd2:0:stop:stdout)
Jan 2 23:08:50 lenny kernel: [47676.298815] block drbd2: worker terminated
Jan 2 23:08:50 lenny kernel: [47676.298822] block drbd2: Terminating worker thread
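
As a sanity check on the numbers in the log above, here is a minimal Python sketch that converts the out-of-sync bit count into a size. It takes the 4 KB-per-bit granularity from the "4 KB (1 bits)" log line; the 4096-byte bitmap page size is an assumption, and the function names are made up for illustration. If those assumptions hold, 732544019 bits is the entire bitmap (all 22356 pages), i.e. the whole device was flagged rather than a handful of blocks.

```python
# Sketch only: granularity is read off the "4 KB (1 bits)" log line,
# the bitmap page size is an assumption (4096 bytes).
BIT_KIB = 4          # KiB of storage covered by one bitmap bit
PAGE_BYTES = 4096    # assumed size of one bitmap page in bytes


def bits_to_gib(bits):
    """Storage covered by `bits` bitmap bits, in GiB."""
    return bits * BIT_KIB / (1024 * 1024)


def bitmap_pages(bits):
    """Number of bitmap pages needed to hold `bits` bits (rounded up)."""
    bits_per_page = PAGE_BYTES * 8
    return -(-bits // bits_per_page)  # ceiling division


if __name__ == "__main__":
    oos_bits = 732544019  # from the second "marked out-of-sync" line
    print(f"{bits_to_gib(oos_bits):.1f} GiB out of sync")
    print(f"{bitmap_pages(oos_bits)} bitmap pages")
```

Both results line up with the log: about 2794 GB out of sync, and the same 22356 pages reported by the "bitmap WRITE" lines.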
version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by phil at fat-tyre, 2011-01-28 12:17:35
kernel: openvz 2.6.32-042stab039.11
2: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate A r-----
ns:0 nr:52667648 dw:52662912 dr:0 al:0 bm:3213 lo:39 pe:7350 ua:41 ap:0 ep:1 wo:b oos:2877513676
[>....................] sync'ed: 1.8% (2810068/2861500)M
finish: 8:43:40 speed: 91,576 (90,020) want: 102,400 K/sec
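
The resync estimate in the status above can be roughly reproduced from the `oos` and `speed` fields. This is only a back-of-the-envelope sketch: it assumes `oos` is in KiB and `speed` in KiB/s, and simply divides one by the other, which is not necessarily how DRBD computes its own `finish` value. The function name is made up for illustration.

```python
def eta_hms(oos_kib, speed_kib_per_s):
    """Naive time-to-finish: remaining KiB divided by current KiB/s."""
    seconds = oos_kib // speed_kib_per_s
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # oos:2877513676 and speed: 91,576 K/sec from the status output
    print(eta_hms(2877513676, 91576))
```

With the values from the status output this gives 8:43:42, within a couple of seconds of the reported finish time, so a full resync of this volume takes most of a working day at the current rate.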
smartctl reports no problems for this disk, an HDS723030ALA640 (Hitachi
Deskstar 7K3000), but I'm already RMA-ing another drive of the same type
which is 'FAILING_NOW', so I can't rule out hardware or drive issues.
Is this a known issue and should I try to move to the latest 8.3.x
version? Any other information I can provide?
Thanks in advance for any insight.
--
Arnold Hendriks<a.hendriks at b-lex.nl>
B-Lex IT B.V., Postbus 545, NL-7500 AM Enschede
KvK: 08174333 http://www.b-lex.nl/
T: +31 (0)534836543 M: +31 (0)651710159