On Mon, Jan 02, 2012 at 11:45:19PM +0100, Arnold Hendriks wrote:
> Our local heartbeat/drbd cluster has been moody for the last few
> weeks, but I suddenly noticed a drbd2 volume being marked as
> completely out-of-date, requiring a 3TB resync. There should've only
> been a few pages out of sync.
>
> The resource manager scripts were sending commands to DRBD so I'd
> guess they triggered this somehow, but I think this shouldn't
> happen?
>
> The log entries around the invalidation are:
>
> Jan 2 23:08:49 lenny kernel: [47674.701229] block drbd2: disk( Outdated -> Inconsistent )
> Jan 2 23:08:49 lenny kernel: [47674.884449] block drbd2: bitmap WRITE of 22356 pages took 1107 jiffies
> Jan 2 23:08:49 lenny kernel: [47674.993632] block drbd2: 4 KB (1 bits) marked out-of-sync by on disk bit-map.

Here, only one bit is "out of sync".

> Jan 2 23:08:49 lenny kernel: [47674.993679] block drbd2: Connection closed
> Jan 2 23:08:49 lenny kernel: [47674.993691] block drbd2: conn( Disconnecting -> StandAlone )
> Jan 2 23:08:50 lenny kernel: [47676.161611] block drbd2: bitmap WRITE of 22356 pages took 1105 jiffies
> Jan 2 23:08:50 lenny kernel: [47676.285215] block drbd2: 2794 GB (732544019 bits) marked out-of-sync by on disk bit-map.

There, suddenly many more (all?) bits are out of sync.

Someone did something "funny" in between: possibly a full sync was
triggered explicitly, or a resync handshake went wrong (which should
have left some interesting log lines, though).

See if you can find some more logs; maybe not all log levels end up
in the same file. Also see if you can correlate the kernel logs with
bash history or pacemaker logs ("... sending commands to DRBD ...").

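As a rough cross-check of the "(all?)" guess, the bit counts in the two
"marked out-of-sync" lines can be compared with the size of the bitmap
itself. The following is only a sketch in Python, not anything DRBD
ships: the helper names are made up for illustration, the 4 KB per bit
granularity is read off the "4 KB (1 bits)" log line, and the 4096-byte
page size is an assumption about this kernel. Since the bitmap may be
padded up to whole pages, the computed capacity is an upper bound.

```python
import re

# Kernel log lines quoted in this thread, copied verbatim.
LOG = """\
Jan 2 23:08:49 lenny kernel: [47674.884449] block drbd2: bitmap WRITE of 22356 pages took 1107 jiffies
Jan 2 23:08:49 lenny kernel: [47674.993632] block drbd2: 4 KB (1 bits) marked out-of-sync by on disk bit-map.
Jan 2 23:08:50 lenny kernel: [47676.161611] block drbd2: bitmap WRITE of 22356 pages took 1105 jiffies
Jan 2 23:08:50 lenny kernel: [47676.285215] block drbd2: 2794 GB (732544019 bits) marked out-of-sync by on disk bit-map.
"""

# Matches "[uptime] block <dev>: ... (<n> bits) marked out-of-sync".
OOS_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] block (\w+): .*\((\d+) bits\) marked out-of-sync"
)
PAGES_RE = re.compile(r"bitmap WRITE of (\d+) pages")

KB_PER_BIT = 4     # from the log: "4 KB (1 bits)"
PAGE_BYTES = 4096  # assumption: 4 KiB kernel pages


def oos_events(text):
    """Return (uptime_seconds, device, bits) per 'marked out-of-sync' line."""
    return [
        (float(m.group(1)), m.group(2), int(m.group(3)))
        for m in OOS_RE.finditer(text)
    ]


def bitmap_capacity_bits(text):
    """Upper bound on bitmap bits, from the first 'bitmap WRITE' line."""
    pages = int(PAGES_RE.search(text).group(1))
    return pages * PAGE_BYTES * 8


events = oos_events(LOG)
capacity = bitmap_capacity_bits(LOG)

for uptime, dev, bits in events:
    gib = bits * KB_PER_BIT / 1024 ** 2
    share = bits / capacity
    print(f"[{uptime:.6f}] {dev}: {bits} bits, {gib:.0f} GiB, "
          f"{share:.4%} of bitmap capacity")
```

Under those assumptions, the second event covers practically the whole
bitmap, which is what you would expect after a full invalidate rather
than after a few missed writes. The same regular expressions could be
pointed at the complete syslog to list every such jump with its
timestamp, for comparison against the resource agent's commands.
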
> Jan 2 23:08:50 lenny kernel: [47676.294027] block drbd2: receiver terminated
> Jan 2 23:08:50 lenny kernel: [47676.294031] block drbd2: Terminating receiver thread
> Jan 2 23:08:50 lenny kernel: [47676.294033] block drbd2: disk( Inconsistent -> Diskless )
> Jan 2 23:08:50 lenny kernel: [47676.294154] block drbd2: drbd_bm_resize called with capacity == 0
> Jan 2 23:08:50 lenny lrmd: [337634]: info: RA output: (gen_drbdprim_drbd2:0:stop:stdout)
> Jan 2 23:08:50 lenny kernel: [47676.298815] block drbd2: worker terminated
> Jan 2 23:08:50 lenny kernel: [47676.298822] block drbd2: Terminating worker thread
>
> version: 8.3.10 (api:88/proto:86-96)
> GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by
> phil at fat-tyre, 2011-01-28 12:17:35
>
> kernel: openvz 2.6.32-042stab039.11
>
> 2: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate A r-----
> ns:0 nr:52667648 dw:52662912 dr:0 al:0 bm:3213 lo:39 pe:7350
> ua:41 ap:0 ep:1 wo:b oos:2877513676
> [>....................] sync'ed: 1.8% (2810068/2861500)M
> finish: 8:43:40 speed: 91,576 (90,020) want: 102,400 K/sec
>
> smartctl reports no problems for this disk, a HDS723030ALA640
> (Hitachi Deskstar 7K3000) but I'm already RMA-ing another drive of
> the same type which is 'FAILING_NOW', so I can't rule out hardware
> or drive issues.
>
> Is this a known issue and should I try to move to the latest 8.3.x
> version? Any other information I can provide?
>
> Thanks in advance for any insight.
>
> --
>
> Arnold Hendriks<a.hendriks at b-lex.nl>
> B-Lex IT B.V., Postbus 545, NL-7500 AM Enschede
> KvK: 08174333 http://www.b-lex.nl/ T: +31 (0)534836543
> M: +31 (0)651710159

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed