On 07/25/2011 01:31 PM, Christian Völker wrote:
> Hi Felix,
>> this reminds me of something I've seen during a drbdadm adjust.
>> The adjusting Primary would re-initialize its DRBD and then (apparently)
>> sync back its hot AL extents from the peer. Seeing as the peer was
>> Inconsistent in your case (it was still syncing), this can't work, and
>> the Primary may go StandAlone as a result.
> What? Do you say the primary updates its data from the secondary during
> an "adjust" command?

Generally? No. In your case? Yes :-)

>> Please try and verify this scenario using the kernel logs of your Primary.
> Well, I'm not sure about the logfile entries - I've appended the first
> entry where it went bananas below.

Score. See below.

>> It's a good question why DRBD would commit to such destructive action
>> without issuing a warning. What version are you using? I'm inclined to
>> call bug on this one.
> drbd82-8.2.6-1.el5.centos

OK, that's ancient? So that's what was shipping with RHEL5? Hmm.
You may want to
a) step away from your distributor's packaged DRBD or
b) update to EL6 or whatever's the latest and greatest.

<snip>

This is probably your adjust command kicking in:

> Jul 24 21:41:58 backuppc kernel: drbd0: disk( UpToDate -> Diskless )
> Jul 24 21:41:58 backuppc kernel: drbd0: disk( Diskless -> Attaching )
> Jul 24 21:41:58 backuppc kernel: drbd0: Found 4 transactions (192 active extents) in activity log.
> Jul 24 21:41:58 backuppc kernel: drbd0: max_segment_size ( = BIO size ) = 32768
> Jul 24 21:41:58 backuppc kernel: drbd0: reading of bitmap took 186 jiffies
> Jul 24 21:41:58 backuppc kernel: drbd0: recounting of set bits took additional 121 jiffies
> Jul 24 21:41:58 backuppc kernel: drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Jul 24 21:41:58 backuppc kernel: drbd0: Marked additional 508 MB as out-of-sync based on AL.

Bam. Your Primary is now officially out-of-sync.

> Jul 24 21:41:58 backuppc kernel: drbd0: disk( Attaching -> Negotiating )
> Jul 24 21:41:58 backuppc kernel: drbd0: Writing meta data super block now.
> Jul 24 21:42:07 backuppc kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk

This is fucked up. I sincerely hope that more recent versions do this
check *before* they mark your only consistent copy dirty.

(This is usually the time when Linbit chimes in and explains why that
can't be made to work ;-)

The rest of it more or less describes your tale of woe.

What config change were you adjusting for? Not every call to "adjust"
will lead to re-attaching your backing device. It can't hurt to be
mindful of this and check using the dryrun first. But I agree - it sure
shouldn't be required.

Cheers,
Felix
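
P.S. If you want to check other logs for the same pattern, here is a minimal Python sketch. It only looks for the three messages quoted above (detach from UpToDate, AL extents marked out-of-sync, refusal to stay Primary). The function name and the returned structure are made up for illustration, it assumes each kernel message sits on one unwrapped line, and the message wording is taken from the 8.2.6 log above, so other DRBD versions may well phrase things differently.

```python
import re

# Message patterns copied from the kernel log quoted in this thread (DRBD 8.2.6).
# Other versions may word these messages differently (assumption).
DEVICE = re.compile(r"kernel: (drbd\d+):")
DETACH = re.compile(r"disk\( UpToDate -> Diskless \)")
AL_DIRTY = re.compile(r"Marked additional (\d+) MB as out-of-sync based on AL")
REFUSED = re.compile(r"Refusing to be Primary without at least one UpToDate disk")


def find_reattach_damage(lines):
    """Hypothetical helper: report devices that dropped their UpToDate disk
    and afterwards had activity-log extents marked out-of-sync."""
    seen = {}
    for line in lines:
        dev_match = DEVICE.search(line)
        if dev_match is None:
            continue  # not a DRBD kernel message
        info = seen.setdefault(
            dev_match.group(1), {"detached": False, "al_mb": 0, "refused": False}
        )
        if DETACH.search(line):
            info["detached"] = True
        al_match = AL_DIRTY.search(line)
        if al_match and info["detached"]:
            info["al_mb"] += int(al_match.group(1))
        if REFUSED.search(line) and info["detached"]:
            info["refused"] = True
    # Only report devices where the re-attach actually dirtied data.
    return {
        dev: {"al_mb": info["al_mb"], "refused": info["refused"]}
        for dev, info in seen.items()
        if info["detached"] and info["al_mb"] > 0
    }


SAMPLE = [
    "Jul 24 21:41:58 backuppc kernel: drbd0: disk( UpToDate -> Diskless )",
    "Jul 24 21:41:58 backuppc kernel: drbd0: disk( Diskless -> Attaching )",
    "Jul 24 21:41:58 backuppc kernel: drbd0: Marked additional 508 MB as out-of-sync based on AL.",
    "Jul 24 21:42:07 backuppc kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk",
]

if __name__ == "__main__":
    print(find_reattach_damage(SAMPLE))
```

Feed it the lines of /var/log/messages (or wherever your kernel log ends up) from the Primary. An empty result only means these exact messages were not found, not that the device is fine.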