I recovered from this situation by simply zeroing the metadata while DRBD was down. This would not work on-line, but it was fast and easy (the long delay was in deciding to actually erase the metadata).

Zero the second segment of the metadata partition (on both sides):

[root@malauzat:root]# dd if=/dev/zero of=/dev/hda10 bs=1M count=128 seek=128

Bring DRBD up, then force the known good disk into primary state:

[root@malauzat:root]# /sbin/drbdsetup /dev/drbd1 primary --do-what-I-say

To be sure, invalidate the data on the secondary and let a full sync run.

NH

Nicolas Huillard wrote:
> Hello,
>
> Short story:
> Many hours after a correctly handled hardware crash, DRBD over LVM no
> longer uses the correct device size (629145600 bytes instead of
> 6291456000, or 614400KB instead of 6144000KB). The device size reported
> by LVM seems correct.
>
>
> Long story:
>
> A device size problem occurred yesterday on a test setup:
>
> /home/postgres -> /dev/drbd1   -> /dev/bases/bases  -> /dev/sda2
> (reiserfs)        (DRBD 0.7.5)    (LVM from 2.4.27)    actual disk
>
> The actual partition is 17.20GB, of which 5.86GB are allocated by LVM
> and available to DRBD.
> Everything was working cleanly with this setup until yesterday. I first
> added a second disk (/dev/sdb) to the hot-plug SCSI bus, then tested it
> (an old disk that had errors). The errors on SDB led to a complete halt
> of SDA, the replicated disk (I can't make it respond to SCSI commands
> any more: it seems totally dead).
> /dev/sda was replicated by DRBD 0.7.5 on kernel 2.4.27 (Debian).
> DRBD placed the device in the ClientWithoutDisk state when SDA crashed
> (this mode was new to me, so I left it that way for a few hours and
> searched for information about this state).
> I then switched all services to the peer server, and disconnected DRBD
> to install a new disk. OK for this part.
>
> Many hours later, Postgres (which resides on DRBD) tried to access a
> sector beyond the device end: "ERROR: cannot read block 142 of
> tbl_carac_cle_doc_code_classif_: Input/output error"
> I immediately stopped Postgres and investigated.
> This happened "many hours later", but I don't know whether that is
> because Postgres was not really used until then, or because the actual
> cause began many hours after the crash of the peer's disk.
>
> It appears that DRBD only sees 629145600 of the 6291456000 bytes on
> the LVM device (exactly one tenth), and refuses to extend to the whole
> space.
> I upgraded DRBD 0.7.5 to 0.7.10 before reporting this problem, just to
> be sure it was not a bug in 0.7.5. LVM is still the same: the one in
> the stock 2.4.27 kernel.
>
> This is the result of a "dd if=the_LVM_device_then_the_DRBD_device",
> showing what a standard command reads as the actual device size:
>
> 6291456000 Feb 3 20:10 dev_bases_bases.dd
>  629145600 Feb 3 16:01 dev_drbd1.dd
>
> LVM says the device is still 6GB:
>
> [root@malauzat:root]# lvdisplay /dev/bases/bases
> --- Logical volume ---
> LV Name                /dev/bases/bases
> VG Name                bases
> LV Write Access        read/write
> LV Status              available
> LV #                   1
> # open                 2
> LV Size                5.86 GB
> Current LE             1500
> Allocated LE           1500
> Allocation             next free
> Read ahead sectors     1024
> Block device           58:1
>
> When I set up DRBD, I get nothing on stdout, but lines in syslog tell me
> that the size of the device is one tenth of its actual size:
>
> drbd1: resync bitmap: bits=153600 words=4800
> drbd1: size = 600 MB (614400 KB)
> drbd1: 600 MB marked out-of-sync by on disk bit-map.
> drbd1: Found 6 transactions (324 active extents) in activity log.
> drbd1: drbdsetup : cstate Unconfigured --> StandAlone
> drbd1: drbdsetup : cstate StandAlone --> Unconnected
> drbd1: drbd1_receiver : cstate Unconnected --> WFConnection
>
> When I mount the FS over DRBD, ReiserFS correctly complains about the
> device size vs. the FS size, because the FS was created on and worked
> with a 6GB device:
>
> Feb 4 14:43:09 malauzat kernel: reiserfs: found format "3.6" with
> standard journal
> Feb 4 14:43:09 malauzat kernel: Filesystem on 93:01 cannot be mounted
> because it is bigger than the device
> Feb 4 14:43:09 malauzat kernel: You may need to run fsck or increase
> size of your LVM partition
> Feb 4 14:43:09 malauzat kernel: Or may be you forgot to reboot after
> fdisk when it told you to
>
> When I try to force the size of the DRBD device, it refuses:
>
> [root@malauzat:root]# /sbin/drbdsetup /dev/drbd1 disk /dev/bases/bases
> /dev/hda10 1 --on-io-error=detach -d 6144000
>
> drbd1: Requested disk size is too big (6144000 > 614400)
> drbd1: size = 600 MB (614400 KB)
> drbd1: 600 MB marked out-of-sync by on disk bit-map.
> drbd1: Found 6 transactions (324 active extents) in activity log.
> drbd1: drbdsetup : cstate Unconfigured --> StandAlone
>
>
> Thanks for any advice, clues, etc.
>
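
For anyone wanting to double-check the figures quoted above, here is a small arithmetic sketch. It touches no device: the three input values are copied from the lvdisplay, dd and syslog output in the message, and the variable names are only illustrative. The 4 MiB extent size is not printed anywhere above; it is derived here from the LV byte size and the extent count.

```shell
#!/bin/sh
# Figures copied from the quoted message; nothing here reads a real device.
le_count=1500          # "Current LE" from lvdisplay
lv_bytes=6291456000    # size of dev_bases_bases.dd
drbd_kb=614400         # "size = 600 MB (614400 KB)" from syslog

# Extent size implied by the LV (derived, not reported by lvdisplay above).
le_bytes=$((lv_bytes / le_count))

# LV size in KB, the unit DRBD uses in its log lines.
lv_kb=$((lv_bytes / 1024))

echo "extent size: $((le_bytes / 1024 / 1024)) MiB"
echo "LV size:     ${lv_kb} KB"
echo "DRBD size:   ${drbd_kb} KB"
echo "LV / DRBD:   $((lv_kb / drbd_kb))"
```

This should report a 4 MiB extent, an LV of 6144000 KB, and a ratio of exactly 10 against the 614400 KB that DRBD logs, which matches the "one tenth" observation and the "6144000 > 614400" refusal.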