[DRBD-user] DRBD 0.7.10 / kernel 2.4.27 / LVM : Filesystem on 93:01 cannot be mounted because it is bigger than the device

Thu Feb 10 18:35:00 CET 2005

I recovered from this situation by simply zeroing the metadata while
DRBD was down. This is not something that would work online, but it was
fast and easy (the long delay was in deciding to actually erase the metadata).

Zero the second segment of the metadata partition (on both sides) :

[root at malauzat:root]# dd if=/dev/zero of=/dev/hda10 bs=1M count=128 seek=128
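
As a sanity check on the offsets, the dd arguments above translate into a
byte range as follows. This is only a minimal Python sketch: it assumes
that DRBD 0.7 reserves one 128 MB slot per meta-disk index, and that index 1
(as in the "drbdsetup ... /dev/hda10 1" call quoted below) is the second
slot. The helper name dd_byte_range is made up for illustration and is not
part of any DRBD tool.

```python
MIB = 1024 * 1024  # dd's bs=1M means blocks of 1 MiB


def dd_byte_range(bs, seek, count):
    """Return (first_byte, end_byte) written by `dd bs=.. seek=.. count=..`.

    end_byte is exclusive. Hypothetical helper, for illustration only.
    """
    start = seek * bs
    return start, start + count * bs


# Arguments taken from the dd command above.
start, end = dd_byte_range(bs=MIB, seek=128, count=128)
print(start, end)              # 134217728 268435456
print(start // MIB, end // MIB)  # 128 256
```

In other words, the command leaves the first 128 MiB of /dev/hda10 alone and
overwrites only the range from 128 MiB up to 256 MiB.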

Bring DRBD up, then force the known-good disk into primary state:

[root at malauzat:root]# /sbin/drbdsetup /dev/drbd1 primary --do-what-I-say

To be safe, invalidate the data on the secondary and let a full sync
run.

NH

Nicolas Huillard wrote:
> Hello,
> 
> Short story :
> Many hours after a hardware crash that was handled correctly, DRBD over
> LVM no longer uses the correct device size (629145600 bytes instead of
> 6291456000, or 614400KB instead of 6144000KB). Device size reported by
> LVM seems correct.
> 
> 
> Long story :
> 
> A device size problem occurred yesterday on a test setup :
> 
> /home/postgres -> /dev/drbd1 -> /dev/bases/bases -> /dev/sda2
> (reiserfs)        (DRBD 0.7.5)  (LVM from 2.4.27)   actual disk
> 
> The actual partition is 17.20GB, of which 5.86GB are allocated by LVM and
> available to DRBD.
> Everything was working cleanly until yesterday, with this setup. I first
> added a second disk (/dev/sdb) to the hot-plug SCSI bus, then tested it
> (an old disk, that had errors). The errors on SDB led to a complete halt
> of SDA, the replicated disk (I can't make it respond to SCSI commands
> any more : it seems totally dead).
> /dev/sda[12] were replicated by DRBD 0.7.5 on kernel 2.4.27 (Debian).
> DRBD placed the device in the ClientWithoutDisk state when SDA crashed
> (this mode was new to me, so I left it that way for a few hours while I
> searched for information about this state).
> I then switched all services to the peer server, and disconnected DRBD
> to install a new disk. OK for this part.
> 
> Many hours later, Postgres (which resides on top of DRBD) tried to access a
> sector beyond the device end : "ERROR:  cannot read block 142 of
> tbl_carac_cle_doc_code_classif_: Input/output error"
> I immediately stopped Postgres, and investigated.
> This happened "many hours later", but I don't know if it's because
> Postgres was not really used until then, or if the actual cause began
> many hours after the crash of the peer's disk.
> 
> It appears that DRBD only sees 629145600 out of the 6291456000 bytes on
> the LVM device (exactly one tenth of the real size), and refuses to extend to
> the whole space.
> I upgraded DRBD 0.7.5 to 0.7.10 before reporting this problem, just to
> be sure it was not a bug in 0.7.5. LVM is still the same : the one in
> stock kernel 2.4.27.
> 
> This is the result of a "dd if=the_LVM_device_then_the_DRBD_device",
> showing what a standard command reads as actual device size :
> 
> 6291456000 Feb  3 20:10 dev_bases_bases.dd
> 629145600 Feb  3 16:01 dev_drbd1.dd
> 
> LVM says the device is still 6GB :
> 
> [root at malauzat:root]# lvdisplay /dev/bases/bases
> --- Logical volume ---
> LV Name                /dev/bases/bases
> VG Name                bases
> LV Write Access        read/write
> LV Status              available
> LV #                   1
> # open                 2
> LV Size                5.86 GB
> Current LE             1500
> Allocated LE           1500
> Allocation             next free
> Read ahead sectors     1024
> Block device           58:1
> 
> When I set up DRBD, I get nothing on stdout, but lines in syslog tell me
> that the size of the device is one tenth of its actual size :
> 
> drbd1: resync bitmap: bits=153600 words=4800
> drbd1: size = 600 MB (614400 KB)
> drbd1: 600 MB marked out-of-sync by on disk bit-map.
> drbd1: Found 6 transactions (324 active extents) in activity log.
> drbd1: drbdsetup [8051]: cstate Unconfigured --> StandAlone
> drbd1: drbdsetup [8054]: cstate StandAlone --> Unconnected
> drbd1: drbd1_receiver [8055]: cstate Unconnected --> WFConnection
> 
> When I mount the FS over DRBD, ReiserFS correctly complains about device
> size vs. FS size, because the FS was created and worked with a 6GB device :
> 
> Feb  4 14:43:09 malauzat kernel: reiserfs: found format "3.6" with standard journal
> Feb  4 14:43:09 malauzat kernel: Filesystem on 93:01 cannot be mounted because it is bigger than the device
> Feb  4 14:43:09 malauzat kernel: You may need to run fsck or increase size of your LVM partition
> Feb  4 14:43:09 malauzat kernel: Or may be you forgot to reboot after fdisk when it told you to
> 
> When I try to force the size of the DRBD device, it refuses :
> 
> [root at malauzat:root]# /sbin/drbdsetup /dev/drbd1 disk /dev/bases/bases /dev/hda10 1 --on-io-error=detach -d 6144000
> 
> drbd1: Requested disk size is too big (6144000 > 614400)
> drbd1: size = 600 MB (614400 KB)
> drbd1: 600 MB marked out-of-sync by on disk bit-map.
> drbd1: Found 6 transactions (324 active extents) in activity log.
> drbd1: drbdsetup [8788]: cstate Unconfigured --> StandAlone
> 
> 
> Thanks for any advice, clues, etc.
>
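
The figures quoted above are mutually consistent, which can be checked with
a few lines of Python. This is only a sketch: the 4 MB extent size is
inferred from 1500 LE giving 5.86 GB, and the 4 KB-per-bit bitmap
granularity and 32-bit bitmap words are inferred from "bits=153600
words=4800" for a 614400 KB device. None of these is stated explicitly in
the logs.

```python
KB = 1024

lvm_bytes = 6291456000   # size of dev_bases_bases.dd (read from the LVM device)
drbd_bytes = 629145600   # size of dev_drbd1.dd (read from the DRBD device)

lvm_kb = lvm_bytes // KB
drbd_kb = drbd_bytes // KB
print(lvm_kb, drbd_kb)            # 6144000 614400
print(lvm_bytes // drbd_bytes)    # 10

# lvdisplay reports 1500 logical extents; the extent size is inferred here.
extent_kb = lvm_kb // 1500
print(extent_kb)                          # 4096
print(round(lvm_kb / KB / KB, 2))         # 5.86

# Bitmap figures from syslog: bits=153600 words=4800 for 614400 KB.
kb_per_bit = drbd_kb // 153600
bits_per_word = 153600 // 4800
print(kb_per_bit, bits_per_word)          # 4 32

# With the same granularity, the full-size device would need this many bits.
print(lvm_kb // kb_per_bit)               # 1536000
```

So the 614400 KB that DRBD reports, and the bitmap sized to match it, both
correspond to exactly one tenth of what LVM exposes.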