it is not drbd that crashes with anything here, but the filesystem,
because you screwed up and effectively truncated it.

On Fri, Feb 08, 2008 at 10:19:49AM -0500, Doug Knight wrote:
> I'm really getting desperate on this, as we are currently not in a high
> availability state with our server, so I thought I'd include some more
> info. Attached is my drbd.conf. Also, I am running RHEL5
> 2.6.18-8.1.14.el5 on both systems. Below is a capture from my system
> messages log from the original failure:

if I understand correctly, what you did is
 1) have some partition, with drbd and internal meta data on it,
    and a happy file system on drbd.
 2) stop drbd (first get it into Connected, Secondary/Secondary)
 3) use parted to resize the partition
 3.1) which also resized the file system on that partition
 4) created new internal drbd meta data
 5) started drbd again
 6) tried to use the now-resized file system on drbd, which fails

if that was indeed what was happening,
you screwed up in 3.1, or at the latest in 4). see below.

if that description does not at all match what you did,
please ignore the rest and describe exactly what you did.

> > Hi list,
> > I had one of my HA systems, running drbd 8.0.1, issue an error on its

and we are some versions ahead of 8.0.1, so please upgrade.

> > drbd0 device (see title). We recently resized the underlying partition
> > using gparted to include the partition immediately following it
> > (verified that the new, larger partitions were identical, and ran the
> > command to fix the meta-data, suggested when drbd was restarted). We
> > did this on both systems, and everything seemed OK for a few days.
> > This morning we got the error, heartbeat detected it, and migrated
> > resources to the other system, no problem. I took drbd down on both
> > systems, mounted and set primary drbd0 on the system with the issue,
> > and did an fsck -fvn /dev/drbd0 on it (unmounted).
> > I get the following:
> >
> > The filesystem size (according to the superblock) is 29288495 blocks
> > The physical size of the device is 29287592 blocks
> > Either the superblock or the partition table is likely to be corrupt!

the fs resize in 3.1, not knowing that you needed to keep a few unused MB
for the drbd meta data at the end of the device, resized the fs to use up
the full partition.

when you created the new internal meta data in 4), drbd took those last
few MB, so when you now bring up drbd, the "drbd partition" is exactly
those few MB smaller than the lower level "real partition".

what you should have done is either:
 in 4) use DRBD external meta data, which would have worked just fine.
or:
 1,2,3 BUT NOT 3.1 (regardless of whether parted did the fs resize,
 or you did it yourself)
 repeat 1,2,3 on the other node (still _without_ the fs resize),
 create the new internal drbd meta data on both nodes
 connect the drbds
 choose one (preferably the one that had been primary last)
 make that primary again, using the "overwrite data of peer" thing.
 wait for the resync to happen.

now you still have the file system in the old size.
but you can verify that your DRBD is indeed the new size
(minus whatever drbd needs for its internal meta data).
so, after you have verified that, you do the file system resize
on the _drbd_, NOT on the lower level partition.

> > So, I then ran fsck without the -n to correct. Now, drbd seems to be
> > completely hosed up. If I do a ./drbd start, the system locks up. If I
> > do the drbdadm adjust pgsql, it locks up the system too. I went as far
> > as to shutdown drbd, remove the kernel module, delete the sda5
> > partition and recreate it, starting over, and it still locks up the
> > system when I try to bring up drbd. What I'd like to do is fix the
> > issue on this system, and let it get back in sync with the other
> > system.
> > So 1) How do I get drbd back and functioning on the system
> > where the issue occurred?, and 2) Do I need to do anything to the
> > system that is currently running OK (due to the partition resize,
> > etc)?

possible way out:
 stop drbd completely.
 fsck /dev/sd-whatever-it-is
 resize2fs /dev/sd-whatever-it-is THE_SMALLER_SIZE_which_is_the_real_size_of_the_drbd
 hope for the best
 start drbd again,
 mount drbd
 compare with backups.

or
 umount /dev/drbd
 mkfs /dev/drbd
 restore from backup.

--
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe    Fax +43-1-8178292-82  :
__
please use the "List-Reply" function of your email client.
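
A rough cross-check of the two block counts in the fsck message quoted above, as a sketch only. It rests on assumptions that are not stated in the thread: a 4 KiB file system block size, 512-byte sectors, and the internal meta data size estimate from the DRBD User's Guide (ceil(sectors / 2^18) * 8 + 72 sectors). The names are made up for illustration. If those assumptions hold, the gap between the file system size and the device size should equal the space the internal meta data takes, and the smaller block count would be the target size for the resize2fs step.

```python
# Sketch: does the fsck size mismatch equal the size of DRBD internal meta data?
# ASSUMPTIONS (not from the thread): 4 KiB fs blocks, 512-byte sectors, and the
# meta data size estimate as given in the DRBD User's Guide.

SECTOR_BYTES = 512
FS_BLOCK_BYTES = 4096  # assumption; check with tune2fs -l on the real system

# Numbers taken from the fsck output quoted in the thread.
fs_blocks = 29288495      # file system size according to the superblock
device_blocks = 29287592  # physical size of /dev/drbd0


def internal_md_sectors(backing_sectors: int) -> int:
    """Estimated internal meta data size in sectors: ceil(Cs / 2^18) * 8 + 72."""
    chunks = -(-backing_sectors // 2**18)  # integer ceiling division
    return chunks * 8 + 72


sectors_per_block = FS_BLOCK_BYTES // SECTOR_BYTES

# Blocks the file system claims beyond the end of the drbd device.
missing_blocks = fs_blocks - device_blocks

# The fs was grown to fill the partition, so its size approximates the
# size of the backing partition.
backing_sectors = fs_blocks * sectors_per_block
md_blocks = internal_md_sectors(backing_sectors) // sectors_per_block

print("blocks missing at the end of the device:", missing_blocks)
print("estimated meta data size in fs blocks:  ", md_blocks)
print("candidate resize2fs target (fs blocks): ", device_blocks)
```

resize2fs interprets a size given without a unit suffix as a count of file system blocks, so under these assumptions the printed target could be passed to it directly. Verify the block size on the real system first.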