[DRBD-user] Managed to corrupt a drbd9 resource, but don't fully understand how
Eddie Chapman
eddie at ehuk.net
Mon Jul 29 11:34:22 CEST 2019
Hello,
I've managed to corrupt one of the drbd9 resources on one of my
production servers, and now I'm trying to figure out exactly what
happened so I can try to recover the data. I wonder if anyone might
understand what went wrong here (apart from the fact that PEBCAK :-) )?
This is a simple two-node resource with an ext4 fs directly on top and
an LVM volume as the backing device, nothing special.
Gentoo, drbd kernel 9.0.19-1, drbd utilities 9.10.0, vanilla kernel.org
4.19.60
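For context, the resource config is nothing unusual; something roughly
along these lines (the peer hostname and the addresses below are
illustrative guesses on my part, but the device, minor and backing LV
are the real ones):

resource res0 {
    device    /dev/drbd12;
    disk      /dev/vg02/res0;    # backing LV
    meta-disk internal;          # meta-data at the end of the LV

    on node1 {
        address 10.0.0.1:7789;
    }
    on tv {
        address 10.0.0.2:7789;
    }
}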
Here's the sequence of events:
1. The resource is online in its normal state, connected, dstate
UpToDate on both nodes. It is Primary/Secondary and I'm working on the
Primary node. I unmount the fs on top of the resource.
2. I run: e2fsck -f /dev/drbd12
which completes without any problems.
3. I run: drbdadm secondary res0
on the Primary.
The roles are now Secondary/Secondary.
4. I run: drbdadm down res0
on both nodes. Now the resource is fully down on both nodes.
5. I destroy the backing device on the node which is normally Secondary
(wish I hadn't now, but it was a calculated risk that was part of my
original plan - heh). Now the data only exists on one node.
6. I run: drbdadm up res0
on the remaining node. The resource comes up fine; it is now Secondary,
UpToDate but disconnected. The fs on top is NOT mounted.
(Oops, later I realise I forgot to make the resource Primary, so it
stays Secondary. But not sure if this is a factor at all in what follows.)
7. I extend the resource's backing device:
lvextend -l +11149 /dev/vg02/res0
Size of logical volume vg02/res0 changed from 120.00 GiB (30720
extents) to 163.55 GiB (41869 extents).
Logical volume vg02/res0 successfully resized.
Oops, I made a mistake with the lvextend: I wanted to extend using
extents from a specific PV, but forgot to append the PV and extent range
to the end of the lvextend command. No problem, I thought: I know the PV
segments and their extent ranges from before the extend, because I ran
lvdisplay -m before I did anything. So I can simply delete and re-create
the LV using the exact same segments and extent ranges to restore it to
how it was (this is much easier than trying to remove specific PV
segments from an LV, which lvreduce/lvresize cannot do AFAICT, and it's
an approach I've used successfully many times before). However, I can't
run lvremove on the backing device while it is still held open by drbd,
so first the resource has to come down.
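(For completeness, what I had meant to run was something along these
lines; the PV name and extent range here are made-up examples for
illustration, not what I actually typed:

lvextend -l +11149 /dev/vg02/res0 /dev/sdg1:100000-111148

i.e. the same lvextend with the source PV and a physical extent range
appended, so that LVM only allocates the new extents from that range.)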
8. I run: drbdadm down res0
which completes successfully.
Later, I look back at the kernel log and see that when I downed the
resource in this step, the log is normal and as you would expect, apart
from this one line:
drbd res0/0 drbd12: ASSERTION drbd_md_ss(device->ldev) ==
device->ldev->md.md_offset FAILED in drbd_md_write
An obvious sign that something is very wrong at this point, but of
course I don't see it until later, when doing my post-mortem :-)
Apart from that one line, all the other lines logged when downing the
device are normal. I've pasted the full log at the bottom of this mail.
9. With the resource down, I am now able to successfully delete and
re-create the backing device. These are my commands:
lvremove /dev/vg02/res0
Do you really want to remove active logical volume vg02/res0? [y/n]: y
Logical volume "res0" successfully removed
lvcreate -l 30720 -n res0 vg02 /dev/sdf1:36296-59597
/dev/sdf1:7797-12998 /dev/sdf1:27207-29422
WARNING: ext4 signature detected on /dev/vg02/res0 at offset 1080. Wipe
it? [y/n]: n
Aborted wiping of ext4.
1 existing signature left on the device.
Logical volume "res0" created.
I run lvdisplay -m to confirm the list of segments and extent ranges is
exactly as it was before I extended the LV in step 7, which it is.
Wonderful, I continue under my delusion that this is all going fine so
far :-)
10. I run: drbdadm up res0
which completes successfully.
11. I then run: drbdadm primary res0
which also completes successfully. I've pasted the full kernel log from
these two commands at the end. All looks fine to me at this point.
12. I decide to run a forced e2fsck on the resource to double check all
is OK. This is the first time I realise something has gone badly wrong:
e2fsck -f /dev/drbd12
e2fsck 1.45.3 (14-Jul-2019)
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
Pass 1: Checking inodes, blocks, and sizes
Inode 262145 seems to contain garbage. Clear<y>?
/dev/drbd12: e2fsck canceled.
/dev/drbd12: ***** FILE SYSTEM WAS MODIFIED *****
I hit Ctrl-C as soon as it asks me to clear that inode, but too late:
the fs has already been modified in some way.
I run dumpe2fs, which dumps a full set of info on the fs; it all looks
very normal apart from the line:
Filesystem state: not clean
13. I run e2fsck again, this time with -n (which I should have done the
first time round, silly me). It starts out:
e2fsck -f -n /dev/drbd12
e2fsck 1.45.3 (14-Jul-2019)
Pass 1: Checking inodes, blocks, and sizes
Inode 262145 seems to contain garbage. Clear? no
Inode 262146 passes checks, but checksum does not match inode. Fix? no
<...repeat same line for hundreds of inodes...>
So, obviously I made a big mistake when I extended the backing device
(while it was Secondary, though I'm not sure whether that is a factor)
and then immediately brought the resource down without following through
and running "drbdadm resize", and the drbd ASSERTION shows that.
However, at the time I wasn't expecting any problems because I had not
mounted the resource. So, in my mind, as I had not made any changes to
any filesystem data, I thought that would be fine.
Obviously, what happened is that the backing device *size* changed while
the resource was live. This should be fine in itself, as it is what
normally happens in a drbd online resize.
However, when I then brought the resource down, it looks like drbd
probably tried to write its metadata at the end of the backing device,
and this failed because of the size change?
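(If I'm reading the drbd source correctly, and this is just my guess so
please correct me, drbd_md_ss() for internal meta-data points at the
last 4 KiB-aligned 4 KiB block of the backing device, i.e. roughly
(capacity_in_sectors & ~7) - 8. Before the lvextend the LV was 30720
4 MiB extents = 251658240 sectors, which would put the stored md_offset
at 251658232; after the lvextend it was 41869 extents = 342990848
sectors, so drbd_md_ss() would now come back with 342990840 instead,
hence the mismatch and the ASSERTION.)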
Ultimately all my fault, of course, for not following through with the
online resize.
However, now I want to try to recover my fs, and first I just want to
understand what exactly happened to the metadata when the resource was
brought down and the ASSERTION fired.
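(One thing I'm tempted to do as part of the post-mortem, if it is safe,
is to bring the resource down and look at the on-disk meta-data
directly, something like:

drbdadm down res0
drbdadm dump-md res0

on the assumption that dump-md only reads the meta-data area and will
show me what offsets and sizes it currently records. If that's not
read-only, or there is a better way to inspect it, please shout.)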
I want to understand that before I try running e2fsck without -n,
which, from what I've seen, is going to make drastic changes to the fs;
changes that may recover it, but may also make things worse at this
point.
Perhaps I should just wipe-md and try and access the fs directly on the
backing device.
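(If I do go down that route, my rough plan, and please tell me if this
is a bad idea, would be something like:

# image the backing LV first so I can always get back to this state
# (/backup/res0.img is just a placeholder path)
dd if=/dev/vg02/res0 of=/backup/res0.img bs=4M
drbdadm down res0
drbdadm wipe-md res0
# then check the fs on the backing device directly, read-only first
e2fsck -fn /dev/vg02/res0

on the assumption that wipe-md only touches the drbd meta-data area at
the end of the LV and not the filesystem itself.)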
By the way, I do take a daily backup of the data on almost all my other
drbd resources. Just not this one, unfortunately; the one I happened to
mess up ... :-)
Thanks in advance,
Eddie
==== Full kernel log from Step 8 =====
Jul 28 18:18:52 node1 kernel: [384749.421172] drbd res0 tv: conn( Connecting -> Disconnecting )
Jul 28 18:18:52 node1 kernel: [384749.421217] drbd res0 tv: Restarting sender thread
Jul 28 18:18:52 node1 kernel: [384749.421230] drbd res0/0 drbd12: ASSERTION drbd_md_ss(device->ldev) == device->ldev->md.md_offset FAILED in drbd_md_write
Jul 28 18:18:52 node1 kernel: [384749.421309] drbd res0 tv: Connection closed
Jul 28 18:18:52 node1 kernel: [384749.421318] drbd res0 tv: conn( Disconnecting -> StandAlone )
Jul 28 18:18:52 node1 kernel: [384749.421326] drbd res0 tv: Terminating receiver thread
Jul 28 18:18:52 node1 kernel: [384749.421341] drbd res0 tv: Terminating sender thread
Jul 28 18:18:52 node1 kernel: [384749.421435] drbd res0/0 drbd12: disk( UpToDate -> Detaching )
Jul 28 18:18:52 node1 kernel: [384749.421497] drbd res0/0 drbd12: disk( Detaching -> Diskless )
Jul 28 18:18:52 node1 kernel: [384749.421908] drbd res0/0 drbd12: drbd_bm_resize called with capacity == 0
Jul 28 18:18:52 node1 kernel: [384749.454400] drbd res0: Terminating worker thread
==== Full kernel log from Steps 10 and 11 =====
Jul 28 18:24:51 node1 kernel: [385108.607668] drbd res0: Starting worker thread (from drbdsetup [26084])
Jul 28 18:24:51 node1 kernel: [385108.610730] drbd res0 tv: Starting sender thread (from drbdsetup [26088])
Jul 28 18:24:51 node1 kernel: [385108.790058] drbd res0/0 drbd12: meta-data IO uses: blk-bio
Jul 28 18:24:51 node1 kernel: [385108.790213] drbd res0/0 drbd12: disk( Diskless -> Attaching )
Jul 28 18:24:51 node1 kernel: [385108.790220] drbd res0/0 drbd12: Maximum number of peer devices = 1
Jul 28 18:24:51 node1 kernel: [385108.790623] drbd res0: Method to ensure write ordering: drain
Jul 28 18:24:51 node1 kernel: [385108.790633] drbd res0/0 drbd12: drbd_bm_resize called with capacity == 251650488
Jul 28 18:24:51 node1 kernel: [385108.793611] drbd res0/0 drbd12: resync bitmap: bits=31456311 words=491505 pages=960
Jul 28 18:24:51 node1 kernel: [385108.793616] drbd res0/0 drbd12: size = 120 GB (125825244 KB)
Jul 28 18:24:51 node1 kernel: [385108.793617] drbd res0/0 drbd12: size = 120 GB (125825244 KB)
Jul 28 18:24:51 node1 kernel: [385108.830061] drbd res0/0 drbd12: recounting of set bits took additional 0ms
Jul 28 18:24:51 node1 kernel: [385108.830072] drbd res0/0 drbd12: disk( Attaching -> UpToDate )
Jul 28 18:24:51 node1 kernel: [385108.830073] drbd res0/0 drbd12: attached to current UUID: 51A43103091A9E70
Jul 28 18:24:51 node1 kernel: [385108.831297] drbd res0 tv: conn( StandAlone -> Unconnected )
Jul 28 18:24:51 node1 kernel: [385108.831394] drbd res0 tv: Starting receiver thread (from drbd_w_res0 [26085])
Jul 28 18:24:51 node1 kernel: [385108.831455] drbd res0 tv: conn( Unconnected -> Connecting )
Jul 28 18:25:54 node1 kernel: [385172.119565] drbd res0: Preparing cluster-wide state change 840522528 (0->-1 3/1)
Jul 28 18:25:54 node1 kernel: [385172.119569] drbd res0: Committing cluster-wide state change 840522528 (0ms)
Jul 28 18:25:54 node1 kernel: [385172.119572] drbd res0: role( Secondary -> Primary )
Jul 28 18:25:59 node1 kernel: [385176.861534] drbd res0/0 drbd12: new current UUID: 8BE37DA3CE798447 weak: FFFFFFFFFFFFFFFE