[DRBD-user] Managed to corrupt a drbd9 resource, but don't fully understand how
Eddie Chapman
eddie at ehuk.net
Mon Jul 29 11:34:22 CEST 2019
Hello,
I've managed to corrupt one of the drbd9 resources on one of my
production servers, and now I'm trying to figure out exactly what
happened so I can try to recover the data. I wonder if anyone might
understand what went wrong here (apart from the fact that PEBCAK :-) )?
This is a simple two-node resource with an ext4 fs directly on top and
an LVM volume as the backing device, nothing special.
Gentoo, drbd kernel 9.0.19-1, drbd utilities 9.10.0, vanilla kernel.org
4.19.60
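For context, the resource config is nothing unusual; something roughly
along these lines (the peer hostname and the addresses below are
illustrative guesses on my part, but the device, minor and backing LV
are the real ones):

resource res0 {
    device    /dev/drbd12;
    disk      /dev/vg02/res0;    # backing LV
    meta-disk internal;          # meta-data at the end of the LV

    on node1 {
        address 10.0.0.1:7789;
    }
    on tv {
        address 10.0.0.2:7789;
    }
}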
Here's the sequence of events:
1. The resource is online in its normal state, connected, dstate
UpToDate on both nodes. It is Primary/Secondary and I'm working on the
Primary node. I unmount the fs on top of the resource.
2. I run: e2fsck -f /dev/drbd12
which completes without any problems.
3. I run: drbdadm secondary res0
on the Primary.
The roles are now Secondary/Secondary.
4. I run: drbdadm down res0
on both nodes. Now the resource is fully down on both nodes.
5. I destroy the backing device on the node which is normally Secondary
(wish I hadn't now, but it was a calculated risk that was part of my
original plan - heh). Now the data only exists on one node.
6. I run: drbdadm up res0
on the remaining node. The resource comes up fine; it is now Secondary,
UpToDate but disconnected. The fs on top is NOT mounted.
(Oops, later I realise I forgot to make the resource Primary, so it
stays Secondary. But not sure if this is a factor at all in what follows.)
7. I extend the resource's backing device:
lvextend -l +11149 /dev/vg02/res0
Size of logical volume vg02/res0 changed from 120.00 GiB (30720
extents) to 163.55 GiB (41869 extents).
Logical volume vg02/res0 successfully resized.
Oops, I made a mistake with the lvextend: I wanted to extend using
extents from a specific PV, but forgot to append the PV and extent range
to the end of the lvextend command. No problem, I thought: I know the PV
segments and their extent ranges from before the extend, because I ran
lvdisplay -m before I did anything. So I can simply delete and re-create
the LV using the exact same segments and extent ranges to restore it to
how it was (this is much easier than trying to remove specific PV
segments from an LV, which lvreduce/lvresize cannot do AFAICT, and it's
an approach I've used successfully many times before). However, I can't
run lvremove on the backing device while it is still held open by drbd,
so first the resource has to come down.
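(For completeness, what I had meant to run was something along these
lines; the PV name and extent range here are made-up examples for
illustration, not what I actually typed:

lvextend -l +11149 /dev/vg02/res0 /dev/sdg1:100000-111148

i.e. the same lvextend with the source PV and a physical extent range
appended, so that LVM only allocates the new extents from that range.)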
8. I run: drbdadm down res0
which completes successfully.
Later, I look back at the kernel log and see that when I downed the
resource in this step, the log is normal and as you would expect, apart
from this one line:
drbd res0/0 drbd12: ASSERTION drbd_md_ss(device->ldev) ==
device->ldev->md.md_offset FAILED in drbd_md_write
An obvious sign that something is very wrong at this point, but of
course I don't see it until later, when doing my post-mortem :-)
Apart from that one line, all the other lines logged when downing the
device are normal. I've pasted the full log at the bottom of this mail.
9. With the resource down, I am now able to successfully delete and
re-create the backing device. These are my commands:
lvremove /dev/vg02/res0
Do you really want to remove active logical volume vg02/res0? [y/n]: y
Logical volume "res0" successfully removed
lvcreate -l 30720 -n res0 vg02 /dev/sdf1:36296-59597
/dev/sdf1:7797-12998 /dev/sdf1:27207-29422
WARNING: ext4 signature detected on /dev/vg02/res0 at offset 1080. Wipe
it? [y/n]: n
Aborted wiping of ext4.
1 existing signature left on the device.
Logical volume "res0" created.
I run lvdisplay -m to confirm the list of segments and extent ranges is
exactly as it was before I extended the LV in step 7, which it is.
Wonderful, I continue under my delusion that this is all going fine so
far :-)
10. I run: drbdadm up res0
which completes successfully.
11. I then run: drbdadm primary res0
which also completes successfully. I've pasted the full kernel log from
these two commands at the end. All looks fine to me at this point.
12. I decide to run a forced e2fsck on the resource to double check all
is OK. This is the first time I realise something has gone badly wrong:
e2fsck -f /dev/drbd12
e2fsck 1.45.3 (14-Jul-2019)
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
Pass 1: Checking inodes, blocks, and sizes
Inode 262145 seems to contain garbage. Clear<y>?
/dev/drbd12: e2fsck canceled.
/dev/drbd12: ***** FILE SYSTEM WAS MODIFIED *****
I hit Ctrl-C as soon as it asks me to clear that inode, but too late:
the fs has already been modified in some way.
I run dumpe2fs, which dumps a full set of info on the fs; it all looks
very normal apart from the line:
Filesystem state: not clean
13. I run e2fsck again, this time with -n (which I should have done the
first time round, silly me). It starts out:
e2fsck -f -n /dev/drbd12
e2fsck 1.45.3 (14-Jul-2019)
Pass 1: Checking inodes, blocks, and sizes
Inode 262145 seems to contain garbage. Clear? no
Inode 262146 passes checks, but checksum does not match inode. Fix? no
<...repeat same line for hundreds of inodes...>
So, obviously I made a big mistake when I extended the backing device
(while it was Secondary, though I'm not sure whether that is a factor)
and then immediately brought the resource down without following through
and running "drbdadm resize", and the drbd ASSERTION shows that.
However, at the time I wasn't expecting any problems because I had not
mounted the resource. So, in my mind, as I had not made any changes to
any filesystem data, I thought that would be fine.
Obviously, what happened is that the backing device *size* changed while
the resource was live. This should be fine in itself, as it is what
normally happens in a drbd online resize.
However, when I then brought the resource down, it looks like drbd
probably tried to write its metadata at the end of the backing device,
and this failed because of the size change?
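(If I'm reading the drbd source correctly, and this is just my guess so
please correct me, drbd_md_ss() for internal meta-data points at the
last 4 KiB-aligned 4 KiB block of the backing device, i.e. roughly
(capacity_in_sectors & ~7) - 8. Before the lvextend the LV was 30720
4 MiB extents = 251658240 sectors, which would put the stored md_offset
at 251658232; after the lvextend it was 41869 extents = 342990848
sectors, so drbd_md_ss() would now come back with 342990840 instead,
hence the mismatch and the ASSERTION.)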
Ultimately all my fault, of course, for not following through with the
online resize.
However, now I want to try to recover my fs, and first I just want to
understand what exactly happened to the metadata when the resource was
brought down and the ASSERTION fired.
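(One thing I'm tempted to do as part of the post-mortem, if it is safe,
is to bring the resource down and look at the on-disk meta-data
directly, something like:

drbdadm down res0
drbdadm dump-md res0

on the assumption that dump-md only reads the meta-data area and will
show me what offsets and sizes it currently records. If that's not
read-only, or there is a better way to inspect it, please shout.)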
I want to understand that before I try running e2fsck without -n,
which, from what I've seen, is going to make drastic changes to the fs;
changes that may recover it, but may also make things worse at this
point.
Perhaps I should just wipe-md and try and access the fs directly on the
backing device.
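(If I do go down that route, my rough plan, and please tell me if this
is a bad idea, would be something like:

# image the backing LV first so I can always get back to this state
# (/backup/res0.img is just a placeholder path)
dd if=/dev/vg02/res0 of=/backup/res0.img bs=4M
drbdadm down res0
drbdadm wipe-md res0
# then check the fs on the backing device directly, read-only first
e2fsck -fn /dev/vg02/res0

on the assumption that wipe-md only touches the drbd meta-data area at
the end of the LV and not the filesystem itself.)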
By the way, I do take a daily backup of the data on almost all my other
drbd resources. Just not this one, unfortunately; the one I happened to
mess up ... :-)
Thanks in advance,
Eddie
==== Full kernel log from Step 8 =====
Jul 28 18:18:52 node1 kernel: [384749.421172] drbd res0 tv: conn( Connecting -> Disconnecting )
Jul 28 18:18:52 node1 kernel: [384749.421217] drbd res0 tv: Restarting sender thread
Jul 28 18:18:52 node1 kernel: [384749.421230] drbd res0/0 drbd12: ASSERTION drbd_md_ss(device->ldev) == device->ldev->md.md_offset FAILED in drbd_md_write
Jul 28 18:18:52 node1 kernel: [384749.421309] drbd res0 tv: Connection closed
Jul 28 18:18:52 node1 kernel: [384749.421318] drbd res0 tv: conn( Disconnecting -> StandAlone )
Jul 28 18:18:52 node1 kernel: [384749.421326] drbd res0 tv: Terminating receiver thread
Jul 28 18:18:52 node1 kernel: [384749.421341] drbd res0 tv: Terminating sender thread
Jul 28 18:18:52 node1 kernel: [384749.421435] drbd res0/0 drbd12: disk( UpToDate -> Detaching )
Jul 28 18:18:52 node1 kernel: [384749.421497] drbd res0/0 drbd12: disk( Detaching -> Diskless )
Jul 28 18:18:52 node1 kernel: [384749.421908] drbd res0/0 drbd12: drbd_bm_resize called with capacity == 0
Jul 28 18:18:52 node1 kernel: [384749.454400] drbd res0: Terminating worker thread
==== Full kernel log from Steps 10 and 11 =====
Jul 28 18:24:51 node1 kernel: [385108.607668] drbd res0: Starting worker thread (from drbdsetup [26084])
Jul 28 18:24:51 node1 kernel: [385108.610730] drbd res0 tv: Starting sender thread (from drbdsetup [26088])
Jul 28 18:24:51 node1 kernel: [385108.790058] drbd res0/0 drbd12: meta-data IO uses: blk-bio
Jul 28 18:24:51 node1 kernel: [385108.790213] drbd res0/0 drbd12: disk( Diskless -> Attaching )
Jul 28 18:24:51 node1 kernel: [385108.790220] drbd res0/0 drbd12: Maximum number of peer devices = 1
Jul 28 18:24:51 node1 kernel: [385108.790623] drbd res0: Method to ensure write ordering: drain
Jul 28 18:24:51 node1 kernel: [385108.790633] drbd res0/0 drbd12: drbd_bm_resize called with capacity == 251650488
Jul 28 18:24:51 node1 kernel: [385108.793611] drbd res0/0 drbd12: resync bitmap: bits=31456311 words=491505 pages=960
Jul 28 18:24:51 node1 kernel: [385108.793616] drbd res0/0 drbd12: size = 120 GB (125825244 KB)
Jul 28 18:24:51 node1 kernel: [385108.793617] drbd res0/0 drbd12: size = 120 GB (125825244 KB)
Jul 28 18:24:51 node1 kernel: [385108.830061] drbd res0/0 drbd12: recounting of set bits took additional 0ms
Jul 28 18:24:51 node1 kernel: [385108.830072] drbd res0/0 drbd12: disk( Attaching -> UpToDate )
Jul 28 18:24:51 node1 kernel: [385108.830073] drbd res0/0 drbd12: attached to current UUID: 51A43103091A9E70
Jul 28 18:24:51 node1 kernel: [385108.831297] drbd res0 tv: conn( StandAlone -> Unconnected )
Jul 28 18:24:51 node1 kernel: [385108.831394] drbd res0 tv: Starting receiver thread (from drbd_w_res0 [26085])
Jul 28 18:24:51 node1 kernel: [385108.831455] drbd res0 tv: conn( Unconnected -> Connecting )
Jul 28 18:25:54 node1 kernel: [385172.119565] drbd res0: Preparing cluster-wide state change 840522528 (0->-1 3/1)
Jul 28 18:25:54 node1 kernel: [385172.119569] drbd res0: Committing cluster-wide state change 840522528 (0ms)
Jul 28 18:25:54 node1 kernel: [385172.119572] drbd res0: role( Secondary -> Primary )
Jul 28 18:25:59 node1 kernel: [385176.861534] drbd res0/0 drbd12: new current UUID: 8BE37DA3CE798447 weak: FFFFFFFFFFFFFFFE