[DRBD-user] Recommended procedure to fsck a drbd device

Wed Apr 14 10:45:24 CEST 2010

On 04/13/2010 01:51 PM, Jose Manuel Torralba wrote:
> Hi, this is my first post to this list, and the question concerns an inherited system that I'm not completely familiar with. If I'm missing any necessary details, please ask.
> 
> We have a cluster running drbd 8.2.6; both nodes were affected by a power cut. OS is CentOS 5.1 64-bit, kernel 2.6.18-53.1.19.el5.028stab053.14
> 
> The problem is that the cluster is not restarting: neither node can mount the drbd device, and both fail with the following error:
> 
> Apr 13 08:34:25 nodo02 Filesystem[29553]: INFO: Running start for /dev/drbd0 on /vz
> Apr 13 08:34:25 nodo02 kernel: EXT3-fs error (device drbd0): ext3_check_descriptors: Block bitmap for group 5376 not in group (block 662340004)!
> Apr 13 08:34:25 nodo02 kernel: EXT3-fs: group descriptors corrupted!
> Apr 13 08:34:25 nodo02 Filesystem[29553]: ERROR: Couldn't mount filesystem /dev/drbd0 on /vz
> 
> We have stopped services on both nodes and, in order to isolate the problem, we have run the following on one of the nodes (both behave the same):
> 
> service drbd start
> drbdadm primary r0
> mount /dev/drbd0 /vz

Well, by doing this you have probably caused a DRBD split-brain situation.

I'd suggest recovering from that first.
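
For reference, here is a rough sketch of the manual split-brain recovery as the DRBD User's Guide describes it for the 8.0 to 8.3 series. The resource name r0 is taken from the commands quoted above. The function names are made up for illustration, and which node plays the "victim" (the one whose changes since the split are thrown away) is something you have to decide yourself. The functions only print the commands, so nothing is changed until you run them by hand.

```shell
#!/bin/sh
# Dry-run sketch: prints the manual split-brain recovery commands
# (DRBD 8.0-8.3 syntax) instead of executing them.
# RES and both function names are illustrative assumptions.
RES="${RES:-r0}"   # resource name, taken from the thread

# Commands for the node whose changes will be DISCARDED (the "victim").
victim_plan() {
    echo "drbdadm secondary $RES"
    echo "drbdadm -- --discard-my-data connect $RES"
}

# Command for the node whose data is kept (the "survivor").
# Only needed if that node is in the StandAlone connection state.
survivor_plan() {
    echo "drbdadm connect $RES"
}
```

Calling victim_plan on one node and survivor_plan on the other shows what would be run on each side. Check the printed commands against the User's Guide for your exact DRBD version before executing them.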

> Last lines of dmesg:
> 
> drbd: initialised. Version: 8.0.7 (api:86/proto:86)
> drbd: GIT-hash: cf14288833afe95db396075f8530a5960d29e498 build by phil at mescal, 2007-11-02 13:15:41
> drbd: registered as block device major 147
> drbd: minor_table @ 0xffff8102245d2ec0
> drbd0: disk( Diskless -> Attaching )
> drbd0: Found 6 transactions (324 active extents) in activity log.
> drbd0: max_segment_size ( = BIO size ) = 32768
> drbd0: drbd_bm_resize called with capacity == 2929554040
> drbd0: resync bitmap: bits=366194255 words=5721786
> drbd0: size = 1396 GB (1464777020 KB)
> drbd0: reading of bitmap took 619 jiffies
> drbd0: recounting of set bits took additional 45 jiffies
> drbd0: 4 KB marked out-of-sync by on disk bit-map.
> drbd0: disk( Attaching -> UpToDate )
> drbd0: Writing meta data super block now.
> drbd0: conn( StandAlone -> Unconnected )
> drbd0: receiver (re)started
> drbd0: conn( Unconnected -> WFConnection )
> drbd0: role( Secondary -> Primary )
> drbd0: Writing meta data super block now.
> EXT3-fs error (device drbd0): ext3_check_descriptors: Block bitmap for group 5376 not in group (block 662340004)!
> EXT3-fs: group descriptors corrupted!
> 
> We think the next step is to run fsck on one node and then force the other node to synchronize.

No need to force anything. Once you have recovered from your self-inflicted
split-brain situation (see the User's Guide on how to do that), the nodes
should be in sync. Then just promote one side and run fsck.

Everything fsck does will automatically be replicated to the other node
and, provided fsck gets things right, both sides will be fine.
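
A minimal sketch of that last step, assuming minor 0 and resource r0 as in the quoted commands: before promoting and running fsck, it can be worth checking that /proc/drbd reports the device as Connected with both disks UpToDate. The helper name drbd_in_sync is hypothetical; it reads /proc/drbd-style text on stdin and only looks at the cs: and ds: fields.

```shell
#!/bin/sh
# Succeeds (exit 0) only if the given minor is Connected and both
# disks are UpToDate. Reads /proc/drbd-style text on stdin.
# The function name is an illustrative assumption, not a DRBD tool.
drbd_in_sync() {
    minor="${1:-0}"
    awk -v m="$minor" '
        $1 == (m ":") {
            found = 1
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)   # connection state
                if ($i ~ /^ds:/) ds = substr($i, 4)   # local/peer disk state
            }
        }
        END { exit !(found && cs == "Connected" && ds == "UpToDate/UpToDate") }
    '
}

# Possible use on the node chosen to run fsck (filesystem not mounted):
#   if drbd_in_sync 0 < /proc/drbd; then
#       drbdadm primary r0 && fsck.ext3 -f /dev/drbd0
#   fi
```

Since the writes fsck makes go through /dev/drbd0, they are replicated to the peer as long as the connection stays up, so the check only needs to pass on the node where fsck is run.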

Regards
Dominik

> A read only run of fsck.ext3 over /dev/drbd0 returns an awful lot of errors.
> 
> The question is whether this is the recommended procedure to fsck a drbd device.
> 
> Thanks
> 
> 
> José Torralba
> Mercury Internet
