Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
On Tue, Apr 13, 2010 at 01:51:20PM +0200, Jose Manuel Torralba wrote:
> Hi, this is my first post in this list and the question is regarding
> an inherited system of which I'm not completely familiar with. If I'm
> missing any necessary details, please ask.
>
> We have a cluster, running drbd 8.2.6, both nodes were affected by a power cut.

Any volatile caches enabled?
Like, 500 MB of non-battery-backed controller cache?
or 7 * 32 MB of volatile on-disk caches?

Anything that was in volatile caches when the power was cut is lost.
I bet no file system would be particularly happy about that.

> OS is CentOS 5.1 64bit, kernel 2.6.18-53.1.19.el5.028stab053.14
>
> The problem is that it's not restarting because none of the nodes is
> mounting the drbd device with the following error:
>
> Apr 13 08:34:25 nodo02 Filesystem[29553]: INFO: Running start for /dev/drbd0 on /vz
> Apr 13 08:34:25 nodo02 kernel: EXT3-fs error (device drbd0): ext3_check_descriptors: Block bitmap for group 5376 not in group (block 662340004)!
> Apr 13 08:34:25 nodo02 kernel: EXT3-fs: group descriptors corrupted!
> Apr 13 08:34:25 nodo02 Filesystem[29553]: ERROR: Couldn't mount filesystem /dev/drbd0 on /vz
>
> We have stopped services in both nodes and, in order to isolate the
> problem, we have executed in one of the nodes (both behave the same):
>
> service drbd start
> drbdadm primary r0
> mount /dev/drbd0 /vz
>
> Last lines of dmesg:
>
> drbd: initialised. Version: 8.0.7 (api:86/proto:86)
> drbd: GIT-hash: cf14288833afe95db396075f8530a5960d29e498 build by phil at mescal, 2007-11-02 13:15:41
> drbd: registered as block device major 147
> drbd: minor_table @ 0xffff8102245d2ec0
> drbd0: disk( Diskless -> Attaching )
> drbd0: Found 6 transactions (324 active extents) in activity log.
> drbd0: max_segment_size ( = BIO size ) = 32768
> drbd0: drbd_bm_resize called with capacity == 2929554040
> drbd0: resync bitmap: bits=366194255 words=5721786
> drbd0: size = 1396 GB (1464777020 KB)
> drbd0: reading of bitmap took 619 jiffies
> drbd0: recounting of set bits took additional 45 jiffies
> drbd0: 4 KB marked out-of-sync by on disk bit-map.
> drbd0: disk( Attaching -> UpToDate )
> drbd0: Writing meta data super block now.
> drbd0: conn( StandAlone -> Unconnected )
> drbd0: receiver (re)started
> drbd0: conn( Unconnected -> WFConnection )
> drbd0: role( Secondary -> Primary )
> drbd0: Writing meta data super block now.
> EXT3-fs error (device drbd0): ext3_check_descriptors: Block bitmap for group 5376 not in group (block 662340004)!
> EXT3-fs: group descriptors corrupted!
>
> We think the the next step is to run fsck in one node and then force the other node to syncronize.
>
> A read only run of fsck.ext3 over /dev/drbd0 returns an awful lot of errors.
>
> The question is if this is the recommended procedure to fsck a drbd device.

If you live on LVM, you could snapshot before fsck, just in case.
Though a snapshot won't speed up the fsck, of course.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
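
The snapshot-before-fsck suggestion above might look roughly like the following. This is only a dry-run sketch that prints the commands rather than running them. The volume group `vg0`, the logical volume `drbd_backing`, the snapshot name, and the 20G snapshot size are all made-up placeholders, and it assumes the DRBD backing device is an LVM logical volume with enough free extents in the volume group, which the thread does not confirm.

```shell
#!/bin/sh
# Dry-run sketch: prints the planned commands, does not execute them.
# VG, LV, SNAP and SNAP_SIZE are hypothetical placeholders, not taken
# from the original post; adjust them to the real layout.
VG=vg0
LV=drbd_backing
SNAP="${LV}_prefsck"
SNAP_SIZE=20G

plan() {
    # 1. Snapshot the backing LV so the pre-fsck state is kept, just in case.
    echo "lvcreate --snapshot --size $SNAP_SIZE --name $SNAP /dev/$VG/$LV"
    # 2. Read-only check first, to see how bad it is (changes nothing).
    echo "fsck.ext3 -n /dev/drbd0"
    # 3. The actual repair, forced, answering yes to all questions.
    echo "fsck.ext3 -f -y /dev/drbd0"
    # 4. Drop the snapshot once the repaired file system looks sane.
    echo "lvremove /dev/$VG/$SNAP"
}

plan
```

The snapshot has to be large enough to absorb every block the fsck rewrites, and as noted above it does nothing to make the fsck itself faster.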