[DRBD-user] /dev/drbd0 fsck.ext2 repair not working after uncontrolled outage

Mon Aug 17 11:44:36 CEST 2009

Hi all. I have a two-server configuration using DRBD and OpenAIS. This is
a test/dev environment prior to proposing a production rollout. My DRBD
config is below. The problem is that after an "uncontrolled" power-off of
one of the servers, that server will not come back online without user
intervention. I have narrowed this down to DRBD apparently failing to
repair the corrupted filesystem (ext2, which it has to be because of
issues with our proprietary software). When the node is powered back up,
I can do a ps and grep for fsck, and I can see that an fsck has been
started for /dev/drbd0. However, there are two issues here.

Issue (1): by the time the fsck has finished, the OpenAIS-controlled
services have timed out trying to start, and they sit there doing nothing
until I reset the failcounts and clean up the resources.

Issue (2): the DRBD-triggered fsck does not actually seem to work. When I
carry out the cleanup process and try to start the services, they still
fail. If I run fsck manually, it finds filesystem errors and repairs
them, and then all is well.
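
A rough sketch of the manual steps described above, in case it helps anyone
reproduce them. The Pacemaker resource name (fs_cluster_disk) is a
placeholder and not taken from the real cluster configuration, and the exact
fsck options used were not recorded, so -f is an assumption. By default the
script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch of the manual recovery steps; names are assumptions, see above.
DEVICE=/dev/drbd0          # DRBD device from drbd.conf
RESOURCE=fs_cluster_disk   # hypothetical Pacemaker resource name
NODE=cluster1              # node that lost power

# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover() {
    # 1. Look at the DRBD role and connection state first.
    run cat /proc/drbd
    # 2. Check whether a boot-time fsck is still running on the device.
    run pgrep -fl "fsck.*$DEVICE"
    # 3. Force a full check; the filesystem must not be mounted.
    run fsck.ext2 -f "$DEVICE"
    # 4. Reset the fail count and clean up the resource so that the
    #    cluster retries the start.
    run crm resource failcount "$RESOURCE" delete "$NODE"
    run crm resource cleanup "$RESOURCE"
}

recover
```

The dry-run wrapper is only there so the sequence can be read and checked
without touching the device.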

So, in summary, in my scenario the DRBD-initiated fsck goes through the
motions but does not actually fix the filesystem and, secondly, by the
time the filesystem has been (supposedly) repaired, the OpenAIS services
have given up trying to start and sit there in a dead state.

Any ideas?

The operating system is SUSE Linux Enterprise Server (SLES) 11 with the
HA Extension.

Installed versions are:

OpenAIS      0.80.3-26.1
Heartbeat    2.99.3-14.1
DRBD         8.2.7-6.12
Pacemaker    1.0.3-4.1

drbd.conf:

global {
  dialog-refresh  1;
  minor-count     5;
}

common {
  syncer { rate 10M; }
}

resource cluster_disk {
  protocol  C;

  disk {
    on-io-error  pass_on;
  }

  syncer {
  }

  net {
    after-sb-1pri  discard-secondary;
  }

  startup {
    wait-after-sb;
  }

  on cluster1 {
    device     /dev/drbd0;
    address    12.0.0.1:7789;
    meta-disk  internal;
    disk       /dev/sdb1;
  }

  on cluster2 {
    device     /dev/drbd0;
    address    12.0.0.2:7789;
    meta-disk  internal;
    disk       /dev/sdb1;
  }
}

Gary Webb 
MCSE:S, CompTIA Security+ / Linux+

Citrix CCA, LPI Level 1 & 2, C|EH

Principal Solutions Development Consultant
open gi
Buckholt Drive, Warndon, Worcester WR4 9SR 
Tel:    +44 (0)1905 754455     Ext:   2451
Fax:   +44 (0)1905 754441
Email:  gary.webb at opengi.co.uk
Web:  www.opengi.co.uk
