[DRBD-user] DRBD 0.7.23 and MD corruption

Fri May 18 19:02:13 CEST 2007

Dan,

I think the person whose post I saw on the net was named "Gerry Reno".  
Nonetheless, I used the "resize_reiserfs" tool to shrink the filesystem 
to allow for metadata internal to the disk itself.  I've had success 
with this in the past, with older versions of DRBD (I have a production 
system in place for which I've done this) and older kernels.

The disks are actually mostly empty at the moment - they are going to be 
production boxes at some point in the future, but right now they're 
still in a testing environment.  I haven't actually tried to sync 
anything up except a test file created with "touch foo" - definitely not 
183 MB.

Here are the steps, as best as I can restate them:

I first resized the filesystem to allow for DRBD's internal metadata.  
The docs say it only requires 128MB, but I have plenty of disk space to 
spare, so I gave it double what the docs recommend:

resize_reiserfs -s -256M /dev/md7
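
As a rough sketch of the arithmetic behind that shrink: the filesystem 
has to end before the space left free for the internal metadata.  The 
helper name fs_target_mb and the 10 GiB example size are made up for 
illustration, and treating "M" as MiB is an assumption; on a real system 
the device size could come from "blockdev --getsize64 /dev/md7".

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): size in MiB that the
# filesystem may occupy once reserve_mb MiB at the end of the device
# are left free for metadata.
fs_target_mb() {
    dev_bytes=$1      # device size in bytes
    reserve_mb=$2     # space to leave free, in MiB
    echo $(( dev_bytes / 1048576 - reserve_mb ))
}

# Made-up 10 GiB device with the 256 MB reserve used above
fs_target_mb 10737418240 256    # prints 9984
```

Comparing that figure with the size resize_reiserfs reports afterwards 
is one way to confirm the reserve really is free.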

In my sources.list:

deb-src http://http.us.debian.org/debian sid main contrib non-free

I then built the module against my kernel:

apt-get build-dep drbd0.7-module-source
apt-get -b source drbd0.7-module-source

Once I had the .debs that step created (it creates the packages for the 
utils and the module source), I built the module:

m-a a-b drbd0.7-module --kernel-dir=/usr/local/src/linux-2.6.18.8

Then, I installed the resulting .deb package, and loaded the module, 
confirming it was loaded:

modprobe drbd; lsmod
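
Rather than reading the lsmod output by eye, the check can be scripted.  
This is only a sketch: module_loaded is a hypothetical helper, and it 
relies on lsmod taking its listing from /proc/modules, where each line 
starts with the module name followed by a space.

```shell
#!/bin/sh
# Hypothetical helper: succeed if module $1 appears in the modules
# listing $2 (defaults to /proc/modules).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

if module_loaded drbd; then
    echo "drbd module is loaded"
else
    echo "drbd module is NOT loaded"
fi
```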

I then installed the standard Etch heartbeat2 package and configured 
it.  Here is my drbd.conf and relevant heartbeat configs:

### /etc/drbd.conf ###
resource r0 {
  protocol      C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
  startup { wfc-timeout 0; degr-wfc-timeout     120; }
  disk { on-io-error detach; }
  syncer {
    rate        60M;
    group       1;   # sync when r2 is finished syncing.
  }
  on db1 {
    device      /dev/drbd0;
    disk        /dev/md7;
    address     192.168.101.26:7791;
    meta-disk   internal;
  }
  on db2 {
    device      /dev/drbd0;
    disk        /dev/md7;
    address     192.168.101.27:7791;
    meta-disk   internal;
  }
}
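
One thing worth ruling out with a config like this: DRBD picks the "on" 
section whose name matches the output of "uname -n" on each node.  A 
minimal sketch of that check follows; drbd_conf_hosts is a hypothetical 
helper, and it assumes each section is written as "on <host> {" on a 
single line, as above.

```shell
#!/bin/sh
# Hypothetical helper: print the host names used in "on <host> {"
# sections of a drbd.conf-style file ($1).
drbd_conf_hosts() {
    awk '$1 == "on" && $3 == "{" { print $2 }' "$1"
}

conf=/etc/drbd.conf
if [ -r "$conf" ]; then
    # -x: whole-line match, -F: treat the host name as a fixed string
    if drbd_conf_hosts "$conf" | grep -qxF "$(uname -n)"; then
        echo "$(uname -n) has an 'on' section in $conf"
    else
        echo "no 'on' section for $(uname -n) in $conf"
    fi
fi
```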

### /etc/ha.d/haresources ###
db1    drbddisk::r0 Filesystem::/dev/drbd0::/home::reiserfs \
        192.168.101.25 mysql postgresql-8.1 \
        MailTo::sysadmin at our-domain.com::LVS-State_Change
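
For anyone reading the haresources line above: the first word is the 
preferred node, and each resource is a script name with its arguments 
separated by "::".  The sketch below just splits such a token so the 
pieces are easier to see; split_resource is a hypothetical helper, not 
a heartbeat command.

```shell
#!/bin/sh
# Hypothetical helper: split a haresources resource token on "::" and
# print the script name followed by its arguments, space-separated.
split_resource() {
    printf '%s\n' "$1" | awk -F'::' '{
        out = $1
        for (i = 2; i <= NF; i++) out = out " " $i
        print out
    }'
}

split_resource 'drbddisk::r0'
split_resource 'Filesystem::/dev/drbd0::/home::reiserfs'
```

The second call prints "Filesystem /dev/drbd0 /home reiserfs", i.e. the 
Filesystem script is handed the DRBD device, the mount point and the 
filesystem type.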

### /etc/ha.d/ha.cf ###
logfacility daemon         # Log to syslog as facility "daemon"
node db1 db2             # List our cluster members
keepalive 1                # Send one heartbeat each second
deadtime 10                # Declare nodes dead after 10 seconds
bcast eth0 eth1            # Broadcast heartbeats on eth0 and eth1 interfaces
ping 192.168.101.1         # Ping our router to monitor ethernet connectivity
auto_failback no           # Don't fail back to paul automatically
respawn hacluster /usr/lib/heartbeat/ipfail  # Failover on network failures

If you see any error in anything I've done, let me know, but for 
reference, here is the post from Gerry Reno:

Author: Gerry Reno
Date: 2007-04-27 23:54  -400
To: drbd-user
Subject: [DRBD-user] drbd 0.7.23: md device corruption

I am experiencing md device (raid1) corruption when using drbd 0.7.23.
At times the whole machine hangs and a hard reboot is the only way to
recover. Once rebooted the raid array that holds the root drive goes
into full resync. This has happened on several machines. If I stop
drbd I do not have the problem. In a previous thread there was
discussion of a fix in 2.6.20 kernel. My problem is that I cannot
upgrade to this kernel. Has this fix been backported at all? I am
using FC6 with kernel 2.6.19-1.2985.fc6xen. Or is there any type of
workaround?

Dan Gahlinger wrote:
> Are you referring to the issues I had?
> This is a common problem if you try to setup DRBD on an existing 
> partition, and don't take into consideration the meta-disk
>
> When we ran this, it would "look" ok, but when trying to copy a file 
> of a certain size it would corrupt.
> It turned out the magic number was around 183 megs for us.
>
> The key to that was to build the partition on the DRBD device which is 
> the way it's supposed to be installed.
>
> If you can let us know the exact steps you use to setup the drbd 
> device, the partition, and your configuration file,
> someone can probably help you.
>
> Dan.
>
> On 5/18/07, Ryan Steele <steele at agora-net.com> wrote:
>
>     I saw someone else post something similar to this a few weeks ago, but
>     didn't see any response to it.  I've just set up DRBD 0.7.23 with
>     Heartbeat2 on two future database servers.  However, DRBD seems to have
>     corrupted my multi-disk RAID1.  I booted a Knoppix CD on the affected
>     machines, removed the DRBD rc.d scripts, and rebooted and things were
>     fine.  To verify, I ran update-rc.d to recreate the symbolic links,
>     and rebooted again to find that it again would not boot.  Moreover,
>     even removing the rc.d links did not help - the array is, I fear,
>     irreparably damaged.
>
>     Is there any acknowledgement of this bug, or are there any suggestions
>     as to how one might go about fixing it?  I can't even boot into the
>     machine to run mdadm and repair the array, though maybe I can do that
>     from the Knoppix CD...
>
>     In any case, I just wanted to make people aware, and hopefully get a
>     little feedback.  Thanks.
>
>
>     --
>     Ryan Steele
>     Systems Administrator

-- 
Ryan Steele
Systems Administrator               steele at agora-net.com
AgoraNet, Inc.                      (302) 224-2475
314 E. Main Street, Suite 1         (302) 224-2552 (fax)
Newark, DE 19711                    http://www.agora-net.com
