[DRBD-user] Problems with applications on top of DRBD locking up, becoming defunct

Lavender, Ben ben.lavender at mcdean-europe.com
Wed Aug 29 19:30:51 CEST 2007


I'm working on getting a few DRBD servers up and running, but I'm having
a ton of trouble with it.  I've set up DRBD/heartbeat on some smaller
servers before with no problems, but this current batch is behaving
strangely with no error messages reported anywhere.

 

Basically, any application (so far pound, nfsd, mysql, and ldap, along
with a number of filesystem tools) will eventually lock up while working
with the mounted drbd volume.  It can happen with something as simple as
'ls' or 'touch'.  Once a process locks up, the drbd volume is
considered busy and can't be released without fencing it.  After a
reboot, this happens again in less than 3 hours.  Killing the stuck
process does nothing without -9; with -9 it will sometimes become
defunct and sometimes still do nothing.
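
Processes that ignore kill -9 are usually sitting in uninterruptible
sleep (state D) waiting on I/O, and "defunct" processes show up as
state Z.  Whether that is what is happening here is an assumption, not
something confirmed in this post.  A minimal sketch for listing such
processes by reading /proc/<pid>/stat; the function names are
illustrative only:

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left])
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def stuck_processes(proc_root="/proc"):
    """List (pid, comm, state) for processes in state D or Z."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # process exited while we were scanning
        if state in ("D", "Z"):
            found.append((pid, comm, state))
    return found
```

Calling stuck_processes() on the affected node and printing the result
would show which of the hung tools are in D state as opposed to
already defunct.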

 

/proc/drbd shows nothing unusual.  This is an example taken after the
latter two of my drbd volumes locked up, though I did manage to unmount
one of them and set it to secondary.  The second one is locked with
mysql and the third with umount.

 

[root@backend1 ben]# cat /proc/drbd
version: 8.0.5 (api:86/proto:86)
SVN Revision: 3011 build by ben@frontend2.scholar, 2007-08-24 11:35:47
 0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---
    ns:61448 nr:0 dw:8 dr:61533 al:0 bm:27 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:3831 misses:9 starving:0 dirty:0 changed:9
        act_log: used:0/907 hits:2 misses:0 starving:0 dirty:0 changed:0
 1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:201216 nr:0 dw:512 dr:203845 al:1 bm:44 lo:0 pe:2 ua:0 ap:2
        resync: used:0/31 hits:12522 misses:22 starving:0 dirty:0 changed:22
        act_log: used:1/907 hits:127 misses:1 starving:0 dirty:0 changed:1
 2: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:352612 nr:0 dw:356 dr:672073 al:0 bm:48 lo:0 pe:1 ua:0 ap:1
        resync: used:0/31 hits:21992 misses:24 starving:0 dirty:0 changed:24
        act_log: used:1/907 hits:89 misses:0 starving:0 dirty:0 changed:0
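
As far as I understand the DRBD counters, lo, pe, ua and ap count
requests that are still outstanding (local I/O, pending on the peer,
unacknowledged, and application-pending respectively), so on an idle
volume they would normally be zero.  In the output above, devices 1 and
2 show nonzero pe and ap; whether that is significant is not
established here.  A rough sketch that pulls those counters out of
/proc/drbd text; the function names and the trimmed sample are
illustrative:

```python
import re

DEVICE_RE = re.compile(r"^\s*(\d+): cs:(\S+) st:(\S+) ds:(\S+)")
COUNTER_RE = re.compile(r"\b(ns|nr|dw|dr|al|bm|lo|pe|ua|ap):(\d+)")

# Trimmed copy of the output quoted in this post.
SAMPLE = """\
version: 8.0.5 (api:86/proto:86)
 0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---
    ns:61448 nr:0 dw:8 dr:61533 al:0 bm:27 lo:0 pe:0 ua:0 ap:0
 1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:201216 nr:0 dw:512 dr:203845 al:1 bm:44 lo:0 pe:2 ua:0 ap:2
 2: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:352612 nr:0 dw:356 dr:672073 al:0 bm:48 lo:0 pe:1 ua:0 ap:1
"""


def parse_proc_drbd(text):
    """Map device minor -> dict of connection state, roles and counters."""
    devices = {}
    current = None
    for line in text.splitlines():
        m = DEVICE_RE.match(line)
        if m:
            current = {"cs": m.group(2), "st": m.group(3), "ds": m.group(4)}
            devices[int(m.group(1))] = current
            continue
        # Only the counter line of each device starts with "ns:".
        if current is not None and line.lstrip().startswith("ns:"):
            for name, value in COUNTER_RE.findall(line):
                current[name] = int(value)
    return devices


def outstanding(devices):
    """Return minor -> nonzero lo/pe/ua/ap counters, skipping idle devices."""
    result = {}
    for minor, dev in devices.items():
        nonzero = {k: dev[k] for k in ("lo", "pe", "ua", "ap")
                   if dev.get(k, 0) > 0}
        if nonzero:
            result[minor] = nonzero
    return result
```

Running outstanding(parse_proc_drbd(...)) a few seconds apart on a hung
node would show whether these counters ever drain back to zero.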

 

I've got no unusual log messages to show; no SELinux messages, no bus io
errors, no failed drives.  If anyone can help I'd appreciate it; I'm
about to toss out DRBD in a situation where it would be ideal.  

 

These are the same machines that prompted my 'WFBitMapT and WFBitMapS'
message earlier this week, which has not seen a response: Dell 2950s
with hardware RAID (PERC 5 controllers on the megaraid driver), x86_64,
RHEL5, SELinux enabled.  The DRBD volumes live on top of LVM logical
volumes.  Kernel is 2.6.18-8.1.8, DRBD is 8.0.5, and all mentioned
software is either the CentOS5 or RHEL5 version of the particular package.

 

My more complicated drbd.conf follows, but I have experienced this on a
machine with only one drbd volume and otherwise similar settings.

 

Thanks,
Ben

 

common {

  net {
    after-sb-0pri discard-least-changes;
    after-sb-1pri call-pri-lost-after-sb;
    after-sb-2pri disconnect;
  }

  startup {
    wfc-timeout 120;
    degr-wfc-timeout 120;
  }

  handlers {
    pri-lost-after-sb "reboot";
  }

  syncer {
    rate 400M;
    al-extents 907;
  }

}

resource drbd-ldap {

  protocol C;

  on backend1.scholar {
    device     /dev/drbd0;
    disk       /dev/mapper/VolGroup00-LogVol02;
    address    192.168.65.237:7788;
    flexible-meta-disk  internal;
  }

  on backend2.scholar {
    device    /dev/drbd0;
    disk      /dev/mapper/VolGroup00-LogVol02;
    address   192.168.65.238:7788;
    meta-disk internal;
  }

}

resource drbd-mysql {

  protocol C;

  on backend1.scholar {
    device     /dev/drbd1;
    disk       /dev/mapper/VolGroup00-LogVol03;
    address    192.168.65.237:7789;
    flexible-meta-disk  internal;
  }

  on backend2.scholar {
    device    /dev/drbd1;
    disk      /dev/mapper/VolGroup00-LogVol03;
    address   192.168.65.238:7789;
    meta-disk internal;
  }

}

resource drbd-nfs {

  protocol C;

  on backend1.scholar {
    device     /dev/drbd2;
    disk       /dev/mapper/VolGroup00-LogVol04;
    address    192.168.65.237:7790;
    flexible-meta-disk  internal;
  }

  on backend2.scholar {
    device    /dev/drbd2;
    disk      /dev/mapper/VolGroup00-LogVol04;
    address   192.168.65.238:7790;
    meta-disk internal;
  }

}
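
The two hosts in each resource above do not use the same keyword for
their metadata: backend1 has flexible-meta-disk and backend2 has
meta-disk.  Whether that difference matters for this problem is not
established here.  A rough sketch that summarizes the per-host settings
so that such asymmetries are easy to spot; it only handles the flat
"resource { on host { key value; } }" layout used in this post, and the
function names are illustrative:

```python
import re

# One resource copied from the configuration in this post.
SAMPLE_CONF = """\
resource drbd-ldap {
  protocol C;
  on backend1.scholar {
    device     /dev/drbd0;
    disk       /dev/mapper/VolGroup00-LogVol02;
    address    192.168.65.237:7788;
    flexible-meta-disk  internal;
  }
  on backend2.scholar {
    device    /dev/drbd0;
    disk      /dev/mapper/VolGroup00-LogVol02;
    address   192.168.65.238:7788;
    meta-disk internal;
  }
}
"""


def summarize(conf_text):
    """Map resource -> host -> {keyword: value} for the 'on' sections."""
    summary = {}
    resource = host = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        m = re.match(r"resource\s+(\S+)\s*\{", line)
        if m:
            resource, host = m.group(1), None
            summary[resource] = {}
            continue
        m = re.match(r"on\s+(\S+)\s*\{", line)
        if m and resource is not None:
            host = m.group(1)
            summary[resource][host] = {}
            continue
        if line == "}":
            # Close the innermost open section we are tracking.
            if host is not None:
                host = None
            else:
                resource = None
            continue
        m = re.match(r"(\S+)\s+(.+?);$", line)
        if m and resource is not None and host is not None:
            summary[resource][host][m.group(1)] = m.group(2)
    return summary


def keyword_differences(summary):
    """Map resource -> host -> keywords that not every host of it uses."""
    diffs = {}
    for resource, hosts in summary.items():
        keysets = {h: set(kv) for h, kv in hosts.items()}
        if not keysets:
            continue
        common = set.intersection(*keysets.values())
        extra = {h: sorted(ks - common)
                 for h, ks in keysets.items() if ks - common}
        if extra:
            diffs[resource] = extra
    return diffs
```

Feeding the full drbd.conf through keyword_differences(summarize(...))
should report the same meta-disk asymmetry for all three resources.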

 

Ben Lavender
Systems Design
MC Dean Europe - Stuttgart
+49(0)711 849 50179
+1 703 803 6231 x 7179

 
