[DRBD-user] frequent crashes/reboots on a drbd/ha/xen setup

Wed Jun 3 07:59:49 CEST 2009

Hello,

I am investigating why our server pairs reboot themselves from time to time.
This is very annoying because these machines are in production, and I always
have to fix MySQL replication or DRBD split-brains after these reboots.

We have 3 pairs that use a DRBD/Xen/heartbeat setup, and 2 of these pairs
crash, sometimes every 2 weeks, sometimes only twice a year.

I first thought it could be heartbeat, but I stopped the service on 1 pair
and we still had a crash.
Have other people had this kind of crash?
I don't even know if it is a crash; I can never find anything in my log files
about problems, or about heartbeat doing a safety reboot.
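
One way to tell an orderly reboot from a crash might be the wtmp records
printed by `last -x`: an orderly reboot normally leaves a "shutdown" record
just before the next "reboot" (boot) record, while a hard reset or power
loss leaves none. The following is only a minimal sketch under that
assumption. The function name classify_boots and the sample lines are made
up for illustration and are not output from these machines.

```python
def classify_boots(last_output):
    """Label each 'reboot' record from `last -x` output.

    `last` prints the newest record first, so the record that follows a
    'reboot' line in the output is the event that happened just before it.
    Returns a list of (line, verdict) tuples, newest boot first.
    """
    events = []
    for line in last_output.splitlines():
        fields = line.split()
        if fields and fields[0] in ("reboot", "shutdown"):
            events.append((fields[0], line))

    results = []
    for i, (kind, line) in enumerate(events):
        if kind != "reboot":
            continue
        if i + 1 >= len(events):
            verdict = "unknown"   # older records may have been rotated away
        elif events[i + 1][0] == "shutdown":
            verdict = "clean"     # an orderly shutdown was logged before this boot
        else:
            verdict = "unclean"   # previous record is another boot: no shutdown logged
        results.append((line, verdict))
    return results


# Made-up sample in the general shape of `last -x` output (not real data).
SAMPLE = """\
reboot   system boot  KERNEL  Wed Jun  3 07:12
reboot   system boot  KERNEL  Mon May 18 03:40
shutdown system down  KERNEL  Mon May 18 03:38
reboot   system boot  KERNEL  Sun Apr  5 11:02
"""

if __name__ == "__main__":
    for line, verdict in classify_boots(SAMPLE):
        print(f"{verdict:8s} {line}")
```

On a real node the input would come from running `last -x` and passing its
output to the function. An "unclean" verdict only says that no shutdown
record was written; it does not say what triggered the reset.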

This is one drbd.conf entry:

resource drbd_backend {
  protocol C;
  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }
  disk {
    on-io-error   detach;
  }
  net {
  }
  syncer {
    rate 500M;
    al-extents 257;
  }

  on xen-B1.fra1 {
    device    /dev/drbd0;
    disk      /dev/md3;
    address   172.20.2.1:7788;
    meta-disk internal;
  }
  on xen-A1.fra1 {
    device    /dev/drbd0;
    disk      /dev/md3;
    address   172.20.1.1:7788;
    meta-disk internal;
  }
}

This is the ha.cf:

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility     local0
keepalive 2
deadtime 60
#warntime 10
initdead 120
udpport 694
ucast eth0 172.20.1.1
ucast eth0 172.20.2.1
auto_failback on
node xen-A1.fra1
node xen-B1.fra1
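
As a rough sanity check of the timing values above, here is a small Python
sketch that parses the ha.cf and compares keepalive, deadtime, warntime and
initdead. It is not part of the setup described here. The helper names
(parse_ha_cf, check_timings) are hypothetical, the values are assumed to be
plain integer seconds, and the rules are assumptions: "initdead at least
twice deadtime" is how I understand the heartbeat documentation, and the
other two are ordinary rules of thumb.

```python
HA_CF = """\
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility     local0
keepalive 2
deadtime 60
#warntime 10
initdead 120
udpport 694
ucast eth0 172.20.1.1
ucast eth0 172.20.2.1
auto_failback on
node xen-A1.fra1
node xen-B1.fra1
"""


def parse_ha_cf(text):
    """Return {directive: [value, ...]} for all non-comment lines."""
    directives = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)           # directive name, rest of line
        value = parts[1] if len(parts) > 1 else ""
        directives.setdefault(parts[0], []).append(value)
    return directives


def check_timings(directives):
    """Return a list of human-readable problems (empty if none found)."""
    def seconds(name):
        # Assumption: plain integer seconds, no "ms" suffix.
        values = directives.get(name)
        return int(values[-1]) if values else None

    keepalive = seconds("keepalive")
    deadtime = seconds("deadtime")
    warntime = seconds("warntime")
    initdead = seconds("initdead")

    problems = []
    if keepalive is not None and deadtime is not None and deadtime < 2 * keepalive:
        problems.append("deadtime is less than two keepalive intervals")
    if warntime is not None and deadtime is not None and warntime >= deadtime:
        problems.append("warntime is not smaller than deadtime")
    if initdead is not None and deadtime is not None and initdead < 2 * deadtime:
        problems.append("initdead is less than twice deadtime")
    return problems


if __name__ == "__main__":
    found = check_timings(parse_ha_cf(HA_CF))
    for problem in found or ["no timing problems found"]:
        print(problem)
```

With the values posted above (keepalive 2, deadtime 60, initdead 120,
warntime commented out) none of these three checks fires, so under these
assumptions the heartbeat timings by themselves do not look like an obvious
cause of the reboots.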

And this is the Xen config:

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility     local0
keepalive 2
deadtime 60
#warntime 10
initdead 120
udpport 694
ucast eth0 172.20.1.1
ucast eth0 172.20.2.1
auto_failback on
node xen-A1.fra1
node xen-B1.fra1

Can you please give me some assistance?

Greetings,

Rupert