[DRBD-user] frequent crashes/reboots on a drbd/ha/xen setup

Ivars Strazdiņš ivars.strazdins at gmail.com
Wed Jun 3 09:47:56 CEST 2009


Heiko,
can you share any details about the crash itself (e.g. a stack trace), as
well as the OS and software versions involved (Xen, Heartbeat, DRBD, etc.)?
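
If it helps, here is a rough sketch (not a definitive tool) for collecting
that version information in one go. It assumes Python 3.7+, the classic `xm`
Xen toolstack, and a loaded DRBD module exposing /proc/drbd; the helper names
`run`, `read`, and `collect` are made up for illustration. Heartbeat is left
out because how to query its version depends on your distribution's package
manager.

```python
import subprocess
from pathlib import Path


def run(cmd):
    """Return a command's output, or a note if it cannot be run."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Covers a missing binary (e.g. no `xm` on this host) and hangs.
        return f"(unavailable: {exc})"
    return (result.stdout or result.stderr).strip()


def read(path):
    """Return a file's contents, or a note if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError as exc:
        return f"(unavailable: {exc})"


def collect():
    """Gather kernel, DRBD, and Xen details (sources are assumptions)."""
    return {
        "kernel": run(["uname", "-a"]),
        "drbd": read("/proc/drbd"),   # first line carries the DRBD version
        "xen": run(["xm", "info"]),   # assumes the xm toolstack is installed
    }


if __name__ == "__main__":
    for name, text in collect().items():
        print(f"== {name} ==\n{text}\n")
```

Run it as root on both nodes of a pair and paste the output into your reply.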
BR,
Ivars

Heiko wrote:
> Hello,
>
> I am investigating why our server pairs reboot themselves from time to 
> time.
> This is very annoying because these machines are in production and I always
> have to fix MySQL replication or DRBD split-brains after these reboots.
>
> We have 3 pairs that use a drbd/xen/heartbeat setup, and 2 of these 
> pairs crash,
> sometimes every 2 weeks, sometimes only twice a year.
>
> I first thought it could be heartbeat, but I stopped the service on 1 
> pair and we still had a crash.
> Has anyone else had this kind of crash?
> I don't even know whether it is a crash; I can never find anything in my 
> logfiles about problems, or about heartbeat doing a safety reboot.
>
> this is one drbd.conf entry:
>
> resource drbd_backend {
>   protocol C;
>   startup {
>     degr-wfc-timeout 120;    # 2 minutes.
>   }
>   disk {
>     on-io-error   detach;
>   }
>   net {
>   }
>   syncer {
>     rate 500M;
>     al-extents 257;
>   }
>   
>   on xen-B1.fra1 {
>     device    /dev/drbd0;
>     disk      /dev/md3;
>     address   172.20.2.1:7788;
>     meta-disk internal;
>   }
>   on xen-A1.fra1 {
>     device    /dev/drbd0;
>     disk      /dev/md3;
>     address   172.20.1.1:7788;
>     meta-disk internal;
>   }
> } 
>
>
> this is the ha.cf:
>
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> logfacility     local0
> keepalive 2
> deadtime 60
> #warntime 10
> initdead 120
> udpport 694
> ucast eth0 172.20.1.1
> ucast eth0 172.20.2.1
> auto_failback on
> node xen-A1.fra1
> node xen-B1.fra1
>
>
> and this is the xen config:
>
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> logfacility     local0
> keepalive 2
> deadtime 60
> #warntime 10
> initdead 120
> udpport 694
> ucast eth0 172.20.1.1
> ucast eth0 172.20.2.1
> auto_failback on
> node xen-A1.fra1
> node xen-B1.fra1
>
>
> can you please give me some assistance?
>
> greetings
>
> Rupert
