Hello,<br><br>I am investigating why our server pairs reboot themselves from time to time.<br>This is very annoying because these machines are in production, and I always<br>have to fix MySQL replication or DRBD split-brains after these reboots.<br>
<br>We have 3 pairs that use a DRBD/Xen/Heartbeat setup, and 2 of these pairs crash, <br>sometimes every 2 weeks, sometimes only twice a year.<br><br>I first thought it could be Heartbeat, but I stopped the service on 1 pair and we still had a crash.<br>
Has anyone else had this kind of crash?<br>I don't even know if it is a crash; I can never find anything in my log files about problems, or about Heartbeat doing a safety reboot.<br><br>This is one drbd.conf entry:<br>
<br><span class="postbody">resource drbd_backend {<br> protocol C;<br> startup {<br> degr-wfc-timeout 120; # 2 minutes.<br> }<br> disk {<br> on-io-error detach;<br> }<br> net {<br> }<br> syncer {<br> rate 500M;<br>
al-extents 257;<br> }<br> <br> on xen-B1.fra1 {<br> device /dev/drbd0;<br> disk /dev/md3;<br> address 172.20.2.1:7788;<br> meta-disk internal;<br> }<br>
on xen-A1.fra1 {<br> device /dev/drbd0;<br> disk /dev/md3;<br> address 172.20.1.1:7788;<br> meta-disk internal;<br> }<br>} <br><br><br>This is the ha.cf:<br>
<br>debugfile /var/log/ha-debug<br>logfile /var/log/ha-log<br>logfacility local0<br>keepalive 2<br>deadtime 60<br>#warntime 10<br>initdead 120<br>udpport 694<br>ucast eth0 172.20.1.1<br>ucast eth0 172.20.2.1<br>auto_failback on<br>
node xen-A1.fra1<br>node xen-B1.fra1<br><br><br>And this is the Xen config:<br><br>debugfile /var/log/ha-debug<br>logfile /var/log/ha-log<br>logfacility local0<br>keepalive 2<br>deadtime 60<br>#warntime 10<br>initdead 120<br>
udpport 694<br>ucast eth0 172.20.1.1<br>ucast eth0 172.20.2.1<br>auto_failback on<br>node xen-A1.fra1<br>node xen-B1.fra1<br><br><br>Can you please give me some assistance?<br><br>Greetings,<br><br>Rupert<br></span>