[DRBD-user] lockup on primary node

Lars Ellenberg Lars.Ellenberg at linbit.com
Fri Jul 16 20:37:10 CEST 2004



/ 2004-07-16 13:15:24 -0500
\ John Lange:
> I'm having a serious problem with drbd hard-locking my primary node.
> 
> I'm using drbd along with heartbeat to create a fail-over cluster for an
> apache/php/mysql server.
> 
> Data replication is being done over a Realtek Gigabit network adaptor.
> 
> The nodes come up fine and initial replication takes place across the
> adaptor etc. All is well. I can take down the nodes manually and the
> other takes over and all appears well.
> 
> Then at some seemingly random time the primary node simply "hard-locks".
> There is nothing displayed on the console and nothing logged.
> 
> At some lower level the machine appears to be functioning because
> both interfaces are still responding to ping. The external interface
> responds normally but the

> internal interface (the one used for replication) is dropping a very
> high percentage of packets (60%).

doh. broken hardware can always result in strange behaviour...
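
To put a number on the loss before and after swapping the cable or NIC,
the summary line that ping prints can be parsed. A minimal sketch,
assuming the iputils-style summary format; the sample line below is made
up for illustration and is not taken from this thread, and
packet_loss_percent is a hypothetical helper name:

```python
import re


def packet_loss_percent(summary_line):
    """Return packet loss in percent from a ping summary line.

    Assumes a summary of the form
    'N packets transmitted, M received, ...' (iputils style);
    other ping implementations may word this differently.
    """
    match = re.search(
        r"(\d+) packets transmitted, (\d+) (?:packets )?received",
        summary_line,
    )
    if match is None:
        raise ValueError("unrecognised ping summary: %r" % summary_line)
    sent = int(match.group(1))
    received = int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    # Compute the loss from the raw counts instead of trusting the
    # rounded percentage that ping prints.
    return 100.0 * (sent - received) / sent


if __name__ == "__main__":
    # Illustrative input only, not real output from the affected node.
    sample = "100 packets transmitted, 40 received, 60% packet loss, time 99123ms"
    print(packet_loss_percent(sample))
```

Running the same measurement once while the sync is idle and once while
it is running makes it easier to compare a suspect cable against a
known-good one.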

> The setup is software RAID 0 with drbd on top of it and then an ext3
> file system mounted on top of that.
> 
> The kernel is 2.4.22.
> 
> A couple more quick notes; I believe this only ever happens when running
> in Primary/Secondary. What I mean is, when one machine is down the other
> one never crashes (though my testing is limited in this area).
> 
> Also, when the machine is rebooted the replication starts. While the
> replication is running, pinging across the interface still results in
> about 50% packet loss. Could this be a cable problem and would that
> result in lock-ups?
> 
> Here is cat /proc/drbd during syncing.
> 
> 0: cs:SyncingAll st:Secondary/Primary ns:0 nr:81069096 dw:81069096 dr:0
> pe:0 ua:17
>         [==============>.....] sync'ed: 71.7% (31375/110540)M
>         finish: 0:10:39h speed: 55,406 (50,666) K/sec

and which DRBD version is this ??
if it is anything < 0.6.12,
please check whether 0.6.12 / 0.6.13 changes anything in this behaviour.
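
For watching a resync from a script, the /proc/drbd text quoted above
can be picked apart with a few regular expressions. This is only a
sketch against the one sample shown in this thread; the layout may
differ between DRBD versions. It assumes that "M" means MiB, "K" means
KiB, and that the speed in parentheses is the running average.
parse_drbd_status and estimate_seconds_left are hypothetical helper
names.

```python
import re

# The status text as quoted in this thread (quote markers removed).
SAMPLE = """\
0: cs:SyncingAll st:Secondary/Primary ns:0 nr:81069096 dw:81069096 dr:0
pe:0 ua:17
        [==============>.....] sync'ed: 71.7% (31375/110540)M
        finish: 0:10:39h speed: 55,406 (50,666) K/sec
"""


def parse_drbd_status(text):
    """Extract the two-letter key:value fields and the sync progress."""
    # Fields such as cs:SyncingAll, st:Secondary/Primary, nr:81069096.
    fields = dict(re.findall(r"\b([a-z]{2}):(\S+)", text))
    result = {"fields": fields}

    sync = re.search(r"sync'ed:\s*([\d.]+)%\s*\((\d+)/(\d+)\)M", text)
    if sync:
        result["percent"] = float(sync.group(1))
        result["remaining_mb"] = int(sync.group(2))
        result["total_mb"] = int(sync.group(3))

    speed = re.search(r"speed:\s*([\d,]+)\s*\(([\d,]+)\)\s*K/sec", text)
    if speed:
        result["speed_kps"] = int(speed.group(1).replace(",", ""))
        # Assumption: the value in parentheses is the average speed.
        result["avg_speed_kps"] = int(speed.group(2).replace(",", ""))
    return result


def estimate_seconds_left(status):
    """Rough time to finish: remaining data divided by average speed."""
    # Assumption: "M" is MiB and "K" is KiB, hence the factor 1024.
    return status["remaining_mb"] * 1024 / status["avg_speed_kps"]


if __name__ == "__main__":
    status = parse_drbd_status(SAMPLE)
    print(status["fields"]["cs"], status["fields"]["st"])
    print("seconds left (estimate):", int(estimate_seconds_left(status)))
```

Under those assumptions the estimate for the sample lands within a few
seconds of the "finish" value that DRBD itself reports.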

> Can we really have a cable problem if we are getting ~50 M /second?
> 
> There is absolutely no packet loss when the sync isn't running.
> 
> The good news in all this is that the failover works flawlessly. I just
> don't want to be "testing" it so much ;)

	Lars Ellenberg


