Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
/ 2004-07-16 13:15:24 -0500
\ John Lange:
> I'm having a serious problem with drbd hard-locking my primary node.
>
> I'm using drbd along with heartbeat to create a fail-over cluster for an
> apache/php/mysql server.
>
> Data replication is being done over a Realtech Gigabit network adaptor.
>
> The nodes come up fine and initial replication takes place across the
> adaptor etc. All is well. I can take down the nodes manually and the
> other takes over and all appears well.
>
> Then at some seemingly random time the primary node simply "hard-locks".
> There is nothing displayed on the console and nothing logged.
>
> At some lower level the machine appears to be functioning because
> both interfaces are still responding to ping. The external interface
> responds normally but the internal interface (the one used for
> replication) is dropping a very high percentage of packets (60%).

doh.
broken hardware can always result in strange behaviour...

> The setup is software raid 0 with drbd on top of it and then an ext3
> file system mounted on top of that.
>
> The kernel is 2.4.22.
>
> A couple more quick notes; I believe this only ever happens when running
> in Primary/Secondary. What I mean is, when one machine is down the other
> one never crashes (though my testing is limited in this area).
>
> Also, when the machine is rebooted the replication starts. While the
> replication is running, pinging across the interface still results in
> about 50% packet loss. Could this be a cable problem and would that
> result in lock-ups?
>
> Here is cat /proc/drbd during syncing.
>
> 0: cs:SyncingAll st:Secondary/Primary ns:0 nr:81069096 dw:81069096 dr:0
> pe:0 ua:17
> [==============>.....] sync'ed: 71.7% (31375/110540)M
> finish: 0:10:39h speed: 55,406 (50,666) K/sec

and which DRBD version is this?
if it is anything < 0.6.12, please check whether 0.6.12 / 0.6.13
changes anything in this behaviour.

> Can we really have a cable problem if we are getting ~50 M/second?
>
> There is absolutely no packet loss when the sync isn't running.
>
> The good news in all this is that the failover works flawlessly. I just
> don't want to be "testing" it so much ;)

	Lars Ellenberg
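
For readers who want to put the quoted "~50 M/second" figure in context, here is a minimal Python sketch (not part of the original thread) that splits the quoted /proc/drbd status line into its key:value fields and converts the two quoted speed figures to Mbit/s. The function names are hypothetical, and the conversion assumes that "K" means 1024 bytes, which the thread does not state.

```python
import re

# Lines copied from the /proc/drbd output quoted in the message above.
STATUS = (
    "0: cs:SyncingAll st:Secondary/Primary ns:0 nr:81069096 "
    "dw:81069096 dr:0 pe:0 ua:17"
)
SPEED = "finish: 0:10:39h speed: 55,406 (50,666) K/sec"


def parse_status(line):
    """Split the 'key:value' tokens of a status line into a dict."""
    device, _, rest = line.partition(": ")
    fields = dict(tok.split(":", 1) for tok in rest.split())
    fields["device"] = device
    return fields


def speeds_mbit(line):
    """Return both speed figures in Mbit/s (assumption: K = 1024 bytes)."""
    m = re.search(r"speed:\s*([\d,]+)\s*\(([\d,]+)\)\s*K/sec", line)
    if m is None:
        raise ValueError("no speed figures found")
    return tuple(int(g.replace(",", "")) * 1024 * 8 / 1e6 for g in m.groups())


if __name__ == "__main__":
    print(parse_status(STATUS))
    first, second = speeds_mbit(SPEED)
    print(f"{first:.0f} Mbit/s and {second:.0f} Mbit/s")
```

Under that assumption the two figures come out at roughly 454 and 415 Mbit/s, which is under half of the nominal 1000 Mbit/s of a gigabit link. So the observed sync rate by itself does not show that the link is running cleanly.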