> > That is correct. The machine is only doing DRBD - both machines are freshly
> > installed with just DRBD and heartbeat setup. The problem occurs whether or
> > not /dev/nb0 is mounted or unmounted. The only resources Heartbeat manages in
> > the cluster is one DRBD device and one virtual IP address.
>
> please reproduce it without heartbeat... "by hand"
> Thanks.
>
> Lars
>

I managed to reproduce it again.

/dev/nb0 was unmounted, and heartbeat and DRBD were stopped on both hosts. I
then checked that all RAID devices were healthy and that the time on both
hosts was correct and synchronised.

I started DRBD (command 'service drbd start') on the host that was last
primary, then started DRBD on the secondary and monitored the hosts during
the syncall of /dev/nb0. When that finished, I set the clock on the secondary
forward 8 hours with the 'date' command. Finally, I started a resync on the
primary with the command 'drbdsetup /dev/nb0 replicate'.

I monitored both RAID and DRBD during the resync until the secondary host
locked up. I did not see the sync rate drop prior to the lockup.

I now suspect that the slow sync rate was due to the software RAID 1 also
syncing, as you "guessed": the last two times I reproduced this lockup I did
not see the sync rate drop. However, I am pretty sure that the lockup on the
secondary occurs when its system clock is drifting from a time in the future
back to the correct time whilst DRBD is resyncing. I can't see what else
could be causing the host to lock up whilst DRBD is resyncing. I've tried to
stop everything running other than DRBD and NTPD.

If you are very sure that DRBD should not be failing under these conditions,
then I think I will need to try a fresh "minimal" install of RedHat without
software RAID and without channel bonding, then introduce components one at
a time to see if I can ascertain which one is causing the problem.

Cheers,
Jeff.
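
To line up the lockup with what the clock on the secondary is doing, something like the following could be left running on a console during the resync. It is only a rough sketch and not part of DRBD or heartbeat: the names clock_step and watch are made up for illustration, and it assumes a Python 3 interpreter is available on the host. It compares the wall clock against the monotonic clock, so it will report sudden steps of the system time (such as a manual 'date' change or a step by NTPD) but not a slow, gradual slew.

```python
import time


def clock_step(prev_wall, prev_mono, wall, mono):
    """Seconds the wall clock moved beyond what the monotonic clock measured.

    Positive means the wall clock jumped forward, negative means backward.
    """
    return (wall - prev_wall) - (mono - prev_mono)


def watch(interval=1.0, threshold=0.5, samples=None):
    """Print a line whenever the wall clock steps by more than `threshold` seconds.

    `samples=None` means run until interrupted (hypothetical defaults).
    """
    prev_wall, prev_mono = time.time(), time.monotonic()
    count = 0
    while samples is None or count < samples:
        time.sleep(interval)
        wall, mono = time.time(), time.monotonic()
        step = clock_step(prev_wall, prev_mono, wall, mono)
        if abs(step) > threshold:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} wall clock stepped by {step:+.3f}s", flush=True)
        prev_wall, prev_mono = wall, mono
        count += 1


if __name__ == "__main__":
    watch()
```

If the last line printed before the secondary hangs is a backward step, that would support the clock theory; if nothing is printed at all, the clock is either not stepping or is being slewed too slowly for this check to notice.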