Hi,

Bernd Schubert wrote:
> Well, the question is, why does it want to sync into the wrong direction on
> the first connection attempt. To my knowledge the sync direction is from the
> node with the recent data to the node with the outdated data.
> Is there by any chance something mounting the underlying partitions at boot
> time?

No way, no.

> Here also a rather improbable theory: after a server crash we updated the BIOS
> of the mainboard, which set the time back by about 2 years. During the sync
> from the failover node everything was synced back and not only the outdated
> data. Is there perhaps the hardware clock of the master node running in the
> future at boot time? I know, it's just a silly idea, but who knows...

Could have been, but the messages I posted showed exactly the same time.

> Today Shane also reported the same problem, did you already try the timeout
> value?

I'm not using heartbeat, so at this stage I don't have a problem with cluster software, but I'm glad not to be the only one to stumble over this...

I had a closer look at the difference between "reboot" and calling "/etc/init.d/drbd restart". When doing "reboot", the drbd script is invoked at "K08", which is pretty early. The filesystems are still mounted at that point, so drbd cannot exit. When the network goes down, this is what happens:

    drbd1: PingAck did not arrive in time.
    d1: drbd1_receiver [9369]: cstate BrokenPipe --> Unconnected
    cstate BrokenPe1: asender terminated
    drbd1: drbd1_receiddrbd1: Connection lost.
    drbd1: drbd1_receiver [9369]: cstate Unconnected --> WFConnection
    drbd0: PingAck did not arterminated
    drbd0: drbd0_receiver [9365]: cstate BrokenPipe --> Unconnected
    drbd0: Connection lost.
    etworkFailurereceiver [9365]: cstate Unconnected --> WFConnectinnected --> WF
    drbd0: asender terminated
    drbd0: drbd0_receiver [9365]: cstate NetworkFailure --> BrokenPipe
    drbd0: short read expecting header on sock: r=-512
    drbd0: worker on done.

(The output is a little garbled; it came from a serial line.)
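Because the serial capture interleaves messages, it can help to pull out only the connection-state changes. The Python sketch below is not part of DRBD; the function name `cstate_transitions` and the regular expression are illustrative assumptions based solely on the `drbdN_receiver [pid]: cstate OLD --> NEW` format visible in the log above. The sample is a verbatim excerpt of the drbd1 lines from that log; garbled fragments that do not fit the pattern are simply skipped.

```python
import re
from collections import defaultdict

# Hypothetical pattern for receiver-thread state changes such as
# "drbd1_receiver [9369]: cstate BrokenPipe --> Unconnected".
CSTATE_RE = re.compile(r"(drbd\d+)_receiver \[\d+\]: cstate (\w+) --> (\w+)")


def cstate_transitions(log_text):
    """Return {device: [(old_state, new_state), ...]} in the order they appear."""
    transitions = defaultdict(list)
    for device, old, new in CSTATE_RE.findall(log_text):
        transitions[device].append((old, new))
    return dict(transitions)


# Verbatim drbd1 excerpt from the (garbled) serial capture.
sample = (
    "drbd1: PingAck did not arrive in time. "
    "d1: drbd1_receiver [9369]: cstate BrokenPipe --> Unconnected "
    "cstate BrokenPe1: asender terminated "
    "drbd1: drbd1_receiddrbd1: Connection lost. "
    "drbd1: drbd1_receiver [9369]: cstate Unconnected --> WFConnection"
)

for old, new in cstate_transitions(sample)["drbd1"]:
    print(f"drbd1: {old} --> {new}")
# prints:
# drbd1: BrokenPipe --> Unconnected
# drbd1: Unconnected --> WFConnection
```

The matches follow the order of the captured text, which on a garbled serial line is not guaranteed to be the kernel's original order, so treat the result as a reading aid rather than an exact timeline.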
There is no clean exit for the other node to recognize, and my impression is that this is the reason.

Thanks for your help.

Peter