I was recently performing some testing of a 2-node drbd-heartbeat setup. Everything is operating fine. However, when I rebooted the server on which drbd was secondary, the primary node's system log output the following worrisome messages:

------------------------------------------------------------------------
Jun 10 22:09:34 node1 /usr/lib64/heartbeat/dopd: [5283]: info: sending start_outdate message to the other node node1 -> node2
Jun 10 22:09:39 node1 /usr/lib64/heartbeat/dopd: [5283]: ERROR: ipc_bufpool_update: magic number in head does not match.Something very bad happened, abort now, farside pid =6678
Jun 10 22:09:39 node1 /usr/lib64/heartbeat/dopd: [5283]: ERROR: magic=63203a72, expected value=abcd
Jun 10 22:09:39 node1 /usr/lib64/heartbeat/dopd: [5283]: info: pool: refcount=1, startpos=0x38b7838, currpos=0x38b78e5,consumepos=0x38b78a3, endpos=0x38b8808, size=4096
Jun 10 22:09:39 node1 /usr/lib64/heartbeat/dopd: [5283]: info: nmsgs=0
Jun 10 22:09:39 node1 heartbeat: [4999]: WARN: Managed /usr/lib64/heartbeat/dopd process 5283 killed by signal 6 [SIGABRT - Abort].
Jun 10 22:09:39 node1 heartbeat: [4999]: ERROR: Managed /usr/lib64/heartbeat/dopd process 5283 dumped core
Jun 10 22:09:39 node1 heartbeat: [4999]: ERROR: Respawning client "/usr/lib64/heartbeat/dopd":
Jun 10 22:09:39 node1 heartbeat: [4999]: info: Starting child client "/usr/lib64/heartbeat/dopd" (90,90)
Jun 10 22:09:39 node1 heartbeat: [6679]: info: Starting "/usr/lib64/heartbeat/dopd" as uid 90 gid 90 (pid 6679)
------------------------------------------------------------------------

The node seemed to recover from this condition with no apparent problems as the primary node came back online. Still, I am concerned that dopd crashed like that. Has anyone else seen this behavior? Is it a known issue?
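
One detail that may help whoever looks at this: the reported magic value, 63203a72, consists entirely of printable ASCII bytes. The small Python sketch below decodes it. It assumes the 32-bit value was read on a little-endian host (the /usr/lib64 path hints at x86_64, but that is my assumption), and the helper name magic_as_text is made up for illustration. It does not identify the cause; it only suggests that the spot where dopd expected an IPC message header held a fragment of text instead.

```python
# Values copied from the log line "magic=63203a72, expected value=abcd".
observed_magic = 0x63203A72
expected_magic = 0xABCD


def magic_as_text(value: int, byteorder: str = "little") -> str:
    """Render a 32-bit value as the four characters it occupies in memory.

    Non-printable bytes are shown as '.'.
    """
    raw = value.to_bytes(4, byteorder)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    # Assumption: little-endian host, so the lowest byte comes first in memory.
    print("observed (little-endian):", repr(magic_as_text(observed_magic, "little")))
    print("observed (big-endian):   ", repr(magic_as_text(observed_magic, "big")))
    print("expected (little-endian):", repr(magic_as_text(expected_magic, "little")))
```

Under the little-endian assumption the four bytes read "r: c", which looks like the middle of a text string rather than a binary header.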
If there is any more info I can provide that would help in explaining or possibly finding/fixing the cause, please let me know. Thanks.