Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Bump. Nobody else is seeing dopd crashes in their logs?

On Tue, Jun 10, 2008 at 4:05 PM, Art Age Software <artagesw at gmail.com> wrote:
> I was recently performing some testing of a 2-node drbd-heartbeat
> setup. Everything is operating fine. However, when I rebooted the
> server on which drbd was secondary, the primary node's system log
> output the following worrisome messages:
>
> ------------------------------------------------------------------------
> Jun 10 22:09:34 node1 /usr/lib64/heartbeat/dopd: [5283]: info: sending
> start_outdate message to the other node node1 -> node2
> Jun 10 22:09:39 node1 /usr/lib64/heartbeat/dopd: [5283]: ERROR:
> ipc_bufpool_update: magic number in head does not match.Something very
> bad happened, abort now, farside pid =6678
> Jun 10 22:09:39 node1 /usr/lib64/heartbeat/dopd: [5283]: ERROR:
> magic=63203a72, expected value=abcd
> Jun 10 22:09:39 node1 /usr/lib64/heartbeat/dopd: [5283]: info: pool:
> refcount=1, startpos=0x38b7838,
> currpos=0x38b78e5, consumepos=0x38b78a3, endpos=0x38b8808, size=4096
> Jun 10 22:09:39 node1 /usr/lib64/heartbeat/dopd: [5283]: info: nmsgs=0
> Jun 10 22:09:39 node1 heartbeat: [4999]: WARN: Managed
> /usr/lib64/heartbeat/dopd process 5283 killed by signal 6 [SIGABRT -
> Abort].
> Jun 10 22:09:39 node1 heartbeat: [4999]: ERROR: Managed
> /usr/lib64/heartbeat/dopd process 5283 dumped core
> Jun 10 22:09:39 node1 heartbeat: [4999]: ERROR: Respawning client
> "/usr/lib64/heartbeat/dopd":
> Jun 10 22:09:39 node1 heartbeat: [4999]: info: Starting child client
> "/usr/lib64/heartbeat/dopd" (90,90)
> Jun 10 22:09:39 node1 heartbeat: [6679]: info: Starting
> "/usr/lib64/heartbeat/dopd" as uid 90 gid 90 (pid 6679)
> ------------------------------------------------------------------------
>
> The node seemed to recover from this condition with no apparent
> problems as the primary node came back online. Still, I am concerned
> that dopd crashed like that. Has anyone else seen this behavior? Is it
> a known issue? If there is any more info I can provide that would help
> in explaining or possibly finding/fixing the cause, please let me
> know.
>
> Thanks.
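
For anyone trying to reproduce or compare setups: dopd is the DRBD
outdate-peer daemon, invoked via DRBD's fence-peer handler over
heartbeat's IPC layer (which is the buffer pool the "magic number"
error comes from). A typical way it is wired up looks roughly like the
following sketch. The /usr/lib64 paths match the poster's log; the
resource name r0, the -t 5 timeout and the hacluster/haclient accounts
are my assumptions, not details taken from the poster's configuration.

In drbd.conf (only the fencing-related fragment of the resource):

    resource r0 {
      disk {
        # only outdate the peer's data, do not fence the whole node
        fencing resource-only;
      }
      handlers {
        # asks dopd on the peer (via heartbeat IPC) to mark its data outdated
        fence-peer "/usr/lib64/heartbeat/drbd-peer-outdater -t 5";
      }
    }

In ha.cf:

    # run dopd as an unprivileged heartbeat client and respawn it if it dies
    respawn hacluster /usr/lib64/heartbeat/dopd
    apiauth dopd gid=haclient uid=hacluster

If your setup differs from the above (different handler, different
user/group), it would be worth mentioning that along with the heartbeat
and drbd versions when reporting the crash.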