[DRBD-user] heartbeat starts until drbd sync is finished

Uwe Melzer uwe.melzer at inatec.com
Wed Dec 13 12:15:12 CET 2006


Lars Ellenberg wrote:
> / 2006-12-12 21:49:36 +0100
> \ Uwe Melzer:
> 
>>Hi,
>>I use DRBD 0.7.22 with heartbeat 1.2.3.
>>I have defined 10 drbd devices, each with its own sync group (0-9).
>>wfc_timeout is set to 0, degr-wfc-timeout to 120 for all devices.
>>
>>During the boot process I observe the following on the returning primary node:
>>while a sync is running, heartbeat starts without waiting for the sync
>>to finish on all devices.
>>'auto_failback on' is set in the ha.cf file.
>>
>>Why? The last command in the drbd start script is 'drbdadm wait_con_int'.
> 
> 
> Wait for Connection Interactively.
> 
> there is no mention of wait for sync here.
> 
> 
>>But there is no description of wait_con_int in the man page (drbdadm).
>>A database installation runs on the drbd devices, so I must wait for the
>>end of the sync.
>>Where is the mistake? What is going wrong?
> 
> 
> set auto_failback off.
> 
> if you really cannot live without it,
> there is a wait_sync command for drbdsetup (not for drbdadm).
> 

Let me describe the situation:
1. The secondary node is Primary/Unknown.
2. The primary node boots.
3. DRBD starts on the primary, the sync for each group begins, and the start script exits.
4. Heartbeat starts on the primary and, because auto_failback is on, initiates the resource takeover.
5. The secondary node tries to switch the devices to Secondary via drbddisk.
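
The drbdsetup wait_sync command Lars mentions could be used to hold heartbeat back until every device has finished syncing. The sketch below is only an illustration under stated assumptions: it assumes the DRBD 0.7 calling convention 'drbdsetup <device> wait_sync' and that the ten resources are /dev/drbd0 to /dev/drbd9 (check drbdsetup(8) and drbd.conf on the real nodes). The helper names DEVICES, run_drbdsetup and wait_for_all_syncs are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical: assumes the ten resources are /dev/drbd0 .. /dev/drbd9.
DEVICES = [f"/dev/drbd{minor}" for minor in range(10)]


def run_drbdsetup(args: Sequence[str]) -> int:
    """Run 'drbdsetup <args...>' and return its exit status.

    Assumes drbdsetup is on PATH and uses the 0.7-style
    'drbdsetup <device> <command>' argument order.
    """
    return subprocess.call(["drbdsetup", *args])


def wait_for_all_syncs(
    devices: Sequence[str],
    run: Callable[[Sequence[str]], int] = run_drbdsetup,
) -> List[str]:
    """Call 'drbdsetup <device> wait_sync' for each device, one after another.

    Each call blocks until drbdsetup returns. Devices whose call exited
    non-zero are collected and returned, so a start script can refuse
    to start heartbeat when the list is not empty.
    """
    failed: List[str] = []
    for dev in devices:
        if run([dev, "wait_sync"]) != 0:
            failed.append(dev)
    return failed
```

A start script would call wait_for_all_syncs(DEVICES) after the DRBD init script and start heartbeat only when it returns an empty list. The runner is a parameter so the loop can be tried without a real cluster; what a non-zero wait_sync exit status means (for example a timeout) should be confirmed in drbdsetup(8).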

Today I looked at the ha-log files on both nodes. During the sync phase I found
these entries on the secondary node:

heartbeat: 2006/12/12_19:44:37 info: Running /etc/ha.d/resource.d/drbddisk logs stop
heartbeat: 2006/12/12_19:44:42 ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2006/12/12_19:44:43 info: Retrying failed stop operation [drbddisk::logs]
heartbeat: 2006/12/12_19:44:43 info: Running /etc/ha.d/resource.d/drbddisk logs stop
heartbeat: 2006/12/12_19:44:49 ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2006/12/12_19:44:50 info: Retrying failed stop operation [drbddisk::logs]
heartbeat: 2006/12/12_19:44:50 info: Running /etc/ha.d/resource.d/drbddisk logs stop
heartbeat: 2006/12/12_19:44:55 ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2006/12/12_19:44:56 info: Retrying failed stop operation [drbddisk::logs]
heartbeat: 2006/12/12_19:44:56 info: Running /etc/ha.d/resource.d/drbddisk logs stop
heartbeat: 2006/12/12_19:45:01 ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2006/12/12_19:45:02 info: Retrying failed stop operation [drbddisk::logs]
heartbeat: 2006/12/12_19:45:02 info: Running /etc/ha.d/resource.d/drbddisk logs stop
heartbeat: 2006/12/12_19:45:07 ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2006/12/12_19:45:08 info: Retrying failed stop operation [drbddisk::logs]
heartbeat: 2006/12/12_19:45:08 info: Running /etc/ha.d/resource.d/drbddisk logs stop
heartbeat: 2006/12/12_19:45:13 ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2006/12/12_19:45:14 info: Retrying failed stop operation [drbddisk::logs]
heartbeat: 2006/12/12_19:45:14 info: Running /etc/ha.d/resource.d/drbddisk logs stop
heartbeat: 2006/12/12_19:45:19 ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2006/12/12_19:45:20 info: Retrying failed stop operation [drbddisk::logs]
heartbeat: 2006/12/12_19:45:20 info: Running /etc/ha.d/resource.d/drbddisk logs stop
heartbeat: 2006/12/12_19:45:25 ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2006/12/12_19:45:26 info: Retrying failed stop operation [drbddisk::logs]
heartbeat: 2006/12/12_19:45:26 info: Running /etc/ha.d/resource.d/drbddisk logs stop
heartbeat: 2006/12/12_19:45:31 ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2006/12/12_19:45:32 info: Retrying failed stop operation [drbddisk::logs]
heartbeat: 2006/12/12_19:45:32 info: Running /etc/ha.d/resource.d/drbddisk logs stop
heartbeat: 2006/12/12_19:45:37 ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2006/12/12_19:45:38 info: Retrying failed stop operation [drbddisk::logs]
heartbeat: 2006/12/12_19:45:38 info: Running /etc/ha.d/resource.d/drbddisk logs stop
heartbeat: 2006/12/12_19:45:43 ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2006/12/12_19:46:11 ERROR: Resource script for drbddisk::logs probably not LSB-compliant.
heartbeat: 2006/12/12_19:46:11 WARN: it (drbddisk::logs) MUST succeed on a stop when already stopped
heartbeat: 2006/12/12_19:46:11 WARN: Machine reboot narrowly avoided!

It seems to me that drbddisk does not check whether the device is still syncing.
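
A minimal sketch of the kind of check meant here, for illustration only: it reads /proc/drbd text and lists the devices whose connection state still indicates a resync. It assumes the DRBD 0.7 /proc/drbd layout, where each device line looks like ' 0: cs:SyncSource st:Primary/Secondary ld:Consistent'. The set of "syncing" states and the function names are assumptions, not part of drbddisk.

```python
import re
from typing import Dict, List

# Assumed DRBD 0.7 device line: " <minor>: cs:<connection state> st:... ld:..."
DEVICE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)")

# Connection states treated as "resync running or pending" (assumed list).
SYNC_STATES = {
    "SyncSource", "SyncTarget",
    "PausedSyncS", "PausedSyncT",
    "WFBitMapS", "WFBitMapT",
}


def connection_states(proc_drbd: str) -> Dict[int, str]:
    """Map each device minor to its cs: (connection state) field."""
    states: Dict[int, str] = {}
    for line in proc_drbd.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            states[int(match.group(1))] = match.group(2)
    return states


def devices_still_syncing(proc_drbd: str) -> List[int]:
    """Return the sorted minors whose connection state is in SYNC_STATES."""
    return sorted(
        minor
        for minor, cs in connection_states(proc_drbd).items()
        if cs in SYNC_STATES
    )
```

On a node, an empty result from devices_still_syncing(open("/proc/drbd").read()) would mean that no device reports one of the listed resync states. Lines that are not device lines (the version header, counters, progress bars) simply do not match the pattern and are skipped.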

Please send me your comments.
Thanks and Regards
-- Uwe
