I already posted this mail to the list last week, but the mailing-list software refused to forward mails larger than 40 KB (mine was larger). So I am posting it again now without the log file, which was the largest attachment.
If it is needed, please let me know.
On the Linux-HA list I got this reply, so maybe one of the DRBD specialists here can help me get the config working again:
> -----Original Message-----
> From: linux-ha-bounces at lists.linux-ha.org
> [mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of Chun Tian (binghe)
> Sent: Monday, 10 March 2008 14:57
> To: General Linux-HA mailing list
> Subject: Re: AW: AW: [Linux-HA] Switchover problem with DRBD
>
> Hi, Florian
>
> I compared it with my HA config and can almost say that your Heartbeat
> configuration should work, but something is wrong with DRBD. See this:
>
> crmd[17381]: 2008/03/05_11:44:34 ERROR: process_lrm_event: LRM operation DRBD_AFD:1_promote_0 (17) Timed Out (timeout=20000ms)
> drbd[18348]: 2008/03/05_11:44:34 DEBUG: r0 notify: post for stop - counts: active 0 - starting 1 - stopping 1
> drbd[18348]: 2008/03/05_11:44:34 DEBUG: r0: Calling drbdadm -c /etc/drbd.conf state r0
> drbd[18348]: 2008/03/05_11:44:44 DEBUG: r0: Exit code 0
> drbd[18348]: 2008/03/05_11:44:44 DEBUG: r0: Command output: Child process does not terminate! Exiting. No response from the DRBD driver! Is the module loaded? Unknown/TOO_LARGE
> drbd[18348]: 2008/03/05_11:44:44 DEBUG: r0: Calling drbdadm -c /etc/drbd.conf cstate r0
> lrmd[17378]: 2008/03/05_11:44:54 WARN: DRBD_AFD:1:notify process (PID 18348) timed out (try 1). Killing with signal SIGTERM (15).
> lrmd[17378]: 2008/03/05_11:44:54 WARN: operation notify[18] on ocf::drbd::DRBD_AFD:1 for client 17381, its parameters: CRM_meta_role=[Master] CRM_meta_notify_stop_resource=[DRBD_AFD:0 ] CRM_meta_notify_operation=[stop] CRM_meta_notify_start_resource=[DRBD_AFD:1 ] CRM_meta_notify_stop_uname=[noderz ] CRM_meta_notify_promote_resource=[DRBD_AFD:1 ] drbd_resource=[r0] CRM_meta_notify_master_uname=[noderz ] CRM_meta_notify_demote_uname=[noderz ] CRM_meta_master_max=[1] CRM_meta_notify_master_resource=[DRBD_AFD:0 ] CRM_meta_timeout=[20000] CRM_meta_s: pid [18348] timed out
>
> Something goes wrong when Heartbeat runs the drbdadm command: it hangs.
> Judging from your drbd.conf, I think you are using DRBD 8.x rather than
> 7.x, am I right? I must say that for your case the more stable DRBD 7.x
> is enough: you never want a two-primary DRBD setup.
>
> Regards,
>
> Chun Tian (binghe)
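The error text in the log above asks "Is the module loaded?". One way to answer that part is to look for drbd in /proc/modules, where the first field of each line is the module name. The following is only a minimal sketch: the function name module_loaded and the sample text are made up for illustration and are not part of DRBD or Heartbeat.

```python
def module_loaded(proc_modules_text, name="drbd"):
    """Return True if `name` appears as a module name in /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0] == name:
            return True
    return False


# Hypothetical sample in /proc/modules format; sizes and addresses are made up.
SAMPLE = """\
drbd 211856 2 - Live 0xf8a5c000
ext3 123456 1 - Live 0xf8a00000
"""

print(module_loaded(SAMPLE))
```

On the node itself the text would come from open("/proc/modules").read(). A loaded module does not rule out a driver that has stopped responding, so this only covers the first question in the error message.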
--------------------------------------------------------------------------------------------------
Hi everybody,
While testing my 2-node cluster, I got strange behaviour when stopping Heartbeat on my primary node. I don't know whether it is caused by Heartbeat, DRBD, or both, so I am posting this to both lists.
Starting with this:
============
Last updated: Wed Mar 5 15:01:10 2008
Current DC: noderz (91d062c3-ad0a-4c24-b759-acada7f19101)
2 Nodes configured.
3 Resources configured.
============
Node: noderz (91d062c3-ad0a-4c24-b759-acada7f19101): online
Node: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d): online
Master/Slave Set: DRBD
DRBD_AFD:0 (heartbeat::ocf:drbd): Master noderz
DRBD_AFD:1 (heartbeat::ocf:drbd): Started nodekrz
Resource Group: Group1
Filesystem (heartbeat::ocf:Filesystem): Started noderz
AFD (lsb:afdha): Started noderz
Cluster_IP (heartbeat::ocf:IPaddr): Started noderz
I ran /etc/init.d/heartbeat stop on the primary node (noderz) and expected this:
============
Last updated: Wed Mar 5 15:01:10 2008
Current DC: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d)
2 Nodes configured.
3 Resources configured.
============
Node: noderz (91d062c3-ad0a-4c24-b759-acada7f19101): OFFLINE
Node: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d): online
Master/Slave Set: DRBD
DRBD_AFD:0 (heartbeat::ocf:drbd): Stopped
DRBD_AFD:1 (heartbeat::ocf:drbd): Master nodekrz
Resource Group: Group1
Filesystem (heartbeat::ocf:Filesystem): Started nodekrz
AFD (lsb:afdha): Started nodekrz
Cluster_IP (heartbeat::ocf:IPaddr): Started nodekrz
But I got this:
============
Last updated: Wed Mar 5 14:52:06 2008
Current DC: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d)
2 Nodes configured.
3 Resources configured.
============
Node: noderz (91d062c3-ad0a-4c24-b759-acada7f19101): OFFLINE
Node: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d): online
Master/Slave Set: DRBD
DRBD_AFD:0 (heartbeat::ocf:drbd): Stopped
DRBD_AFD:1 (heartbeat::ocf:drbd): Started nodekrz
Failed actions:
DRBD_AFD:1_promote_0 (node=nodekrz, call=17, rc=-2): Timed Out
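When comparing several test runs, it can help to pull the fields out of a "Failed actions" entry like the one above. This is a rough sketch only: the regular expression is an assumption based on that single line, and parse_failed_action is a made-up helper name, not a Heartbeat tool.

```python
import re

# Pattern derived only from the one failed-action line shown in this mail.
FAILED_ACTION = re.compile(
    r"^\s*(?P<operation>\S+) \(node=(?P<node>[^,]+), call=(?P<call>\d+), "
    r"rc=(?P<rc>-?\d+)\): (?P<status>.+?)\s*$"
)


def parse_failed_action(line):
    """Return a dict of fields for a failed-action line, or None if no match."""
    m = FAILED_ACTION.match(line)
    if m is None:
        return None
    info = m.groupdict()
    # Convert the numeric fields so they can be compared or sorted.
    info["call"] = int(info["call"])
    info["rc"] = int(info["rc"])
    return info


print(parse_failed_action(
    "DRBD_AFD:1_promote_0 (node=nodekrz, call=17, rc=-2): Timed Out"
))
```

The sketch does not interpret the rc value; it only separates the operation name, node, call number, return code and status text.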
I have attached the node's /var/log/ha-debug, the output of cibadmin -Q, my ha.cf and my drbd.conf, in case they are needed.
It would be nice if someone could give me a hint as to why the switchover fails.
Thanks a lot for any help.
Florian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cib.xml
Type: text/xml
Size: 19937 bytes
Desc: cib.xml
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20080310/7df112ac/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ha.cf
Type: application/octet-stream
Size: 423 bytes
Desc: ha.cf
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20080310/7df112ac/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: drbd.conf
Type: application/octet-stream
Size: 831 bytes
Desc: drbd.conf
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20080310/7df112ac/attachment-0001.obj>