I already posted this mail to the list last week, but the mailing list software refused to forward mails larger than 40KB (mine was...). So I am posting it now without the logfile, which was the largest attachment. If it is needed, please let me know.

On the Linux-HA list I got this reply, so maybe one of the DRBD specialists here can help me get the config working again:

> -----Original Message-----
> From: linux-ha-bounces at lists.linux-ha.org
> [mailto:linux-ha-bounces at lists.linux-ha.org] On behalf of Chun Tian (binghe)
> Sent: Monday, 10 March 2008 14:57
> To: General Linux-HA mailing list
> Subject: Re: AW: AW: [Linux-HA] Switchover problem with DRBD
>
> Hi, Florian
>
> I compared my HA config and can almost say that your Heartbeat configuration
> should just work, but something is wrong with DRBD. See this:
>
> crmd[17381]: 2008/03/05_11:44:34 ERROR: process_lrm_event: LRM operation DRBD_AFD:1_promote_0 (17) Timed Out (timeout=20000ms)
> drbd[18348]: 2008/03/05_11:44:34 DEBUG: r0 notify: post for stop - counts: active 0 - starting 1 - stopping 1
> drbd[18348]: 2008/03/05_11:44:34 DEBUG: r0: Calling drbdadm -c /etc/drbd.conf state r0
> drbd[18348]: 2008/03/05_11:44:44 DEBUG: r0: Exit code 0
> drbd[18348]: 2008/03/05_11:44:44 DEBUG: r0: Command output: Child process does not terminate! Exiting. No response from the DRBD driver! Is the module loaded? Unknown/TOO_LARGE
> drbd[18348]: 2008/03/05_11:44:44 DEBUG: r0: Calling drbdadm -c /etc/drbd.conf cstate r0
> lrmd[17378]: 2008/03/05_11:44:54 WARN: DRBD_AFD:1:notify process (PID 18348) timed out (try 1). Killing with signal SIGTERM (15).
> lrmd[17378]: 2008/03/05_11:44:54 WARN: operation notify[18] on ocf::drbd::DRBD_AFD:1 for client 17381, its parameters: CRM_meta_role=[Master] CRM_meta_notify_stop_resource=[DRBD_AFD:0 ] CRM_meta_notify_operation=[stop] CRM_meta_notify_start_resource=[DRBD_AFD:1 ] CRM_meta_notify_stop_uname=[noderz ] CRM_meta_notify_promote_resource=[DRBD_AFD:1 ] drbd_resource=[r0] CRM_meta_notify_master_uname=[noderz ] CRM_meta_notify_demote_uname=[noderz ] CRM_meta_master_max=[1] CRM_meta_notify_master_resource=[DRBD_AFD:0 ] CRM_meta_timeout=[20000] CRM_meta_s: pid [18348] timed out
>
> Something goes wrong when HA runs the drbdadm command: it hangs. Judging
> from your drbd.conf, I think you may be using DRBD 8.x rather than 7.x,
> am I right? I must say that for your case the more stable DRBD 7.x is
> enough: you never want a two-primary DRBD node.
>
> Regards,
>
> Chun Tian (binghe)

--------------------------------------------------------------------------------------------------

Hi everybody,

While testing my 2-node cluster I got strange behaviour when stopping heartbeat on my primary node. I don't know whether it is caused by heartbeat or DRBD or both, so I am posting this to both lists.

Starting with this:

============
Last updated: Wed Mar 5 15:01:10 2008
Current DC: noderz (91d062c3-ad0a-4c24-b759-acada7f19101)
2 Nodes configured.
3 Resources configured.
============

Node: noderz (91d062c3-ad0a-4c24-b759-acada7f19101): online
Node: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d): online

Master/Slave Set: DRBD
    DRBD_AFD:0 (heartbeat::ocf:drbd): Master noderz
    DRBD_AFD:1 (heartbeat::ocf:drbd): Started nodekrz
Resource Group: Group1
    Filesystem (heartbeat::ocf:Filesystem): Started noderz
    AFD (lsb:afdha): Started noderz
    Cluster_IP (heartbeat::ocf:IPaddr): Started noderz

I ran /etc/init.d/heartbeat stop on the primary node (noderz) and expected this:

============
Last updated: Wed Mar 5 15:01:10 2008
Current DC: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d)
2 Nodes configured.
3 Resources configured.
============

Node: noderz (91d062c3-ad0a-4c24-b759-acada7f19101): OFFLINE
Node: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d): online

Master/Slave Set: DRBD
    DRBD_AFD:0 (heartbeat::ocf:drbd): stopped
    DRBD_AFD:1 (heartbeat::ocf:drbd): Master nodekrz
Resource Group: Group1
    Filesystem (heartbeat::ocf:Filesystem): Started nodekrz
    AFD (lsb:afdha): Started nodekrz
    Cluster_IP (heartbeat::ocf:IPaddr): Started nodekrz

But I got this:

============
Last updated: Wed Mar 5 14:52:06 2008
Current DC: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d)
2 Nodes configured.
3 Resources configured.
============

Node: noderz (91d062c3-ad0a-4c24-b759-acada7f19101): OFFLINE
Node: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d): online

Master/Slave Set: DRBD
    DRBD_AFD:0 (heartbeat::ocf:drbd): Stopped
    DRBD_AFD:1 (heartbeat::ocf:drbd): Started nodekrz

Failed actions:
    DRBD_AFD:1_promote_0 (node=nodekrz, call=17, rc=-2): Timed Out

I attached the /var/log/ha-debug of the node, a cibadmin -Q, my ha.cf and my drbd.conf (if needed).

It would be nice if someone could give me a hint why the switchover fails.

Thanks a lot for any help.

Florian

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cib.xml
Type: text/xml
Size: 19937 bytes
Desc: cib.xml
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20080310/7df112ac/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ha.cf
Type: application/octet-stream
Size: 423 bytes
Desc: ha.cf
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20080310/7df112ac/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: drbd.conf
Type: application/octet-stream
Size: 831 bytes
Desc: drbd.conf
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20080310/7df112ac/attachment-0001.obj>
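
For anyone going through a longer ha-debug file, the crmd "Timed Out" entries like the one quoted in the reply above can be pulled out with a few lines of Python. This is only a minimal sketch: the function name find_timed_out_operations and the SAMPLE string are made up for illustration, and the pattern assumes the log lines have exactly the shape shown in the quoted excerpt.

```python
import re

# Assumed line shape, taken from the quoted log excerpt:
#   ... LRM operation DRBD_AFD:1_promote_0 (17) Timed Out (timeout=20000ms)
TIMEOUT_RE = re.compile(
    r"LRM operation (?P<op>\S+) \((?P<call>\d+)\) Timed Out "
    r"\(timeout=(?P<ms>\d+)ms\)"
)


def find_timed_out_operations(log_text):
    """Return (operation, call_id, timeout_ms) for each timed-out LRM operation."""
    results = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            results.append(
                (match.group("op"), int(match.group("call")), int(match.group("ms")))
            )
    return results


# Hypothetical sample built from two lines of the quoted excerpt.
SAMPLE = (
    "crmd[17381]: 2008/03/05_11:44:34 ERROR: process_lrm_event: LRM operation "
    "DRBD_AFD:1_promote_0 (17) Timed Out (timeout=20000ms)\n"
    "drbd[18348]: 2008/03/05_11:44:44 DEBUG: r0: Exit code 0\n"
)

if __name__ == "__main__":
    for op, call, ms in find_timed_out_operations(SAMPLE):
        print(f"{op} (call {call}) timed out after {ms} ms")
```

In practice the text would come from reading /var/log/ha-debug instead of SAMPLE; lines that do not match the assumed shape are simply skipped.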