[DRBD-user] Drbd refuses to promote

Johan Verrept Johan.Verrept at able.be
Wed Sep 23 14:05:56 CEST 2009

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hello,

I am having a weird problem with my RA in my virtual machines. As far as
I can tell, it is related to timeouts when syncing.

I have a working cluster where one node is inconsistent and syncing from
the other node. When I reboot the SyncSource machine (yes, I am trying
to break things and see how it all reacts :P) the other node notices and
refuses to promote itself because it there is no consistent disk. So
far, so good. What happens next is that the SyncSource machine comes
back up and the sync restarts. Then the SyncSource machine tries to
promote itself to master but this times out. It then tries to demote
itself, but this times out too. After all this, the drbd is stopped and
then started, repeating the process. (see attached log)

If I reboot the SyncTarget node, the SyncSource node comes up safely and
when the SyncTarget node comes up it smoothly joins the merry cluster
and the problem is gone. Rebooting the SyncSource node again recreates
the problem.

Does this have any cause except load?

Where does the 20000ms timeout come from? It is not configured by me and
a quick search for "20" in "cibadmin --query" doesn't return any
relevant results. Can I modify it and should I? Are there other
solutions for this problem?

Regards,

	Johan

-------------- next part --------------
12:44:01 SYSLOG info crmd [18177]: info: do_lrm_rsc_op: Performing key=16:68:0:e6df7f7f-a533-4afe-af87-a630b2ea89d3 op=drbd0:1_promote_0 )
12:44:01 SYSLOG info lrmd [18174]: info: rsc:drbd0:1: promote
12:44:03 SYSLOG warning heartbeat [17330]: WARN: Late heartbeat: Node slave.domain.be: interval 5250 ms
12:44:09 SYSLOG info kernel block drbd0: role( Secondary -> Primary ) 
12:44:15 SYSLOG warning kernel block drbd0: md_sync_timer expired! Worker calls drbd_md_sync().
12:44:21 SYSLOG warning lrmd [18174]: WARN: drbd0:1:promote process (PID 26476) timed out (try 1).  Killing with signal SIGTERM (15).
12:44:21 SYSLOG warning lrmd [18174]: WARN: operation promote[26] on ocf::drbd::drbd0:1 for client 18177, its parameters: CRM_meta_clone=[1] CRM_meta_notify_slave_resource=[drbd0:0 drbd0:1 ] CRM_meta_notify_active_resource=[
12:44:21 SYSLOG err crmd [18177]: ERROR: process_lrm_event: LRM operation drbd0:1_promote_0 (26) Timed Out (timeout=20000ms)
12:44:22 SYSLOG info crmd [18177]: info: do_lrm_rsc_op: Performing key=95:68:0:e6df7f7f-a533-4afe-af87-a630b2ea89d3 op=drbd0:1_notify_0 )
12:44:22 SYSLOG info lrmd [18174]: info: rsc:drbd0:1: notify
12:44:22 SYSLOG warning heartbeat [17330]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 60 ms (> 50 ms) (GSource: 0x810e658)
12:44:25 SYSLOG warning heartbeat [17330]: WARN: Late heartbeat: Node slave.domain.be: interval 5360 ms
12:44:26 SYSLOG info attrd [18176]: info: attrd_ha_callback: flush message from slave.domain.be
12:44:30 SYSLOG warning heartbeat [17330]: WARN: Late heartbeat: Node slave.domain.be: interval 5320 ms
12:44:40 SYSLOG info crm_attribute [6931]: info: Invoked: crm_attribute -N master.domain.be -n master-drbd0:1 -l reboot -v 100
12:44:40 SYSLOG info lrmd [18174]: info: RA output: (drbd0:1:notify:stdout) 

12:44:40 SYSLOG info crmd [18177]: info: process_lrm_event: LRM operation drbd0:1_notify_0 (call=27, rc=0, cib-update=34, confirmed=true) complete ok
12:44:45 SYSLOG info crmd [18177]: info: do_lrm_rsc_op: Performing key=96:69:0:e6df7f7f-a533-4afe-af87-a630b2ea89d3 op=drbd0:1_notify_0 )
12:44:45 SYSLOG info lrmd [18174]: info: rsc:drbd0:1: notify
12:44:46 SYSLOG info crmd [18177]: info: process_lrm_event: LRM operation drbd0:1_notify_0 (call=28, rc=0, cib-update=35, confirmed=true) complete ok
12:44:48 SYSLOG info crmd [18177]: info: do_lrm_rsc_op: Performing key=15:69:0:e6df7f7f-a533-4afe-af87-a630b2ea89d3 op=drbd0:1_demote_0 )
12:44:48 SYSLOG info lrmd [18174]: info: rsc:drbd0:1: demote
12:44:48 SYSLOG info kernel block drbd0: role( Primary -> Secondary ) 
12:44:53 SYSLOG warning kernel block drbd0: md_sync_timer expired! Worker calls drbd_md_sync().
12:44:53 SYSLOG warning heartbeat [17330]: WARN: Late heartbeat: Node slave.domain.be: interval 6060 ms
12:45:08 SYSLOG warning lrmd [18174]: WARN: drbd0:1:demote process (PID 9159) timed out (try 1).  Killing with signal SIGTERM (15).
12:45:08 SYSLOG warning lrmd [18174]: WARN: operation demote[29] on ocf::drbd::drbd0:1 for client 18177, its parameters: CRM_meta_clone=[1] CRM_meta_notify_slave_resource=[drbd0:0 ] CRM_meta_notify_active_resource=[ ] CRM_met
12:45:08 SYSLOG err crmd [18177]: ERROR: process_lrm_event: LRM operation drbd0:1_demote_0 (29) Timed Out (timeout=20000ms)




More information about the drbd-user mailing list