[DRBD-user] Failover Behavior in Server-Crash Scenario

Fri Dec 7 01:04:44 CET 2012

On 12/07/2012 12:53 AM, Robinson, Eric wrote:
>>>> Any concurrent log entries in your kernel log, from the drbd0 device?
>>>>
>>>
>>>
>>> In fact, there are...
>>>
>>> Dec  6 13:51:17 ha09a kernel: d-con ha02_mysql: conn( Unconnected -> WFConnection )
>>> Dec  6 13:51:19 ha09a root: drbd SA notify
>>> Dec  6 13:51:19 ha09a crm_node[25546]:   notice: crm_add_logfile: Additional logging available in /var/log/corosync.log
>>> Dec  6 13:51:19 ha09a crm_attribute[25547]:   notice: crm_add_logfile: Additional logging available in /var/log/corosync.log
>>> Dec  6 13:51:20 ha09a root: drbd SA notify
>>> Dec  6 13:51:20 ha09a crm_node[25577]:   notice: crm_add_logfile: Additional logging available in /var/log/corosync.log
>>> Dec  6 13:51:20 ha09a crm_attribute[25578]:   notice: crm_add_logfile: Additional logging available in /var/log/corosync.log
>>> Dec  6 13:51:21 ha09a crmd[3066]:   notice: process_lrm_event: LRM operation p_drbd0_notify_0 (call=500, rc=0, cib-update=0, confirmed=true) ok
>>> Dec  6 13:51:21 ha09a crmd[3066]:   notice: process_lrm_event: LRM operation p_drbd1_notify_0 (call=502, rc=0, cib-update=0, confirmed=true) ok
>>> Dec  6 13:51:22 ha09a root: drbd SA notify
>>> Dec  6 13:51:23 ha09a root: drbd SA notify
>>> Dec  6 13:51:24 ha09a crmd[3066]:   notice: process_lrm_event: LRM operation p_drbd0_notify_0 (call=506, rc=0, cib-update=0, confirmed=true) ok
>>> Dec  6 13:51:24 ha09a crmd[3066]:   notice: process_lrm_event: LRM operation p_drbd1_notify_0 (call=508, rc=0, cib-update=0, confirmed=true) ok
>>> Dec  6 13:51:25 ha09a root: drbd SA promote
>>> Dec  6 13:51:25 ha09a kernel: d-con ha01_mysql: helper command: /sbin/drbdadm fence-peer ha01_mysql
>>> Dec  6 13:51:25 ha09a kernel: d-con ha01_mysql: helper command: /sbin/drbdadm fence-peer ha01_mysql exit code 127 (0x7f00)
>>> Dec  6 13:51:25 ha09a kernel: d-con ha01_mysql: fence-peer helper broken, returned 127
>>
>> Your DRBD refuses to promote because it's unable to get a 
>> meaningful response from the fence-peer handler. That in turn 
>> is because it's failing with a "command not found" error. 
>> (Try typing "foobarblatch; echo $?" in a shell.) Check your 
>> "fence-peer" setting in the handlers section of your DRBD 
>> config, and see whether it points to a non-existing script. 
>> If that script does exist, examine whether it _invokes_ 
>> something that doesn't.
>>
>> Cheers,
>> Florian
>>
> 
> 
> It turns out that the fence-peer handler script does not exist. This is certainly because I copied the drbd.conf file from a previous cluster running drbd 8.3.12.

/usr/lib/drbd/crm-fence-peer.sh does not exist? That would strike me
as a packaging error. Have you been rolling your own, or else where did
you get your builds from? Or are you just missing the drbd-pacemaker
subpackage?
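
For reference, the 127 in the kernel log is the exit status a POSIX shell
reports when it cannot find the command at all. A file that exists but is not
executable normally gives 126 instead, so the two cases can be told apart. A
minimal sketch, assuming a POSIX sh and a writable temp directory (the variable
names and the temp file are only for illustration):

```shell
# A command that does not exist: the shell reports exit status 127.
sh -c 'foobarblatch' 2>/dev/null
missing_rc=$?

# A file that exists but has no execute bit: the shell reports 126.
tmp=$(mktemp)
printf '#!/bin/sh\nexit 0\n' > "$tmp"
chmod 644 "$tmp"
sh -c "$tmp" 2>/dev/null
noexec_rc=$?
rm -f "$tmp"

echo "missing=$missing_rc noexec=$noexec_rc"
```

So a 127 from the fence-peer helper points at a path that is not there (or a
script that calls something that is not there), rather than at a permissions
problem.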

> I am now sure that there are other problems in the config file waiting to bite me. Following is what my drbd.conf file looks like. Please tell me if you see anywhere ELSE that I have shot myself in the foot.

All looks reasonable. Of course, given the fact that you're missing
crm-fence-peer.sh, if I were you I'd double check the existence (and
executability) of all other handler scripts as well.
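
One way to do that double check is a small shell function along these lines.
This is only a sketch under assumptions: check_handlers is a made-up helper
name, it assumes handler lines have the form  name "/absolute/path args";  and
it will also pick up any other quoted absolute path in the file. The demo
config below is invented for illustration and is not your drbd.conf.

```shell
# Report every quoted absolute path in a config file as OK, MISSING,
# or NOT-EXECUTABLE. Only the first word inside the quotes is checked.
check_handlers() {
    conf=$1
    grep -o '"/[^" ;]*' "$conf" | tr -d '"' | sort -u | while read -r path; do
        if [ ! -e "$path" ]; then
            echo "MISSING $path"
        elif [ ! -x "$path" ]; then
            echo "NOT-EXECUTABLE $path"
        else
            echo "OK $path"
        fi
    done
}

# Demo against a throwaway config: one handler that exists, one that does not.
demo=$(mktemp)
cat > "$demo" <<'EOF'
handlers {
    fence-peer "/nonexistent/crm-fence-peer.sh";
    after-resync-target "/bin/sh -c true";
}
EOF
result=$(check_handlers "$demo")
rm -f "$demo"
echo "$result"
```

On a real node you would point it at your actual config file; feeding it the
output of "drbdadm dump" saved to a file should also cover included files,
though I have not verified that against your version.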

Cheers,
Florian
