[DRBD-user] drbdadm expects /usr/local/etc/drbd.conf but linbit:ocf tries /etc/drbd.conf

Koch, Sebastian Sebastian.Koch at netzwerk.de
Thu Sep 30 13:26:40 CEST 2010



Hi everybody,

 

Today I had the problem that, after a reboot, a node wouldn't come back
into the Connected state. It always stayed in WFConnection, Disconnected
and so on. The secondary node did not reconnect, so it wasn't syncing.
I thought I needed to recreate the device and do a manual split-brain
recovery, but nothing worked. The DRBD device stayed Outdated or
Inconsistent.

 

I was able to resolve the issue. Hopefully the following explanation is
correct (I wrote it from memory) and helps other admins who have
struggled with this issue for days.

 

Some System Info (Debian Stable with Backport packages):

 

--- Cluster Config & Status Dump --

Created: Thu 30 Sep 13:25:21 CEST 2010 on pilot01-node1 by uid=0(root) gid=0(root) groups=0(root)

Systeminfo: Linux pilot01-node1 2.6.28-1-amd64 #1 SMP Wed Feb 18 17:16:12 UTC 2009 x86_64 GNU/Linux

 

#####################

### 1. DRBD State ###

#####################

 

drbd driver loaded OK; device status:
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@prolog01-pilot1, 2010-06-07 17:34:47
m:res     cs         ro                 ds                 p  mounted       fstype
0:pilot0  Connected  Primary/Secondary  UpToDate/UpToDate  C  /mnt/cluster  xfs

 

------------------------------------------------------------------------

 

That was the initial drbd state of the secondary node:

 

root@pilot01-node2:/home/nwadmin# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@prolog01-pilot1, 2010-06-07 17:34:47
 0: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1951768
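
For anyone who wants to check this state from a script rather than by eye, here is a minimal sketch. It assumes the 8.3.x /proc/drbd layout shown above; the function name drbd_state is my own invention and not part of any DRBD tool.

```shell
# drbd_state MINOR: read /proc/drbd-style text on stdin and print
# "cs ro ds" for the given minor number.
# Hypothetical helper; assumes the 8.3.x line layout
# " 0: cs:... ro:... ds:... C r----".
drbd_state() {
    awk -v minor="$1" '
        $1 == (minor ":") {
            cs = ""; ro = ""; ds = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)
                if ($i ~ /^ro:/) ro = substr($i, 4)
                if ($i ~ /^ds:/) ds = substr($i, 4)
            }
            print cs, ro, ds
        }'
}
```

Called as "drbd_state 0 < /proc/drbd", it prints the connection state, roles and disk states on one line, which is easy to compare against "Connected" in a monitoring script.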

 

------------------------------------------------------------------------

 

Resolution:

 

Then I looked at the logs (maybe a little too late) and saw that there
were errors concerning drbd.conf. The ocf:linbit:drbd resource agent
uses /etc/drbd.conf by default as its OCF_RESKEY_drbdconf, while my
drbdadm tool always wanted to use /usr/local/etc/drbd.conf (maybe this
path is compiled into the drbd-utils; I wasn't able to figure that out).
Therefore Pacemaker always refused to let the secondary node connect to
the DRBD device.

 

What I did to resolve it was:

 

1.       Changed my resource to something like this:

primitive drbd_pilot0 ocf:linbit:drbd \
        params drbd_resource="pilot0" drbdconf="/usr/local/etc/drbd.conf" \
        operations $id="drbd_pilot0-operations" \
        op monitor interval="15s"

2.       Cleaned up all the errors on the resource:

crm resource cleanup ms_drbd_pilot0
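
The two steps above could be scripted roughly as follows. This is only a sketch under assumptions: I edited the resource by hand, so the use of crm_resource --set-parameter here is a substitute for that manual edit, and the helper names run and fix_drbdconf are hypothetical. With DRY_RUN=1 the commands are only printed, so you can review them before touching the cluster.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# fix_drbdconf PRIMITIVE MS_RESOURCE CONF_PATH  (hypothetical helper)
fix_drbdconf() {
    # Step 1: point the resource agent at the config file that
    # drbdadm really uses (assumption: done via crm_resource instead
    # of editing the primitive by hand).
    run crm_resource --resource "$1" --set-parameter drbdconf \
        --parameter-value "$3" || return 1
    # Step 2: clear the failures recorded for the master/slave resource.
    run crm resource cleanup "$2"
}
```

For my setup a dry run would be "DRY_RUN=1; fix_drbdconf drbd_pilot0 ms_drbd_pilot0 /usr/local/etc/drbd.conf".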

 

 

------------------------------------------------------------------------

 

Here is the state of the sync:

 

root@pilot01-node1:/home/nwadmin# crm resource cleanup res_MySQL
Cleaning up res_MySQL on pilot01-node1
Cleaning up res_MySQL on pilot01-node2
root@pilot01-node1:/home/nwadmin# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@prolog01-pilot1, 2010-06-07 17:34:47
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
    ns:879617 nr:0 dw:2121 dr:890151 al:5 bm:53 lo:1 pe:39 ua:189 ap:0 ep:1 wo:b oos:1073972
        [========>...........] sync'ed: 45.1% (1073972/1951768)K
        finish: 0:00:22 speed: 47,920 (48,764) K/sec
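
If you want to follow the resync from a script, the percentage can be pulled out of /proc/drbd. A minimal sketch, assuming the "sync'ed: NN.N%" progress line shown above; the function name drbd_sync_pct is hypothetical.

```shell
# drbd_sync_pct: read /proc/drbd-style text on stdin and print the
# "sync'ed" percentage; prints nothing when no resync is running.
# Hypothetical helper; assumes the 8.3.x progress-line format.
drbd_sync_pct() {
    sed -n "s/.*sync'ed:[[:space:]]*\([0-9.]*\)%.*/\1/p"
}
```

Running "drbd_sync_pct < /proc/drbd" in a loop until it prints nothing is one simple way to wait for the sync to finish.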

 

------------------------------------------------------------------------

 

Here is some relevant log output:

 

crmd: [2326]: info: do_lrm_rsc_op: Performing key=81:445:0:106b9e8c-1ea2-475f-b2c9-ddb3088ea7aa op=drbd_pilot0:1_notify_0 )
Sep 30 11:29:24 s_all@pilot01-node2 lrmd: [2323]: info: RA output: (drbd_pilot0:1:notify:stderr) Warning: resource pilot0 last used config file: /etc/drbd.conf   current config file: /usr/local/etc/drbd.conf
Sep 30 11:29:24 s_all@pilot01-node2 lrmd: [2323]: info: RA output: (drbd_pilot0:1:notify:stderr) /usr/lib/ocf/resource.d//linbit/drbd: line 762: [: too many arguments
Sep 30 11:29:24 s_all@pilot01-node2 crmd: [2326]: info: process_lrm_event: LRM operation drbd_pilot0:1_notify_0 (call=17, rc=0, cib-update=26, confirmed=true) ok
Sep 30 11:29:26 s_all@pilot01-node2 lrmd: [2323]: info: rsc:drbd_pilot0:1:18: notify
Sep 30 11:29:26 s_all@pilot01-node2 crmd: [2326]: info: do_lrm_rsc_op: Performing key=79:448:0:106b9e8c-1ea2-475f-b2c9-ddb3088ea7aa op=drbd_pilot0:1_notify_0 )
Sep 30 11:29:26 s_all@pilot01-node2 crmd: [2326]: info: process_lrm_event: LRM operation drbd_pilot0:1_notify_0 (call=18, rc=0, cib-update=27, confirmed=true) ok
Sep 30 11:29:28 s_all@pilot01-node2 lrmd: [2323]: info: rsc:drbd_pilot0:1:19: notify
Sep 30 11:29:28 s_all@pilot01-node2 crmd: [2326]: info: do_lrm_rsc_op: Performing key=79:451:0:106b9e8c-1ea2-475f-b2c9-ddb3088ea7aa op=drbd_pilot0:1_notify_0 )
Sep 30 11:29:28 s_all@pilot01-node2 crmd: [2326]: info: process_lrm_event: LRM operation drbd_pilot0:1_notify_0 (call=19, rc=0, cib-update=28, confirmed=true) ok
Sep 30 11:29:29 s_all@pilot01-node2 lrmd: [2323]: info: rsc:drbd_pilot0:1:20: notify
Sep 30 11:29:29 s_all@pilot01-node2 crmd: [2326]: info: do_lrm_rsc_op: Performing key=79:454:0:106b9e8c-1ea2-475f-b2c9-ddb3088ea7aa op=drbd_pilot0:1_notify_0 )
Sep 30 11:29:29 s_all@pilot01-node2 crmd: [2326]: info: process_lrm_event: LRM operation drbd_pilot0:1_notify_0 (call=20, rc=0, cib-update=29, confirmed=true) ok
Sep 30 11:29:36 s_all@pilot01-node2 lrmd: [2323]: info: RA output: (drbd_pilot0:1:monitor:stderr) Warning: resource pilot0 last used config file: /usr/local/etc/drbd.conf   current config file: /etc/drbd.conf
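
The "last used config file" warnings are the ones that gave the problem away, so it may help to filter for them directly. A minimal sketch, assuming each log entry sits on a single line in the real log file (the mail wraps them); the function name conf_mismatch is hypothetical.

```shell
# conf_mismatch: read log lines on stdin and print
# "LAST_USED -> CURRENT" for every DRBD config-file warning.
# Hypothetical helper; assumes one log entry per line.
conf_mismatch() {
    sed -n 's/.*last used config file:[[:space:]]*\([^[:space:]]*\)[[:space:]]*current config file:[[:space:]]*\([^[:space:]]*\).*/\1 -> \2/p'
}
```

Fed with the syslog of the affected node, it shows at a glance that the resource agent and drbdadm disagree about which drbd.conf to use. Where that log lives depends on your syslog setup.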

 

Kind Regards,

Sebastian

 
