Hi everybody,

Today I had the problem that after a reboot, a node wouldn't come back into the Connected state. It kept going to WFConnection, Disconnected and so on. The secondary node did not reconnect, so it wasn't syncing. I thought I needed to recreate the device and do a manual split-brain recovery. Nothing worked: the DRBD device stayed Outdated or Inconsistent.

I was able to resolve the issue. I hope the following explanation is correct (I wrote it from memory) and that it helps other admins who have struggled with this issue for days.

Some system info (Debian stable with backports packages):

--- Cluster Config & Status Dump --
Created: Thu Sep 30 13:25:21 CEST 2010 on pilot01-node1 by uid=0(root) gid=0(root) groups=0(root)
Systeminfo: Linux pilot01-node1 2.6.28-1-amd64 #1 SMP Wed Feb 18 17:16:12 UTC 2009 x86_64 GNU/Linux

#####################
### 1. DRBD State ###
#####################
drbd driver loaded OK; device status:
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@prolog01-pilot1, 2010-06-07 17:34:47
m:res     cs         ro                 ds                 p  mounted       fstype
0:pilot0  Connected  Primary/Secondary  UpToDate/UpToDate  C  /mnt/cluster  xfs

------------------------------------------------------------------------

This was the initial DRBD state of the secondary node:

root@pilot01-node2:/home/nwadmin# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@prolog01-pilot1, 2010-06-07 17:34:47
 0: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1951768

------------------------------------------------------------------------

Resolution:

Then I looked at the logs (maybe a little
too late) and saw that there were errors concerning drbd.conf. The ocf:linbit:drbd resource agent uses /etc/drbd.conf as its OCF_RESKEY_drbdconf by default, while my drbdadm tool always wanted to use /usr/local/etc/drbd.conf (maybe this path is compiled into drbd-utils; I wasn't able to figure that out). Because of this mismatch, Pacemaker kept refusing to let the secondary node connect to the DRBD device.

What I did to resolve it:

1. Changed my resource to something like this:

   primitive drbd_pilot0 ocf:linbit:drbd \
           params drbd_resource="pilot0" drbdconf="/usr/local/etc/drbd.conf" \
           operations $id="drbd_pilot0-operations" \
           op monitor interval="15s"

2. Cleaned up all the errors on the resource:

   crm resource cleanup ms_drbd_pilot0

------------------------------------------------------------------------

Here is the state of the resync:

root@pilot01-node1:/home/nwadmin# crm resource cleanup res_MySQL
Cleaning up res_MySQL on pilot01-node1
Cleaning up res_MySQL on pilot01-node2
root@pilot01-node1:/home/nwadmin# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@prolog01-pilot1, 2010-06-07 17:34:47
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
    ns:879617 nr:0 dw:2121 dr:890151 al:5 bm:53 lo:1 pe:39 ua:189 ap:0 ep:1 wo:b oos:1073972
        [========>...........] sync'ed: 45.1% (1073972/1951768)K
        finish: 0:00:22 speed: 47,920 (48,764) K/sec

------------------------------------------------------------------------

Here is the relevant log output:

crmd: [2326]: info: do_lrm_rsc_op: Performing key=81:445:0:106b9e8c-1ea2-475f-b2c9-ddb3088ea7aa op=drbd_pilot0:1_notify_0 )
Sep 30 11:29:24 s_all@pilot01-node2 lrmd: [2323]: info: RA output: (drbd_pilot0:1:notify:stderr) Warning: resource pilot0 last used config file: /etc/drbd.conf current config file: /usr/local/etc/drbd.conf
Sep 30 11:29:24 s_all@pilot01-node2 lrmd: [2323]: info: RA output: (drbd_pilot0:1:notify:stderr) /usr/lib/ocf/resource.d//linbit/drbd: line 762: [: too many arguments
Sep 30 11:29:24 s_all@pilot01-node2 crmd: [2326]: info: process_lrm_event: LRM operation drbd_pilot0:1_notify_0 (call=17, rc=0, cib-update=26, confirmed=true) ok
Sep 30 11:29:26 s_all@pilot01-node2 lrmd: [2323]: info: rsc:drbd_pilot0:1:18: notify
Sep 30 11:29:26 s_all@pilot01-node2 crmd: [2326]: info: do_lrm_rsc_op: Performing key=79:448:0:106b9e8c-1ea2-475f-b2c9-ddb3088ea7aa op=drbd_pilot0:1_notify_0 )
Sep 30 11:29:26 s_all@pilot01-node2 crmd: [2326]: info: process_lrm_event: LRM operation drbd_pilot0:1_notify_0 (call=18, rc=0, cib-update=27, confirmed=true) ok
Sep 30 11:29:28 s_all@pilot01-node2 lrmd: [2323]: info: rsc:drbd_pilot0:1:19: notify
Sep 30 11:29:28 s_all@pilot01-node2 crmd: [2326]: info: do_lrm_rsc_op: Performing key=79:451:0:106b9e8c-1ea2-475f-b2c9-ddb3088ea7aa op=drbd_pilot0:1_notify_0 )
Sep 30 11:29:28 s_all@pilot01-node2 crmd: [2326]: info: process_lrm_event: LRM operation drbd_pilot0:1_notify_0 (call=19, rc=0, cib-update=28, confirmed=true) ok
Sep 30 11:29:29 s_all@pilot01-node2 lrmd: [2323]: info: rsc:drbd_pilot0:1:20: notify
Sep 30 11:29:29 s_all@pilot01-node2 crmd: [2326]: info: do_lrm_rsc_op: Performing key=79:454:0:106b9e8c-1ea2-475f-b2c9-ddb3088ea7aa op=drbd_pilot0:1_notify_0 )
Sep 30 11:29:29 s_all@pilot01-node2 crmd: [2326]: info: process_lrm_event: LRM operation drbd_pilot0:1_notify_0 (call=20, rc=0, cib-update=29, confirmed=true) ok
Sep 30 11:29:36 s_all@pilot01-node2 lrmd: [2323]: info: RA output: (drbd_pilot0:1:monitor:stderr) Warning: resource pilot0 last used config file: /usr/local/etc/drbd.conf current config file: /etc/drbd.conf

Kind regards,
Sebastian
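A mismatch like the one described in this thread can be spotted before the next reboot by checking whether the file handed to ocf:linbit:drbd (its drbdconf parameter, /etc/drbd.conf by default) and the file drbdadm reads are actually the same file. The bash function below is a minimal sketch of such a check. The name check_drbd_conf is hypothetical, and the two default paths are assumptions taken from this thread (the /usr/local/etc path presumably comes from a drbd-utils build with the default /usr/local prefix). It only tests whether both names resolve to the same file (same device and inode, so a symlink counts as a match); it does not reproduce the resource agent's own "last used config file" warning, which may compare path names rather than files.

```shell
#!/bin/bash
# Hypothetical helper, not part of DRBD or Pacemaker.
# Arguments (defaults are assumptions taken from this thread):
#   $1 = file passed to ocf:linbit:drbd via drbdconf= (RA default /etc/drbd.conf)
#   $2 = file the local drbdadm reads (here /usr/local/etc/drbd.conf)
# Return codes: 0 = same file, 1 = different files, 2 = a file is missing.
check_drbd_conf() {
    local ra_conf="${1:-/etc/drbd.conf}"
    local tool_conf="${2:-/usr/local/etc/drbd.conf}"
    local f

    # Both files must exist before they can be compared.
    for f in "$ra_conf" "$tool_conf"; do
        if [ ! -e "$f" ]; then
            echo "MISSING: $f does not exist"
            return 2
        fi
    done

    # bash's -ef is true when both names share device and inode numbers;
    # it follows symlinks, so a link to the other file counts as a match.
    if [ "$ra_conf" -ef "$tool_conf" ]; then
        echo "OK: $ra_conf and $tool_conf are the same file"
        return 0
    fi

    echo "MISMATCH: resource agent reads $ra_conf, drbdadm reads $tool_conf"
    echo "Consider setting drbdconf=\"$tool_conf\" on the primitive"
    return 1
}

# Example (run on each cluster node):
# check_drbd_conf /etc/drbd.conf /usr/local/etc/drbd.conf
```

On nodes set up like the ones above, running check_drbd_conf /etc/drbd.conf /usr/local/etc/drbd.conf would presumably report MISMATCH unless one file is a link to the other. Two separate copies with identical content are deliberately still reported as a mismatch, since they can drift apart after the next edit.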