[DRBD-user] Diagnosing a Failed Resource

Dan Barker dbarker at visioncomm.net
Mon Jan 21 15:40:25 CET 2013


Connection errors are logged. If you can't find them, try connecting a resource again (drbdadm connect r1, for example) to reproduce the errors, then check the logs for the reason the connection was not established. The status will keep showing WFConnection (waiting for connection), but the log files will give a reason. If the logs are unclear, post the relevant portions back here and we'll help.

Something like 'dmesg | grep drbd' will show them. Check the logs on both DRBD servers. Run the connect command on the node that reports StandAlone (san1 in your output); san2 is already in WFConnection, waiting for its peer.
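
As a rough sketch of the steps above (not an official procedure), this shell snippet summarizes the connection state of each minor from /proc/drbd-style text, so you can see which devices are StandAlone before you reconnect and read the kernel log. The function name drbd_cstates is made up for illustration; the resource name r1 is just the example used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRBD): print "<minor> <connection state>"
# for every device line found in /proc/drbd-style text read from stdin.
drbd_cstates() {
    awk '
        # Device lines look like " 0: cs:StandAlone ro:Primary/Unknown ..."
        $1 ~ /^[0-9]+:$/ && $2 ~ /^cs:/ {
            minor = $1; sub(/:$/, "", minor)
            state = $2; sub(/^cs:/, "", state)
            print minor, state
        }
    '
}

# Typical use on each node (left commented out so the snippet is safe to source):
#   drbd_cstates < /proc/drbd     # which minors are StandAlone / WFConnection?
#   drbdadm connect r1            # retry the connection to trigger the error again
#   dmesg | grep drbd             # look for the reason in the kernel log
```

Running the three commented commands on both nodes, one after the other, should give you the log lines worth posting back to the list.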

hth

Dan

From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Eric
Sent: Monday, January 21, 2013 1:24 AM
To: drbd-user at lists.linbit.com
Subject: [DRBD-user] Diagnosing a Failed Resource

I've configured corosync+pacemaker to manage a simple two-resource DRBD cluster:

> san1:~ # crm configure show | cat -
> node san1 \
>     attributes standby="off"
> node san2 \
>     attributes standby="off"
> primitive p_DRBD-r0 ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op monitor interval="60s"
> primitive p_DRBD-r1 ocf:linbit:drbd \
>     params drbd_resource="r1" \
>     op monitor interval="60s"
> primitive p_IP-1_253 ocf:heartbeat:IPaddr2 \
>     params ip="192.168.1.253" cidr_netmask="24" \
>     op monitor interval="30s"
> primitive p_IP-1_254 ocf:heartbeat:IPaddr2 \
>     params ip="192.168.1.254" cidr_netmask="24" \
>     op monitor interval="30s"
> primitive p_iSCSI-san1 ocf:heartbeat:iSCSITarget \
>     params iqn="iqn.2012-11.com.example.san1:sda" \
>     op monitor interval="10s"
> primitive p_iSCSI-san1_0 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="0" path="/dev/drbd0" \
>     op monitor interval="10s"
> primitive p_iSCSI-san1_1 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="1" path="/dev/drbd1" \
>     op monitor interval="10s"
> primitive p_iSCSI-san1_2 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="2" path="/dev/drbd2" \
>     op monitor interval="10s"
> primitive p_iSCSI-san1_3 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="3" path="/dev/drbd3" \
>     op monitor interval="10s"
> primitive p_iSCSI-san2 ocf:heartbeat:iSCSITarget \
>     params iqn="iqn.2012-11.com.example.san2:sda" \
>     op monitor interval="10s"
> primitive p_iSCSI-san2_0 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="0" path="/dev/drbd1000" \
>     op monitor interval="10s"
> primitive p_iSCSI-san2_1 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="1" path="/dev/drbd1001" \
>     op monitor interval="10s"
> primitive p_iSCSI-san2_2 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="2" path="/dev/drbd1002" \
>     op monitor interval="10s"
> primitive p_iSCSI-san2_3 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="3" path="/dev/drbd1003" \
>     op monitor interval="10s"
> group g_iSCSI-san1 p_iSCSI-san1 p_iSCSI-san1_0 p_iSCSI-san1_1 p_iSCSI-san1_2 p_iSCSI-san1_3 p_IP-1_254
> group g_iSCSI-san2 p_iSCSI-san2 p_iSCSI-san2_0 p_iSCSI-san2_1 p_iSCSI-san2_2 p_iSCSI-san2_3 p_IP-1_253
> ms ms_DRBD-r0 p_DRBD-r0 \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> ms ms_DRBD-r1 p_DRBD-r1 \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> location l_iSCSI-san1_and_DRBD-r0 p_IP-1_254 10240: san1
> location l_iSCSI-san2_and_DRBD-r1 p_IP-1_253 10240: san2
> colocation c_iSCSI_with_DRBD-r0 inf: g_iSCSI-san1 ms_DRBD-r0:Master
> colocation c_iSCSI_with_DRBD-r1 inf: g_iSCSI-san2 ms_DRBD-r1:Master
> order o_DRBD-r0_before_iSCSI-san1 inf: ms_DRBD-r0:promote g_iSCSI-san1:start
> order o_DRBD-r1_before_iSCSI-san2 inf: ms_DRBD-r1:promote g_iSCSI-san2:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore"

The cluster appears to be functioning correctly:

> san1:~ # crm_mon -1
> ============
> Last updated: Sun Jan 20 22:20:17 2013
> Last change: Sun Jan 20 21:59:15 2013 by root via crm_attribute on san1
> Stack: openais
> Current DC: san1 - partition with quorum
> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
> 2 Nodes configured, 2 expected votes
> 16 Resources configured.
> ============
>
> Online: [ san1 san2 ]
>
>  Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
>      Masters: [ san1 ]
>      Slaves: [ san2 ]
>  Resource Group: g_iSCSI-san1
>      p_iSCSI-san1    (ocf::heartbeat:iSCSITarget):    Started san1
>      p_iSCSI-san1_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>      p_iSCSI-san1_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>      p_iSCSI-san1_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>      p_iSCSI-san1_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>      p_IP-1_254    (ocf::heartbeat:IPaddr2):    Started san1
>  Master/Slave Set: ms_DRBD-r1 [p_DRBD-r1]
>      Masters: [ san2 ]
>      Slaves: [ san1 ]
>  Resource Group: g_iSCSI-san2
>      p_iSCSI-san2    (ocf::heartbeat:iSCSITarget):    Started san2
>      p_iSCSI-san2_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san2_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san2_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san2_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_IP-1_253    (ocf::heartbeat:IPaddr2):    Started san2

> san2:~ # crm_mon -1
> ============
> Last updated: Sun Jan 20 22:20:17 2013
> Last change: Sun Jan 20 21:59:15 2013 by root via crm_attribute on san1
> Stack: openais
> Current DC: san1 - partition with quorum
> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
> 2 Nodes configured, 2 expected votes
> 16 Resources configured.
> ============
>
> Online: [ san1 san2 ]
>
>  Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
>      Masters: [ san1 ]
>      Slaves: [ san2 ]
>  Resource Group: g_iSCSI-san1
>      p_iSCSI-san1    (ocf::heartbeat:iSCSITarget):    Started san1
>      p_iSCSI-san1_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>      p_iSCSI-san1_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>      p_iSCSI-san1_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>      p_iSCSI-san1_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>      p_IP-1_254    (ocf::heartbeat:IPaddr2):    Started san1
>  Master/Slave Set: ms_DRBD-r1 [p_DRBD-r1]
>      Masters: [ san2 ]
>      Slaves: [ san1 ]
>  Resource Group: g_iSCSI-san2
>      p_iSCSI-san2    (ocf::heartbeat:iSCSITarget):    Started san2
>      p_iSCSI-san2_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san2_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san2_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san2_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_IP-1_253    (ocf::heartbeat:IPaddr2):    Started san2

However, the two DRBD resources do not appear to be communicating:

> san1:~ # cat /proc/drbd
> version: 8.4.1 (api:1/proto:86-100)
> GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil at fat-tyre, 2011-12-20 12:43:15
>  0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:3259080
>  1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>  2: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>  3: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>
> 1000: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1001: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1002: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1003: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

> san2:~ # cat /proc/drbd
> version: 8.4.1 (api:1/proto:86-100)
> GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil at fat-tyre, 2011-12-20 12:43:15
>  0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:140
>  1: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>  2: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>  3: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>
> 1000: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1001: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1002: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1003: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

How can I begin to troubleshoot this error?

Eric Pretorious
Truckee, CA