[DRBD-user] Diagnosing a Failed Resource

Eric epretorious at yahoo.com
Mon Jan 21 07:24:15 CET 2013


I've configured corosync+pacemaker to manage a simple two-resource DRBD cluster:

> san1:~ # crm configure show | cat -
> node san1 \
>     attributes standby="off"
> node san2 \
>     attributes standby="off"
> primitive p_DRBD-r0 ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op monitor interval="60s"
> primitive p_DRBD-r1 ocf:linbit:drbd \
>     params drbd_resource="r1" \
>     op monitor interval="60s"
> primitive p_IP-1_253 ocf:heartbeat:IPaddr2 \
>     params ip="192.168.1.253" cidr_netmask="24" \
>     op monitor interval="30s"
> primitive p_IP-1_254 ocf:heartbeat:IPaddr2 \
>     params ip="192.168.1.254" cidr_netmask="24" \
>     op monitor interval="30s"
> primitive p_iSCSI-san1 ocf:heartbeat:iSCSITarget \
>     params iqn="iqn.2012-11.com.example.san1:sda" \
>     op monitor interval="10s"
> primitive p_iSCSI-san1_0 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="0" path="/dev/drbd0" \
>     op monitor interval="10s"
> primitive p_iSCSI-san1_1 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="1" path="/dev/drbd1" \
>     op monitor interval="10s"
> primitive p_iSCSI-san1_2 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="2" path="/dev/drbd2" \
>     op monitor interval="10s"
> primitive p_iSCSI-san1_3 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="3" path="/dev/drbd3" \
>     op monitor interval="10s"
> primitive p_iSCSI-san2 ocf:heartbeat:iSCSITarget \
>     params iqn="iqn.2012-11.com.example.san2:sda" \
>     op monitor interval="10s"
> primitive p_iSCSI-san2_0 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="0" path="/dev/drbd1000" \
>     op monitor interval="10s"
> primitive p_iSCSI-san2_1 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="1" path="/dev/drbd1001" \
>     op monitor interval="10s"
> primitive p_iSCSI-san2_2 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="2" path="/dev/drbd1002" \
>     op monitor interval="10s"
> primitive p_iSCSI-san2_3 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="3" path="/dev/drbd1003" \
>     op monitor interval="10s"
> group g_iSCSI-san1 p_iSCSI-san1 p_iSCSI-san1_0 p_iSCSI-san1_1 p_iSCSI-san1_2 p_iSCSI-san1_3 p_IP-1_254
> group g_iSCSI-san2 p_iSCSI-san2 p_iSCSI-san2_0 p_iSCSI-san2_1 p_iSCSI-san2_2 p_iSCSI-san2_3 p_IP-1_253
> ms ms_DRBD-r0 p_DRBD-r0 \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> ms ms_DRBD-r1 p_DRBD-r1 \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> location l_iSCSI-san1_and_DRBD-r0 p_IP-1_254 10240: san1
> location l_iSCSI-san2_and_DRBD-r1 p_IP-1_253 10240: san2
> colocation c_iSCSI_with_DRBD-r0 inf: g_iSCSI-san1 ms_DRBD-r0:Master
> colocation c_iSCSI_with_DRBD-r1 inf: g_iSCSI-san2 ms_DRBD-r1:Master
> order o_DRBD-r0_before_iSCSI-san1 inf: ms_DRBD-r0:promote g_iSCSI-san1:start
> order o_DRBD-r1_before_iSCSI-san2 inf: ms_DRBD-r1:promote g_iSCSI-san2:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore"


The cluster appears to be functioning correctly:


> san1:~ # crm_mon -1
> ============
> Last updated: Sun Jan 20 22:20:17 2013
> Last change: Sun Jan 20 21:59:15 2013 by root via crm_attribute on san1
> Stack: openais
> Current DC: san1 - partition with quorum
> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
> 2 Nodes configured, 2 expected votes
> 16 Resources configured.
> ============
> 
> Online: [ san1 san2 ]
> 
>  Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
>      Masters: [ san1 ]
>      Slaves: [ san2 ]
>  Resource Group: g_iSCSI-san1
>      p_iSCSI-san1    (ocf::heartbeat:iSCSITarget):    Started san1
>      p_iSCSI-san1_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>      p_iSCSI-san1_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>      p_iSCSI-san1_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>      p_iSCSI-san1_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>      p_IP-1_254    (ocf::heartbeat:IPaddr2):    Started san1
>  Master/Slave Set: ms_DRBD-r1 [p_DRBD-r1]
>      Masters: [ san2 ]
>      Slaves: [ san1 ]
>  Resource Group: g_iSCSI-san2
>      p_iSCSI-san2    (ocf::heartbeat:iSCSITarget):    Started san2
>      p_iSCSI-san2_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san2_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san2_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san2_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_IP-1_253    (ocf::heartbeat:IPaddr2):    Started san2

> san2:~ # crm_mon -1
> ============
> Last updated: Sun Jan 20 22:20:17 2013
> Last change: Sun Jan 20 21:59:15 2013 by root via crm_attribute on san1
> Stack: openais
> Current DC: san1 - partition with quorum
> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
> 2 Nodes configured, 2 expected votes
> 16 Resources configured.
> ============
> 
> Online: [ san1 san2 ]
> 
>  Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
>      Masters: [ san1 ]
>      Slaves: [ san2 ]
>  Resource Group: g_iSCSI-san1
>      p_iSCSI-san1    (ocf::heartbeat:iSCSITarget):    Started san1
>      p_iSCSI-san1_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>      p_iSCSI-san1_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>      p_iSCSI-san1_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>      p_iSCSI-san1_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>      p_IP-1_254    (ocf::heartbeat:IPaddr2):    Started san1
>  Master/Slave Set: ms_DRBD-r1 [p_DRBD-r1]
>      Masters: [ san2 ]
>      Slaves: [ san1 ]
>  Resource Group: g_iSCSI-san2
>      p_iSCSI-san2    (ocf::heartbeat:iSCSITarget):    Started san2
>      p_iSCSI-san2_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san2_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san2_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san2_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_IP-1_253    (ocf::heartbeat:IPaddr2):    Started san2


However, the two DRBD resources do not appear to be communicating. Every volume on san1 reports cs:StandAlone, while every volume on san2 reports cs:WFConnection:

> san1:~ # cat /proc/drbd 
> version: 8.4.1 (api:1/proto:86-100)
> GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil at fat-tyre, 2011-12-20 12:43:15
>  0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:3259080
>  1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>  2: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>  3: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 
> 1000: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1001: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1002: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1003: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

> san2:~ # cat /proc/drbd 
> version: 8.4.1 (api:1/proto:86-100)
> GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil at fat-tyre, 2011-12-20 12:43:15
>  0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:140
>  1: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>  2: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>  3: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 
> 1000: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1001: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1002: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1003: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
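
To compare the two nodes at a glance, the status lines can be reduced to one record per minor. The following Python is only a rough sketch: it assumes the raw DRBD 8.4 /proc/drbd layout shown above (without the "> " quoting), and the names parse_proc_drbd and not_connected are hypothetical helpers, not part of DRBD.

```python
import re

# Matches a status line such as:
#  " 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----"
# Assumption: DRBD 8.4 /proc/drbd layout, as in the output above.
STATUS_RE = re.compile(
    r"^\s*(\d+):\s+cs:(\S+)\s+ro:([^/\s]+)/(\S+)\s+ds:([^/\s]+)/(\S+)"
)


def parse_proc_drbd(text):
    """Return one dict per DRBD minor found in /proc/drbd-style text."""
    devices = []
    for line in text.splitlines():
        match = STATUS_RE.match(line)
        if match is None:
            continue  # version, GIT-hash, counter, and blank lines
        minor, cs, ro_local, ro_peer, ds_local, ds_peer = match.groups()
        devices.append({
            "minor": int(minor),
            "cs": cs,                  # connection state
            "ro": (ro_local, ro_peer), # local/peer role
            "ds": (ds_local, ds_peer), # local/peer disk state
        })
    return devices


def not_connected(devices):
    """Return the devices whose connection state is not 'Connected'."""
    return [d for d in devices if d["cs"] != "Connected"]


# Two lines copied from the san1 output above, used as sample input.
SAMPLE = """\
version: 8.4.1 (api:1/proto:86-100)
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:3259080
1000: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
"""

if __name__ == "__main__":
    for dev in not_connected(parse_proc_drbd(SAMPLE)):
        print(dev["minor"], dev["cs"], "/".join(dev["ro"]), "/".join(dev["ds"]))
```

On a live node the same functions could be fed the contents of /proc/drbd instead of SAMPLE.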


How can I begin to troubleshoot this error?

Eric Pretorious
Truckee, CA