[DRBD-user] Diagnosing a Failed Resource

Eric epretorious at yahoo.com
Mon Jan 21 22:48:39 CET 2013


Dan:

I attempted to reconnect r1 and then captured the results from the kernel ring buffer:

[ 7038.868118] d-con r1: conn( StandAlone -> Unconnected ) 
[ 7038.868170] d-con r1: Starting receiver thread (from drbd_w_r1 [5058])
[ 7038.868238] d-con r1: receiver (re)started
[ 7038.868269] d-con r1: conn( Unconnected -> WFConnection ) 
[ 7039.367612] d-con r1: Handshake successful: Agreed network protocol version 100
[ 7039.367818] d-con r1: conn( WFConnection -> WFReportParams ) 
[ 7039.367851] d-con r1: Starting asender thread (from drbd_r_r1 [20387])
[ 7039.376119] block drbd1000: drbd_sync_handshake:
[ 7039.376127] block drbd1000: self E624A5F197121810:701073FA8F926E0E:B8DFF1CE13CF5A28:B8DEF1CE13CF5A29 bits:0 flags:0
[ 7039.376133] block drbd1000: peer 221AECCC2A594D3B:701073FA8F926E0E:B8DFF1CE13CF5A29:B8DEF1CE13CF5A29 bits:0 flags:0
[ 7039.376139] block drbd1000: uuid_compare()=100 by rule 90
[ 7039.376146] block drbd1000: helper command: /sbin/drbdadm initial-split-brain minor-1000
[ 7039.378519] block drbd1000: helper command: /sbin/drbdadm initial-split-brain minor-1000 exit code 0 (0x0)
[ 7039.378544] block drbd1000: Split-Brain detected but unresolved, dropping connection!
[ 7039.378551] block drbd1000: helper command: /sbin/drbdadm split-brain minor-1000
[ 7039.381167] block drbd1000: helper command: /sbin/drbdadm split-brain minor-1000 exit code 0 (0x0)
[ 7039.381228] d-con r1: conn( WFReportParams -> Disconnecting ) 
[ 7039.381237] d-con r1: error receiving ReportState, e: -5 l: 0!
[ 7039.381274] d-con r1: asender terminated
[ 7039.381283] d-con r1: Terminating asender thread
[ 7039.381539] d-con r1: Connection closed
[ 7039.381567] d-con r1: conn( Disconnecting -> StandAlone ) 
[ 7039.381573] d-con r1: receiver terminated
[ 7039.381577] d-con r1: Terminating receiver thread


I have no idea what it means, however.
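
A minimal Python sketch for condensing a log like the one above: it lists the connection-state transitions, flags the "Split-Brain detected" message, and shows which UUID fields match between "self" and "peer". The function names summarize and matching_slots are made up for illustration, and the slot labels (current, bitmap, two history entries) are an assumption based on how DRBD 8.x generation identifiers are usually described. The sample lines are copied from the log above.

```python
import re

# Sample lines copied from the kernel log above.
LOG = """\
[ 7038.868118] d-con r1: conn( StandAlone -> Unconnected )
[ 7038.868269] d-con r1: conn( Unconnected -> WFConnection )
[ 7039.367818] d-con r1: conn( WFConnection -> WFReportParams )
[ 7039.376127] block drbd1000: self E624A5F197121810:701073FA8F926E0E:B8DFF1CE13CF5A28:B8DEF1CE13CF5A29 bits:0 flags:0
[ 7039.376133] block drbd1000: peer 221AECCC2A594D3B:701073FA8F926E0E:B8DFF1CE13CF5A29:B8DEF1CE13CF5A29 bits:0 flags:0
[ 7039.376139] block drbd1000: uuid_compare()=100 by rule 90
[ 7039.378544] block drbd1000: Split-Brain detected but unresolved, dropping connection!
[ 7039.381228] d-con r1: conn( WFReportParams -> Disconnecting )
[ 7039.381567] d-con r1: conn( Disconnecting -> StandAlone )
"""

# Patterns written against the message formats visible in the log above.
CONN_RE = re.compile(r"d-con (\S+): conn\( (\w+) -> (\w+) \)")
UUID_RE = re.compile(r"block (drbd\d+): (self|peer) ([0-9A-F:]+) bits:")

# Assumed meaning of the four colon-separated UUID fields (hypothetical labels).
SLOTS = ("current", "bitmap", "history1", "history2")


def summarize(log_text):
    """Return (transitions, uuids, split_brain) extracted from log_text."""
    transitions = []   # list of (old_state, new_state)
    uuids = {}         # "self" / "peer" -> list of UUID strings
    split_brain = False
    for line in log_text.splitlines():
        m = CONN_RE.search(line)
        if m:
            transitions.append((m.group(2), m.group(3)))
            continue
        m = UUID_RE.search(line)
        if m:
            uuids[m.group(2)] = m.group(3).split(":")
            continue
        if "Split-Brain detected" in line:
            split_brain = True
    return transitions, uuids, split_brain


def matching_slots(uuids):
    """Map each slot label to True if self and peer carry the same value."""
    return {
        name: s == p
        for name, s, p in zip(SLOTS, uuids["self"], uuids["peer"])
    }


transitions, uuids, split_brain = summarize(LOG)
for old, new in transitions:
    print(f"{old} -> {new}")
print("split brain reported:", split_brain)
for name, same in matching_slots(uuids).items():
    print(f"{name}: {'same' if same else 'differs'}")
```

Run against the log above, the connection walks from StandAlone through WFConnection and WFReportParams and then drops back to StandAlone right after the split-brain message. The first UUID field differs between the two nodes while the second is identical, which suggests that each node's data has diverged from a common earlier state.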

Eric Pretorious
Truckee, CA




>________________________________
> From: Dan Barker <dbarker at visioncomm.net>
>To: "drbd-user at lists.linbit.com" <drbd-user at lists.linbit.com> 
>Sent: Monday, January 21, 2013 6:40 AM
>Subject: Re: [DRBD-user] Diagnosing a Failed Resource
> 
>
> 
>The errors in connecting are logged. If you can’t find them, attempt to connect a resource (drbdadm connect r1, for example) to create the errors again, and then look at the logs for the reason the connection was not established. The “status” will continue to show waiting for connection (WFC) but there will be a reason in the log files. If the logs are unclear, post the relevant portions back here and we’ll help.
> 
>Something like ‘dmesg | grep drbd’. You may want to check the logs on both DRBD servers. You can run the connect command on either.
> 
>hth
> 
>Dan
> 
>From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Eric
>Sent: Monday, January 21, 2013 1:24 AM
>To: drbd-user at lists.linbit.com
>Subject: [DRBD-user] Diagnosing a Failed Resource
> 
>I've configured corosync+pacemaker to manage a simple two-resource DRBD cluster:
> 
>> san1:~ # crm configure show | cat -
>> node san1 \
>>     attributes standby="off"
>> node san2 \
>>     attributes standby="off"
>> primitive p_DRBD-r0 ocf:linbit:drbd \
>>     params drbd_resource="r0" \
>>     op monitor interval="60s"
>> primitive p_DRBD-r1 ocf:linbit:drbd \
>>     params drbd_resource="r1" \
>>     op monitor interval="60s"
>> primitive p_IP-1_253 ocf:heartbeat:IPaddr2 \
>>     params ip="192.168.1.253" cidr_netmask="24" \
>>     op monitor interval="30s"
>> primitive p_IP-1_254 ocf:heartbeat:IPaddr2 \
>>     params ip="192.168.1.254" cidr_netmask="24" \
>>     op monitor interval="30s"
>> primitive p_iSCSI-san1 ocf:heartbeat:iSCSITarget \
>>     params iqn="iqn.2012-11.com.example.san1:sda" \
>>     op monitor interval="10s"
>> primitive p_iSCSI-san1_0 ocf:heartbeat:iSCSILogicalUnit \
>>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="0" path="/dev/drbd0" \
>>     op monitor interval="10s"
>> primitive p_iSCSI-san1_1 ocf:heartbeat:iSCSILogicalUnit \
>>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="1" path="/dev/drbd1" \
>>     op monitor interval="10s"
>> primitive p_iSCSI-san1_2 ocf:heartbeat:iSCSILogicalUnit \
>>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="2" path="/dev/drbd2" \
>>     op monitor interval="10s"
>> primitive p_iSCSI-san1_3 ocf:heartbeat:iSCSILogicalUnit \
>>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="3" path="/dev/drbd3" \
>>     op monitor interval="10s"
>> primitive p_iSCSI-san2 ocf:heartbeat:iSCSITarget \
>>     params iqn="iqn.2012-11.com.example.san2:sda" \
>>     op monitor interval="10s"
>> primitive p_iSCSI-san2_0 ocf:heartbeat:iSCSILogicalUnit \
>>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="0" path="/dev/drbd1000" \
>>     op monitor interval="10s"
>> primitive p_iSCSI-san2_1 ocf:heartbeat:iSCSILogicalUnit \
>>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="1" path="/dev/drbd1001" \
>>     op monitor interval="10s"
>> primitive p_iSCSI-san2_2 ocf:heartbeat:iSCSILogicalUnit \
>>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="2" path="/dev/drbd1002" \
>>     op monitor interval="10s"
>> primitive p_iSCSI-san2_3 ocf:heartbeat:iSCSILogicalUnit \
>>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="3" path="/dev/drbd1003" \
>>     op monitor interval="10s"
>> group g_iSCSI-san1 p_iSCSI-san1 p_iSCSI-san1_0 p_iSCSI-san1_1 p_iSCSI-san1_2 p_iSCSI-san1_3 p_IP-1_254
>> group g_iSCSI-san2 p_iSCSI-san2 p_iSCSI-san2_0 p_iSCSI-san2_1 p_iSCSI-san2_2 p_iSCSI-san2_3 p_IP-1_253
>> ms ms_DRBD-r0 p_DRBD-r0 \
>>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>> ms ms_DRBD-r1 p_DRBD-r1 \
>>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>> location l_iSCSI-san1_and_DRBD-r0 p_IP-1_254 10240: san1
>> location l_iSCSI-san2_and_DRBD-r1 p_IP-1_253 10240: san2
>> colocation c_iSCSI_with_DRBD-r0 inf: g_iSCSI-san1 ms_DRBD-r0:Master
>> colocation c_iSCSI_with_DRBD-r1 inf: g_iSCSI-san2 ms_DRBD-r1:Master
>> order o_DRBD-r0_before_iSCSI-san1 inf: ms_DRBD-r0:promote g_iSCSI-san1:start
>> order o_DRBD-r1_before_iSCSI-san2 inf: ms_DRBD-r1:promote g_iSCSI-san2:start
>> property $id="cib-bootstrap-options" \
>>     dc-version="1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf" \
>>     cluster-infrastructure="openais" \
>>     expected-quorum-votes="2" \
>>     stonith-enabled="false" \
>>     no-quorum-policy="ignore"
> 
>The cluster appears to be functioning correctly:
> 
>> san1:~ # crm_mon -1
>> ============
>> Last updated: Sun Jan 20 22:20:17 2013
>> Last change: Sun Jan 20 21:59:15 2013 by root via crm_attribute on san1
>> Stack: openais
>> Current DC: san1 - partition with quorum
>> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
>> 2 Nodes configured, 2 expected votes
>> 16 Resources configured.
>> ============
>> 
>> Online: [ san1 san2 ]
>> 
>>  Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
>>      Masters: [ san1 ]
>>      Slaves: [ san2 ]
>>  Resource Group: g_iSCSI-san1
>>      p_iSCSI-san1    (ocf::heartbeat:iSCSITarget):    Started san1
>>      p_iSCSI-san1_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>>      p_iSCSI-san1_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>>      p_iSCSI-san1_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>>      p_iSCSI-san1_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>>      p_IP-1_254    (ocf::heartbeat:IPaddr2):    Started san1
>>  Master/Slave Set: ms_DRBD-r1 [p_DRBD-r1]
>>      Masters: [ san2 ]
>>      Slaves: [ san1 ]
>>  Resource Group: g_iSCSI-san2
>>      p_iSCSI-san2    (ocf::heartbeat:iSCSITarget):    Started san2
>>      p_iSCSI-san2_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>      p_iSCSI-san2_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>      p_iSCSI-san2_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>      p_iSCSI-san2_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>      p_IP-1_253    (ocf::heartbeat:IPaddr2):    Started san2
>
>> san2:~ # crm_mon -1
>> ============
>> Last updated: Sun Jan 20 22:20:17 2013
>> Last change: Sun Jan 20 21:59:15 2013 by root via crm_attribute on san1
>> Stack: openais
>> Current DC: san1 - partition with quorum
>> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
>> 2 Nodes configured, 2 expected votes
>> 16 Resources configured.
>> ============
>> 
>> Online: [ san1 san2 ]
>> 
>>  Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
>>      Masters: [ san1 ]
>>      Slaves: [ san2 ]
>>  Resource Group: g_iSCSI-san1
>>      p_iSCSI-san1    (ocf::heartbeat:iSCSITarget):    Started san1
>>      p_iSCSI-san1_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>>      p_iSCSI-san1_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>>      p_iSCSI-san1_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>>      p_iSCSI-san1_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>>      p_IP-1_254    (ocf::heartbeat:IPaddr2):    Started san1
>>  Master/Slave Set: ms_DRBD-r1 [p_DRBD-r1]
>>      Masters: [ san2 ]
>>      Slaves: [ san1 ]
>>  Resource Group: g_iSCSI-san2
>>      p_iSCSI-san2    (ocf::heartbeat:iSCSITarget):    Started san2
>>      p_iSCSI-san2_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>      p_iSCSI-san2_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>      p_iSCSI-san2_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>      p_iSCSI-san2_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>      p_IP-1_253    (ocf::heartbeat:IPaddr2):    Started san2
> 
>However, the two DRBD resources do not appear to be communicating:
> 
>> san1:~ # cat /proc/drbd 
>> version: 8.4.1 (api:1/proto:86-100)
>> GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil at fat-tyre, 2011-12-20 12:43:15
>>  0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:3259080
>>  1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>>  2: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>>  3: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>> 
>> 1000: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
>>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>> 1001: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
>>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>> 1002: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
>>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>> 1003: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
>>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 
>> san2:~ # cat /proc/drbd 
>> version: 8.4.1 (api:1/proto:86-100)
>> GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil at fat-tyre, 2011-12-20 12:43:15
>>  0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:140
>>  1: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>>  2: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>>  3: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>> 
>> 1000: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>> 1001: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>> 1002: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>> 1003: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 
>How can I begin to troubleshoot this error?
> 
>Eric Pretorious
>Truckee, CA