Note: "permalinks" may not be as permanent as we would like;
direct links to old messages may well be a few messages off.
>>> However, I still have no idea what caused the failures.

A split brain is caused by writing to both members while they are disconnected. What in your environment caused that to occur is probably lost in logs a week gone. But if your procedures always allow only one node (the primary) to write to a resource, even while it's disconnected, then split-brain won't occur.

"Nuke the whole thing" certainly worked. So would have following the doc to invalidate the secondary copy and then simply connecting. There is an excellent chapter in the manual about split-brain.

Dan

From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Eric
Sent: Monday, January 21, 2013 5:08 PM
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] Diagnosing a Failed Resource

I decided to nuke the whole thing and start over. On both nodes, I...

1. disabled the cluster:

   sanX:~ # crm node standby

2. re-created the metadata:

   sanX:~ # drbdadm create-md r0
   sanX:~ # drbdadm create-md r1

3. brought the DRBD resources online:

   sanX:~ # drbdadm up r0
   sanX:~ # drbdadm up r1

And then, on the node intended to act as each resource's primary, I...

4. designated the node as primary [and sync'd the two nodes' resources]:

   san1:~ # drbdadm primary r0 --force
   san2:~ # drbdadm primary r1 --force

After the resources completed sync'ing, I...

5. took the DRBD resources offline:

   sanX:~ # drbdadm down r0
   sanX:~ # drbdadm down r1

6. restarted the cluster:

   sanX:~ # crm node online

7. verified that DRBD was functioning correctly:

   sanX:~ # cat /proc/drbd
   version: 8.4.1 (api:1/proto:86-100)
   GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil at fat-tyre, 2011-12-20 12:43:15
    0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
       ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
    1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
       ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
    2: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
       ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
    3: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
       ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
   1000: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
       ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
   1001: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
       ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
   1002: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
       ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
   1003: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
       ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

8. checked the kernel ring buffer:
[13107.929867] d-con r0: Starting worker thread (from drbdsetup [27297])
[13107.930025] block drbd0: disk( Diskless -> Attaching )
[13107.935720] d-con r0: Method to ensure write ordering: flush
[13107.935728] block drbd0: max BIO size = 1048576
[13107.935736] block drbd0: drbd_bm_resize called with capacity == 104868976
[13107.936667] block drbd0: resync bitmap: bits=13108622 words=204823 pages=401
[13107.936678] block drbd0: size = 50 GB (52434488 KB)
[13107.961687] block drbd0: bitmap READ of 401 pages took 6 jiffies
[13107.964602] block drbd0: recounting of set bits took additional 1 jiffies
[13107.964608] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[13107.964619] block drbd0: disk( Attaching -> UpToDate )
[13107.964624] block drbd0: attached to UUIDs 5D13C510E7C43556:0000000000000000:CC7A830506E5F9FA:0000000000000004
[13107.970870] d-con r1: Starting worker thread (from drbdsetup [27300])
[13107.973936] block drbd1000: disk( Diskless -> Attaching )
[13107.982866] d-con r1: Method to ensure write ordering: flush
[13107.982874] block drbd1000: max BIO size = 1048576
[13107.982881] block drbd1000: drbd_bm_resize called with capacity == 104868976
[13107.983829] block drbd1000: resync bitmap: bits=13108622 words=204823 pages=401
[13107.983841] block drbd1000: size = 50 GB (52434488 KB)
[13108.017395] block drbd1000: bitmap READ of 401 pages took 9 jiffies
[13108.019262] block drbd1000: recounting of set bits took additional 0 jiffies
[13108.019268] block drbd1000: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[13108.019280] block drbd1000: disk( Attaching -> UpToDate )
[13108.019285] block drbd1000: attached to UUIDs 9162525E46599CBB:1B6D97817803DDE5:531ACCA34B5A8A81:0000000000000004
[13108.044402] block drbd1: disk( Diskless -> Attaching )
[13108.044863] block drbd1: max BIO size = 1048576
[13108.044872] block drbd1: drbd_bm_resize called with capacity == 104868976
[13108.045862] block drbd1: resync bitmap: bits=13108622 words=204823 pages=401
[13108.045870] block drbd1: size = 50 GB (52434488 KB)
[13108.078330] block drbd1: bitmap READ of 401 pages took 8 jiffies
[13108.080379] block drbd1: recounting of set bits took additional 1 jiffies
[13108.080385] block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[13108.080396] block drbd1: disk( Attaching -> UpToDate )
[13108.080401] block drbd1: attached to UUIDs 3DD43846444979D0:0000000000000000:C2E4C547C3069E48:0000000000000004
[13108.153218] block drbd1001: disk( Diskless -> Attaching )
[13108.153641] block drbd1001: max BIO size = 1048576
[13108.153653] block drbd1001: drbd_bm_resize called with capacity == 104868976
[13108.154769] block drbd1001: resync bitmap: bits=13108622 words=204823 pages=401
[13108.154781] block drbd1001: size = 50 GB (52434488 KB)
[13108.202463] block drbd1001: bitmap READ of 401 pages took 12 jiffies
[13108.204504] block drbd1001: recounting of set bits took additional 1 jiffies
[13108.204510] block drbd1001: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[13108.204521] block drbd1001: disk( Attaching -> UpToDate )
[13108.204526] block drbd1001: attached to UUIDs 77EFDEC15B06F199:DF7193C3448F470B:75319BD97ED29A6A:0000000000000004
[13108.209437] block drbd2: disk( Diskless -> Attaching )
[13108.209869] block drbd2: max BIO size = 1048576
[13108.209879] block drbd2: drbd_bm_resize called with capacity == 52450480
[13108.210355] block drbd2: resync bitmap: bits=6556310 words=102443 pages=201
[13108.210364] block drbd2: size = 25 GB (26225240 KB)
[13108.248940] block drbd2: bitmap READ of 201 pages took 10 jiffies
[13108.249882] block drbd2: recounting of set bits took additional 0 jiffies
[13108.249887] block drbd2: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[13108.249898] block drbd2: disk( Attaching -> UpToDate )
[13108.249903] block drbd2: attached to UUIDs 4ED02AF2177212EC:0000000000000000:892841F769F80FD2:0000000000000004
[13108.280862] block drbd1002: disk( Diskless -> Attaching )
[13108.284504] block drbd1002: max BIO size = 1048576
[13108.284517] block drbd1002: drbd_bm_resize called with capacity == 52450480
[13108.284898] block drbd1002: resync bitmap: bits=6556310 words=102443 pages=201
[13108.284905] block drbd1002: size = 25 GB (26225240 KB)
[13108.310601] block drbd1002: bitmap READ of 201 pages took 6 jiffies
[13108.311534] block drbd1002: recounting of set bits took additional 0 jiffies
[13108.311538] block drbd1002: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[13108.311549] block drbd1002: disk( Attaching -> UpToDate )
[13108.311554] block drbd1002: attached to UUIDs 6162EF30A167BF53:3874D93F4CBE04F5:CFFD994FAA06C840:0000000000000004
[13108.346176] block drbd3: disk( Diskless -> Attaching )
[13108.346903] block drbd3: max BIO size = 1048576
[13108.346914] block drbd3: drbd_bm_resize called with capacity == 52450480
[13108.347352] block drbd3: resync bitmap: bits=6556310 words=102443 pages=201
[13108.347360] block drbd3: size = 25 GB (26225240 KB)
[13108.393579] block drbd3: bitmap READ of 201 pages took 12 jiffies
[13108.394521] block drbd3: recounting of set bits took additional 0 jiffies
[13108.394526] block drbd3: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[13108.394537] block drbd3: disk( Attaching -> UpToDate )
[13108.394542] block drbd3: attached to UUIDs F424BEA340DA597E:0000000000000000:7534D6E7D2916062:0000000000000004
[13108.415953] d-con r0: conn( StandAlone -> Unconnected )
[13108.415992] d-con r0: Starting receiver thread (from drbd_w_r0 [27299])
[13108.416137] d-con r0: receiver (re)started
[13108.416169] d-con r0: conn( Unconnected -> WFConnection )
[13108.429798] block drbd1003: disk( Diskless -> Attaching )
[13108.430229] block drbd1003: max BIO size = 1048576
[13108.430239] block drbd1003: drbd_bm_resize called with capacity == 52450480
[13108.430690] block drbd1003: resync bitmap: bits=6556310 words=102443 pages=201
[13108.430700] block drbd1003: size = 25 GB (26225240 KB)
[13108.470077] block drbd1003: bitmap READ of 201 pages took 10 jiffies
[13108.471604] block drbd1003: recounting of set bits took additional 0 jiffies
[13108.471611] block drbd1003: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[13108.471625] block drbd1003: disk( Attaching -> UpToDate )
[13108.471631] block drbd1003: attached to UUIDs 8FA72E3EFCD7F989:24C98CC08449A9C5:0A1A0D1A2CCE437D:0000000000000004
[13108.515667] d-con r1: conn( StandAlone -> Unconnected )
[13108.515759] d-con r1: Starting receiver thread (from drbd_w_r1 [27304])
[13108.520305] d-con r1: receiver (re)started
[13108.520345] d-con r1: conn( Unconnected -> WFConnection )
[13108.896962] block drbd1000: role( Secondary -> Primary )
[13108.909627] block drbd0: role( Secondary -> Primary )
[13108.919828] block drbd0: new current UUID 9456ACA94896E7CB:5D13C510E7C43556:CC7A830506E5F9FA:0000000000000004
[13108.928460] block drbd1001: role( Secondary -> Primary )
[13108.942434] block drbd1: role( Secondary -> Primary )
[13108.953932] block drbd1: new current UUID 61301C50D5932F3B:3DD43846444979D0:C2E4C547C3069E48:0000000000000004
[13108.962680] block drbd1002: role( Secondary -> Primary )
[13108.979068] block drbd2: role( Secondary -> Primary )
[13108.994196] block drbd2: new current UUID 7A60A90478413765:4ED02AF2177212EC:892841F769F80FD2:0000000000000004
[13109.002821] block drbd1003: role( Secondary -> Primary )
[13109.016954] block drbd3: role( Secondary -> Primary )
[13109.030957] block drbd3: new current UUID E8E6056FE9F372BB:F424BEA340DA597E:7534D6E7D2916062:0000000000000004

However, I still have no idea what caused the failures. Ideas? Suggestions?
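For reference, the by-the-book alternative Dan mentions (invalidating the secondary's copy instead of recreating the metadata) would have looked roughly like the sketch below. This is a sketch, not a tested procedure: the resource name r1 and the choice of victim node are taken from this thread, and the run() wrapper only prints each command, so nothing here executes drbdadm as-is.

```shell
#!/bin/sh
# Sketch of the manual split-brain recovery described in the DRBD 8.4
# manual, as an alternative to recreating the metadata from scratch.
# Resource "r1" and the choice of victim node are assumptions from this
# thread. run() only prints the commands; drop it to execute for real.
run() { echo "$@"; }

# On the split-brain victim (the node whose diverging writes get thrown
# away; DRBD then resyncs the affected blocks from the survivor):
run drbdadm disconnect r1
run drbdadm secondary r1
run drbdadm connect --discard-my-data r1

# On the survivor, only if it also dropped to StandAlone (a node sitting
# in WFConnection will reconnect on its own):
run drbdadm connect r1
```

Because only the out-of-sync blocks are copied back, this is usually much faster than a full initial sync after create-md.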
Eric Pretorious
Truckee, CA

________________________________
From: Eric <epretorious at yahoo.com>
To: Dan Barker <dbarker at visioncomm.net>; "drbd-user at lists.linbit.com" <drbd-user at lists.linbit.com>
Sent: Monday, January 21, 2013 1:48 PM
Subject: Re: [DRBD-user] Diagnosing a Failed Resource

Dan:

I attempted to re-connect r1 and then captured the results in the kernel ring buffer:

[ 7038.868118] d-con r1: conn( StandAlone -> Unconnected )
[ 7038.868170] d-con r1: Starting receiver thread (from drbd_w_r1 [5058])
[ 7038.868238] d-con r1: receiver (re)started
[ 7038.868269] d-con r1: conn( Unconnected -> WFConnection )
[ 7039.367612] d-con r1: Handshake successful: Agreed network protocol version 100
[ 7039.367818] d-con r1: conn( WFConnection -> WFReportParams )
[ 7039.367851] d-con r1: Starting asender thread (from drbd_r_r1 [20387])
[ 7039.376119] block drbd1000: drbd_sync_handshake:
[ 7039.376127] block drbd1000: self E624A5F197121810:701073FA8F926E0E:B8DFF1CE13CF5A28:B8DEF1CE13CF5A29 bits:0 flags:0
[ 7039.376133] block drbd1000: peer 221AECCC2A594D3B:701073FA8F926E0E:B8DFF1CE13CF5A29:B8DEF1CE13CF5A29 bits:0 flags:0
[ 7039.376139] block drbd1000: uuid_compare()=100 by rule 90
[ 7039.376146] block drbd1000: helper command: /sbin/drbdadm initial-split-brain minor-1000
[ 7039.378519] block drbd1000: helper command: /sbin/drbdadm initial-split-brain minor-1000 exit code 0 (0x0)
[ 7039.378544] block drbd1000: Split-Brain detected but unresolved, dropping connection!
[ 7039.378551] block drbd1000: helper command: /sbin/drbdadm split-brain minor-1000
[ 7039.381167] block drbd1000: helper command: /sbin/drbdadm split-brain minor-1000 exit code 0 (0x0)
[ 7039.381228] d-con r1: conn( WFReportParams -> Disconnecting )
[ 7039.381237] d-con r1: error receiving ReportState, e: -5 l: 0!
[ 7039.381274] d-con r1: asender terminated
[ 7039.381283] d-con r1: Terminating asender thread
[ 7039.381539] d-con r1: Connection closed
[ 7039.381567] d-con r1: conn( Disconnecting -> StandAlone )
[ 7039.381573] d-con r1: receiver terminated
[ 7039.381577] d-con r1: Terminating receiver thread

I have no idea what it means, however.

Eric Pretorious
Truckee, CA

________________________________
From: Dan Barker <dbarker at visioncomm.net>
To: "drbd-user at lists.linbit.com" <drbd-user at lists.linbit.com>
Sent: Monday, January 21, 2013 6:40 AM
Subject: Re: [DRBD-user] Diagnosing a Failed Resource

The errors in connecting are logged. If you can't find them, attempt to connect a resource (drbdadm connect r1, for example) to create the errors again, and then look at the logs for the reason the connection was not established. The "status" will continue to show waiting for connection (WFC), but there will be a reason in the log files. If the logs are unclear, post the relevant portions back here and we'll help. Try something like 'dmesg | grep drbd'. You may want to check the logs on both DRBD servers; you can run the connect command on either.
hth,
Dan

From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Eric
Sent: Monday, January 21, 2013 1:24 AM
To: drbd-user at lists.linbit.com
Subject: [DRBD-user] Diagnosing a Failed Resource

I've configured corosync+pacemaker to manage a simple two-resource DRBD cluster:

> san1:~ # crm configure show | cat -
> node san1 \
>     attributes standby="off"
> node san2 \
>     attributes standby="off"
> primitive p_DRBD-r0 ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op monitor interval="60s"
> primitive p_DRBD-r1 ocf:linbit:drbd \
>     params drbd_resource="r1" \
>     op monitor interval="60s"
> primitive p_IP-1_253 ocf:heartbeat:IPaddr2 \
>     params ip="192.168.1.253" cidr_netmask="24" \
>     op monitor interval="30s"
> primitive p_IP-1_254 ocf:heartbeat:IPaddr2 \
>     params ip="192.168.1.254" cidr_netmask="24" \
>     op monitor interval="30s"
> primitive p_iSCSI-san1 ocf:heartbeat:iSCSITarget \
>     params iqn="iqn.2012-11.com.example.san1:sda" \
>     op monitor interval="10s"
> primitive p_iSCSI-san1_0 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="0" path="/dev/drbd0" \
>     op monitor interval="10s"
> primitive p_iSCSI-san1_1 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="1" path="/dev/drbd1" \
>     op monitor interval="10s"
> primitive p_iSCSI-san1_2 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="2" path="/dev/drbd2" \
>     op monitor interval="10s"
> primitive p_iSCSI-san1_3 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="3" path="/dev/drbd3" \
>     op monitor interval="10s"
> primitive p_iSCSI-san2 ocf:heartbeat:iSCSITarget \
>     params iqn="iqn.2012-11.com.example.san2:sda" \
>     op monitor interval="10s"
> primitive p_iSCSI-san2_0 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="0" path="/dev/drbd1000" \
>     op monitor interval="10s"
> primitive p_iSCSI-san2_1 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="1" path="/dev/drbd1001" \
>     op monitor interval="10s"
> primitive p_iSCSI-san2_2 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="2" path="/dev/drbd1002" \
>     op monitor interval="10s"
> primitive p_iSCSI-san2_3 ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="3" path="/dev/drbd1003" \
>     op monitor interval="10s"
> group g_iSCSI-san1 p_iSCSI-san1 p_iSCSI-san1_0 p_iSCSI-san1_1 p_iSCSI-san1_2 p_iSCSI-san1_3 p_IP-1_254
> group g_iSCSI-san2 p_iSCSI-san2 p_iSCSI-san2_0 p_iSCSI-san2_1 p_iSCSI-san2_2 p_iSCSI-san2_3 p_IP-1_253
> ms ms_DRBD-r0 p_DRBD-r0 \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> ms ms_DRBD-r1 p_DRBD-r1 \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> location l_iSCSI-san1_and_DRBD-r0 p_IP-1_254 10240: san1
> location l_iSCSI-san2_and_DRBD-r1 p_IP-1_253 10240: san2
> colocation c_iSCSI_with_DRBD-r0 inf: g_iSCSI-san1 ms_DRBD-r0:Master
> colocation c_iSCSI_with_DRBD-r1 inf: g_iSCSI-san2 ms_DRBD-r1:Master
> order o_DRBD-r0_before_iSCSI-san1 inf: ms_DRBD-r0:promote g_iSCSI-san1:start
> order o_DRBD-r1_before_iSCSI-san2 inf: ms_DRBD-r1:promote g_iSCSI-san2:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore"

The cluster appears to be functioning correctly:

> san1:~ # crm_mon -1
> ============
> Last updated: Sun Jan 20 22:20:17 2013
> Last change: Sun Jan 20 21:59:15 2013 by root via crm_attribute on san1
> Stack: openais
> Current DC: san1 - partition with quorum
> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
> 2 Nodes configured, 2 expected votes
> 16 Resources configured.
> ============
>
> Online: [ san1 san2 ]
>
> Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
>     Masters: [ san1 ]
>     Slaves: [ san2 ]
> Resource Group: g_iSCSI-san1
>     p_iSCSI-san1 (ocf::heartbeat:iSCSITarget): Started san1
>     p_iSCSI-san1_0 (ocf::heartbeat:iSCSILogicalUnit): Started san1
>     p_iSCSI-san1_1 (ocf::heartbeat:iSCSILogicalUnit): Started san1
>     p_iSCSI-san1_2 (ocf::heartbeat:iSCSILogicalUnit): Started san1
>     p_iSCSI-san1_3 (ocf::heartbeat:iSCSILogicalUnit): Started san1
>     p_IP-1_254 (ocf::heartbeat:IPaddr2): Started san1
> Master/Slave Set: ms_DRBD-r1 [p_DRBD-r1]
>     Masters: [ san2 ]
>     Slaves: [ san1 ]
> Resource Group: g_iSCSI-san2
>     p_iSCSI-san2 (ocf::heartbeat:iSCSITarget): Started san2
>     p_iSCSI-san2_0 (ocf::heartbeat:iSCSILogicalUnit): Started san2
>     p_iSCSI-san2_1 (ocf::heartbeat:iSCSILogicalUnit): Started san2
>     p_iSCSI-san2_2 (ocf::heartbeat:iSCSILogicalUnit): Started san2
>     p_iSCSI-san2_3 (ocf::heartbeat:iSCSILogicalUnit): Started san2
>     p_IP-1_253 (ocf::heartbeat:IPaddr2): Started san2

> san2:~ # crm_mon -1
> ============
> Last updated: Sun Jan 20 22:20:17 2013
> Last change: Sun Jan 20 21:59:15 2013 by root via crm_attribute on san1
> Stack: openais
> Current DC: san1 - partition with quorum
> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
> 2 Nodes configured, 2 expected votes
> 16 Resources configured.
> ============
>
> Online: [ san1 san2 ]
>
> Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
>     Masters: [ san1 ]
>     Slaves: [ san2 ]
> Resource Group: g_iSCSI-san1
>     p_iSCSI-san1 (ocf::heartbeat:iSCSITarget): Started san1
>     p_iSCSI-san1_0 (ocf::heartbeat:iSCSILogicalUnit): Started san1
>     p_iSCSI-san1_1 (ocf::heartbeat:iSCSILogicalUnit): Started san1
>     p_iSCSI-san1_2 (ocf::heartbeat:iSCSILogicalUnit): Started san1
>     p_iSCSI-san1_3 (ocf::heartbeat:iSCSILogicalUnit): Started san1
>     p_IP-1_254 (ocf::heartbeat:IPaddr2): Started san1
> Master/Slave Set: ms_DRBD-r1 [p_DRBD-r1]
>     Masters: [ san2 ]
>     Slaves: [ san1 ]
> Resource Group: g_iSCSI-san2
>     p_iSCSI-san2 (ocf::heartbeat:iSCSITarget): Started san2
>     p_iSCSI-san2_0 (ocf::heartbeat:iSCSILogicalUnit): Started san2
>     p_iSCSI-san2_1 (ocf::heartbeat:iSCSILogicalUnit): Started san2
>     p_iSCSI-san2_2 (ocf::heartbeat:iSCSILogicalUnit): Started san2
>     p_iSCSI-san2_3 (ocf::heartbeat:iSCSILogicalUnit): Started san2
>     p_IP-1_253 (ocf::heartbeat:IPaddr2): Started san2

However, the two DRBD resources do not appear to be communicating:

> san1:~ # cat /proc/drbd
> version: 8.4.1 (api:1/proto:86-100)
> GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil at fat-tyre, 2011-12-20 12:43:15
>  0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:3259080
>  1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>  2: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>  3: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>
> 1000: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r-----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1001: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r-----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1002: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r-----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1003: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r-----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

> san2:~ # cat /proc/drbd
> version: 8.4.1 (api:1/proto:86-100)
> GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil at fat-tyre, 2011-12-20 12:43:15
>  0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:140
>  1: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>  2: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>  3: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>
> 1000: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1001: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1002: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 1003: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

How can I begin to troubleshoot this error?
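A quick first step is to reduce each node's /proc/drbd to its per-device state lines and compare the two nodes side by side. A minimal sketch (the field positions are assumed from the 8.4-style /proc/drbd output quoted above):

```shell
#!/bin/sh
# Reduce /proc/drbd to one line per minor: connection state, roles, and
# disk states. On a live node: drbd_summary < /proc/drbd
# A minor that is StandAlone on one node while the peer sits in
# WFConnection means the StandAlone side gave up on the connection
# (typically after split-brain detection) and will stay that way until an
# explicit "drbdadm connect".
drbd_summary() {
  awk '$1 ~ /^[0-9]+:$/ { print $1, $2, $3, $4 }'
}

# Sample input: the first two minors of san1's output quoted above
drbd_summary <<'EOF'
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
    ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:3259080
 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
    ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
EOF
# prints:
# 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown
# 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown
```

Run against both nodes' /proc/drbd, the san1/san2 mismatch (StandAlone vs. WFConnection on every minor) stands out immediately.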
Eric Pretorious
Truckee, CA

_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user