[DRBD-user] Diagnosing a Failed Resource

Eric epretorious at yahoo.com
Mon Jan 21 21:03:23 CET 2013


Regarding the TCP sockets (or lack thereof): I found this in the ring buffer after putting SAN2 into standby mode (and causing the cluster to promote r1 [to primary] on SAN1):

san1:~ # dmesg | grep '\[ 53'
[ 5329.349289] block drbd1000: role( Secondary -> Primary ) 
[ 5329.360252] block drbd1001: role( Secondary -> Primary ) 
[ 5329.379071] block drbd1002: role( Secondary -> Primary ) 
[ 5329.426989] block drbd1003: role( Secondary -> Primary ) 
[ 5343.319014] d-con r1: conn( StandAlone -> Unconnected ) 
[ 5343.319047] d-con r1: Starting receiver thread (from drbd_w_r1 [5058])
[ 5343.319589] d-con r1: receiver (re)started
[ 5343.319629] d-con r1: conn( Unconnected -> WFConnection ) 
[ 5345.785122] d-con r1: Handshake successful: Agreed network protocol version 100
[ 5345.785281] d-con r1: conn( WFConnection -> WFReportParams ) 
[ 5345.785444] d-con r1: Starting asender thread (from drbd_r_r1 [1002])
[ 5345.808072] block drbd1000: drbd_sync_handshake:
[ 5345.808081] block drbd1000: self E624A5F197121811:701073FA8F926E0E:B8DFF1CE13CF5A28:B8DEF1CE13CF5A29 bits:0 flags:0
[ 5345.808088] block drbd1000: peer 221AECCC2A594D3A:701073FA8F926E0E:B8DFF1CE13CF5A29:B8DEF1CE13CF5A29 bits:0 flags:0
[ 5345.808095] block drbd1000: uuid_compare()=100 by rule 90
[ 5345.808103] block drbd1000: helper command: /sbin/drbdadm initial-split-brain minor-1000
[ 5345.810730] block drbd1000: helper command: /sbin/drbdadm initial-split-brain minor-1000 exit code 0 (0x0)
[ 5345.810752] block drbd1000: Split-Brain detected but unresolved, dropping connection!
[ 5345.810761] block drbd1000: helper command: /sbin/drbdadm split-brain minor-1000
[ 5345.813274] block drbd1000: helper command: /sbin/drbdadm split-brain minor-1000 exit code 0 (0x0)
[ 5345.813330] d-con r1: conn( WFReportParams -> Disconnecting ) 
[ 5345.813337] d-con r1: error receiving ReportState, e: -5 l: 0!
[ 5345.813357] d-con r1: asender terminated
[ 5345.813364] d-con r1: Terminating asender thread
[ 5345.815601] d-con r1: Connection closed
[ 5345.816401] d-con r1: conn( Disconnecting -> StandAlone ) 
[ 5345.816409] d-con r1: receiver terminated
[ 5345.816413] d-con r1: Terminating receiver thread
[ 5345.825972] iscsi_trgt: iscsi_volume_del(319) 2 3
[ 5345.878586] iscsi_trgt: iscsi_volume_del(319) 2 2
[ 5345.929437] iscsi_trgt: iscsi_volume_del(319) 2 1
[ 5345.980288] iscsi_trgt: iscsi_volume_del(319) 2 0
[ 5345.996109] d-con r0: Handshake successful: Agreed network protocol version 100
[ 5345.996227] d-con r0: conn( WFConnection -> WFReportParams ) 
[ 5345.996233] d-con r0: Starting asender thread (from drbd_r_r0 [5099])
[ 5346.008801] d-con r0: meta connection shut down by peer.
[ 5346.008844] d-con r0: conn( WFReportParams -> NetworkFailure ) 
[ 5346.008849] d-con r0: asender terminated
[ 5346.008854] d-con r0: Terminating asender thread
[ 5346.020959] d-con r0: Connection closed
[ 5346.021320] d-con r0: conn( NetworkFailure -> Unconnected ) 
[ 5346.021326] d-con r0: receiver terminated
[ 5346.021329] d-con r0: Restarting receiver thread
[ 5346.021333] d-con r0: receiver (re)started
[ 5346.021362] d-con r0: conn( Unconnected -> WFConnection ) 
[ 5346.099104] block drbd1000: role( Primary -> Secondary ) 
[ 5346.099149] block drbd1000: bitmap WRITE of 0 pages took 0 jiffies
[ 5346.099332] block drbd1000: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[ 5346.118758] block drbd1001: role( Primary -> Secondary ) 
[ 5346.118814] block drbd1001: bitmap WRITE of 0 pages took 0 jiffies
[ 5346.119313] block drbd1001: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[ 5346.136762] block drbd1002: role( Primary -> Secondary ) 
[ 5346.136810] block drbd1002: bitmap WRITE of 0 pages took 0 jiffies
[ 5346.137007] block drbd1002: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[ 5346.150667] block drbd1003: role( Primary -> Secondary ) 
[ 5346.150715] block drbd1003: bitmap WRITE of 0 pages took 0 jiffies
[ 5346.150910] block drbd1003: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
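
To pull only the connection-state changes and split-brain messages out of output like the above, a filter along these lines may help. It is a minimal sketch: the function name drbd_conn_events is made up for illustration, and the patterns are simply taken from the messages shown in this thread, so other DRBD versions may word them differently.

```shell
# Hypothetical helper (not part of DRBD): reads kernel log text on stdin
# and keeps only DRBD connection-state transitions and split-brain lines.
drbd_conn_events() {
    grep -E 'conn\( [A-Za-z]+ -> [A-Za-z]+ \)|[Ss]plit-[Bb]rain'
}

# Example use (assumes the ring buffer is readable by the current user):
#   dmesg | drbd_conn_events
```

Because the function only reads stdin, the same filter can be fed from a saved log file instead of dmesg.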

The entries in /var/log/messages are very similar but quite a bit more verbose (~450 lines with cluster messages).
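
Since r1 ends up StandAlone after "Split-Brain detected but unresolved", the usual next step is manual split-brain recovery. The sketch below follows the procedure as I understand it from the DRBD 8.4 documentation (disconnect, demote, then reconnect with --discard-my-data on the node whose changes are given up). Which node plays that role is the administrator's decision and is NOT implied by the logs here. The function names and the dry-run wrapper are illustrative assumptions; with Pacemaker managing the resources, the cluster may also need to be kept from re-promoting them while this runs.

```shell
# Dry-run wrapper (illustrative): prints each command unless RUN=1 is set.
# Running for real requires root and a loaded drbd module.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# On the node whose local changes will be DISCARDED (the "victim").
# Hypothetical function name; $1 is the DRBD resource, e.g. r1.
discard_local_changes() {
    run drbdadm disconnect "$1"
    run drbdadm secondary "$1"
    run drbdadm connect --discard-my-data "$1"
}

# On the node whose data is KEPT (the "survivor"), only needed if that
# node is also in the StandAlone state.
reconnect_survivor() {
    run drbdadm connect "$1"
}
```

Calling discard_local_changes r1 without RUN=1 only prints the three drbdadm commands, which makes it easy to review them before running anything on a node that holds real data.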


HTH,
Eric Pretorious
Truckee, CA



>________________________________
> From: Eric <epretorious at yahoo.com>
>To: "drbd-user at lists.linbit.com" <drbd-user at lists.linbit.com> 
>Sent: Monday, January 21, 2013 10:32 AM
>Subject: Re: [DRBD-user] Diagnosing a Failed Resource
> 
>
>Thanks, Dan:
>
>This is what I found in the kernel ring buffer after rebooting both nodes:
>
>[   75.630608] events: mcg drbd: 3
>[   75.636697] drbd: initialized. Version: 8.4.1 (api:1/proto:86-100)
>[   75.636701] drbd: GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil at fat-tyre, 2011-12-20 12:43:15
>[   75.636705] drbd: registered as block device major 147
>[   77.232358] d-con r0: Starting worker thread (from drbdsetup [5455])
>[   77.233253] block drbd0: disk( Diskless -> Attaching ) 
>[   77.233722] d-con r0: Method to ensure write ordering: flush
>[   77.233731] block drbd0: max BIO size = 1048576
>[   77.233742] block drbd0: drbd_bm_resize called with capacity == 104868976
>[   77.234800] block drbd0: resync bitmap: bits=13108622 words=204823 pages=401
>[   77.234813] block drbd0: size = 50 GB (52434488 KB)
>[   77.281241] block drbd0: bitmap READ of 401 pages took 12 jiffies
>[   77.283203] block drbd0: recounting of set bits took additional 0 jiffies
>[   77.283209] block drbd0: 140 KB (35 bits) marked out-of-sync by on disk bit-map.
>[   77.283219] block drbd0: disk( Attaching -> UpToDate ) 
>[   77.283224] block drbd0: attached to UUIDs 44A65229313EBE43:BA033E902BDEA3C0:49524878FFCE4B24:49514878FFCE4B25
>[   77.293056] d-con r1: Starting worker thread (from drbdsetup [5458])
>[   77.295342] block drbd1000: disk( Diskless -> Attaching ) 
>[   77.296192] d-con r1: Method to ensure write ordering: flush
>[   77.296208] block drbd1000: max BIO size = 1048576
>[   77.296222] block drbd1000: drbd_bm_resize called with capacity == 104868976
>[   77.297379] block drbd1000: resync bitmap: bits=13108622 words=204823 pages=401
>[   77.297390] block drbd1000: size = 50 GB (52434488 KB)
>[   77.342459] block drbd1000: bitmap READ of 401 pages took 11 jiffies
>[   77.344485] block drbd1000: recounting of set bits took additional 1 jiffies
>[   77.344491] block drbd1000: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
>[   77.344502] block drbd1000: disk( Attaching -> UpToDate ) 
>[   77.344507] block drbd1000: attached to UUIDs 221AECCC2A594D3B:701073FA8F926E0E:B8DFF1CE13CF5A29:B8DEF1CE13CF5A29
>[   77.375446] block drbd1: disk( Diskless -> Attaching ) 
>[   77.380421] block drbd1: max BIO size = 1048576
>[   77.380429] block drbd1: drbd_bm_resize called with capacity == 104868976
>[   77.381243] block drbd1: resync bitmap: bits=13108622 words=204823 pages=401
>[   77.381251] block drbd1: size = 50 GB (52434488 KB)
>[   77.419582] block drbd1: bitmap READ of 401 pages took 9 jiffies
>[   77.421605] block drbd1: recounting of set bits took additional 1 jiffies
>[   77.421611] block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
>[   77.421623] block drbd1: disk( Attaching -> UpToDate ) 
>[   77.421628] block drbd1: attached to UUIDs 67FE22C03838CB17:7971B70BBB1F530C:2A4C483084B9C378:2A4B483084B9C379
>[   77.444507] block drbd1001: disk( Diskless -> Attaching ) 
>[   77.450684] block drbd1001: max BIO size = 1048576
>[   77.450693] block drbd1001: drbd_bm_resize called with capacity == 104868976
>[   77.451506] block drbd1001: resync bitmap: bits=13108622 words=204823 pages=401
>[   77.451514] block drbd1001: size = 50 GB (52434488 KB)
>[   77.488711] block drbd1001: bitmap READ of 401 pages took 10 jiffies
>[   77.491721] block drbd1001: recounting of set bits took additional 0 jiffies
>[   77.491729] block drbd1001: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
>[   77.491743] block drbd1001: disk( Attaching -> UpToDate ) 
>[   77.491750] block drbd1001: attached to UUIDs FA58888B061508B7:18C803DDDD8F8404:07D53A5226AD90C1:07D43A5226AD90C1
>[   77.534563] block drbd2: disk( Diskless -> Attaching ) 
>[   77.546629] block drbd2: max BIO size = 1048576
>[   77.546642] block drbd2: drbd_bm_resize called with capacity == 52450480
>[   77.547215] block drbd2: resync bitmap: bits=6556310 words=102443 pages=201
>[   77.547224] block drbd2: size = 25 GB (26225240 KB)
>[   77.573960] block drbd2: bitmap READ of 201 pages took 7 jiffies
>[   77.575557] block drbd2: recounting of set bits took additional 0 jiffies
>[   77.575566] block drbd2: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
>[   77.575580] block drbd2: disk( Attaching -> UpToDate ) 
>[   77.575588] block drbd2: attached to UUIDs 3B55297D5DD31139:3AC323693865E310:AEFA3EE056B21A62:AEF93EE056B21A63
>[   77.585082] block drbd1002: disk( Diskless -> Attaching ) 
>[   77.593864] block drbd1002: max BIO size = 1048576
>[   77.593877] block drbd1002: drbd_bm_resize called with capacity == 52450480
>[   77.594313] block drbd1002: resync bitmap: bits=6556310 words=102443 pages=201
>[   77.594323] block drbd1002: size = 25 GB (26225240 KB)
>[   77.625687] block drbd1002: bitmap READ of 201 pages took 8 jiffies
>[   77.626621] block drbd1002: recounting of set bits took additional 0 jiffies
>[   77.626626] block drbd1002: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
>[   77.626638] block drbd1002: disk( Attaching -> UpToDate ) 
>[   77.626643] block drbd1002: attached to UUIDs CA60CC73DC0B45A5:70A50B0066EC9F38:4F0256C54722763F:4F0156C54722763F
>[   77.659026] block drbd3: disk( Diskless -> Attaching ) 
>[   77.664455] block drbd3: max BIO size = 1048576
>[   77.664464] block drbd3: drbd_bm_resize called with capacity == 52450480
>[   77.664808] block drbd3: resync bitmap: bits=6556310 words=102443 pages=201
>[   77.664813] block drbd3: size = 25 GB (26225240 KB)
>[   77.691727] block drbd3: bitmap READ of 201 pages took 6 jiffies
>[   77.692806] block drbd3: recounting of set bits took additional 1 jiffies
>[   77.692812] block drbd3: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
>[   77.692823] block drbd3: disk( Attaching -> UpToDate ) 
>[   77.692828] block drbd3: attached to UUIDs 29A7176ED176F95F:2698161DF7C4C4E6:F668581CC2EC0F3E:F667581CC2EC0F3F
>[   77.722193] block drbd1003: disk( Diskless -> Attaching ) 
>[   77.722865] block drbd1003: max BIO size = 1048576
>[   77.722876] block drbd1003: drbd_bm_resize called with capacity == 52450480
>[   77.723392] block drbd1003: resync bitmap: bits=6556310 words=102443 pages=201
>[   77.723402] block drbd1003: size = 25 GB (26225240 KB)
>[   77.765961] block drbd1003: bitmap READ of 201 pages took 11 jiffies
>[   77.766888] block drbd1003: recounting of set bits took additional 0 jiffies
>[   77.766893] block drbd1003: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
>[   77.766905] block drbd1003: disk( Attaching -> UpToDate ) 
>[   77.766910] block drbd1003: attached to UUIDs 18DFC2D8939D73FB:177CEE9681F7C93A:D417E74E9E527B07:D416E74E9E527B07
>[   77.799304] d-con r0: conn( StandAlone -> Unconnected ) 
>[   77.799341] d-con r0: Starting receiver thread (from drbd_w_r0 [5459])
>[   77.800720] d-con r1: conn( StandAlone -> Unconnected ) 
>[   77.800778] d-con r1: Starting receiver thread (from drbd_w_r1 [5460])
>[   77.802795] d-con r0: receiver (re)started
>[   77.802828] d-con r0: conn( Unconnected -> WFConnection ) 
>[   77.804862] d-con r1: receiver (re)started
>[   77.804901] d-con r1: conn( Unconnected -> WFConnection ) 
>[   78.161839] block drbd1000: role( Secondary -> Primary ) 
>[   78.180162] block drbd1001: role( Secondary -> Primary ) 
>[   78.195287] block drbd1002: role( Secondary -> Primary ) 
>[   78.233303] block drbd1003: role( Secondary -> Primary ) 
>[   78.303485] d-con r0: Handshake successful: Agreed network protocol version 100
>[   78.303632] d-con r0: conn( WFConnection -> WFReportParams ) 
>[   78.303639] d-con r0: Starting asender thread (from drbd_r_r0 [5519])
>[   78.303825] d-con r1: Handshake successful: Agreed network protocol version 100
>[   78.303957] d-con r1: conn( WFConnection -> WFReportParams ) 
>[   78.303964] d-con r1: Starting asender thread (from drbd_r_r1 [5520])
>[   78.344087] block drbd0: drbd_sync_handshake:
>[   78.344096] block drbd0: self 44A65229313EBE42:BA033E902BDEA3C0:49524878FFCE4B24:49514878FFCE4B25 bits:35 flags:0
>[   78.344103] block drbd0: peer 4B28E699CC1DEE8D:BA033E902BDEA3C0:49524878FFCE4B25:49514878FFCE4B25 bits:814770 flags:2
>[   78.344111] block drbd0: uuid_compare()=100 by rule 90
>[   78.344120] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
>[   78.344168] block drbd1000: drbd_sync_handshake:
>[   78.344174] block drbd1000: self 221AECCC2A594D3B:701073FA8F926E0E:B8DFF1CE13CF5A29:B8DEF1CE13CF5A29 bits:0 flags:0
>[   78.344181] block drbd1000: peer E624A5F197121810:701073FA8F926E0E:B8DFF1CE13CF5A28:B8DEF1CE13CF5A29 bits:0 flags:2
>[   78.344186] block drbd1000: uuid_compare()=100 by rule 90
>[   78.344192] block drbd1000: helper command: /sbin/drbdadm initial-split-brain minor-1000
>[   78.348069] d-con r1: meta connection shut down by peer.
>[   78.348105] d-con r1: conn( WFReportParams -> NetworkFailure ) 
>[   78.348109] d-con r1: asender terminated
>[   78.348113] d-con r1: Terminating asender thread
>[   78.354965] block drbd1000: helper command: /sbin/drbdadm initial-split-brain minor-1000 exit code 0 (0x0)
>[   78.355004] block drbd1000: Split-Brain detected but unresolved, dropping connection!
>[   78.355014] block drbd1000: helper command: /sbin/drbdadm split-brain minor-1000
>[   78.357534] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
>[   78.357565] block drbd0: Split-Brain detected but unresolved, dropping connection!
>[   78.357574] block drbd0: helper command: /sbin/drbdadm split-brain minor-0
>[   78.363276] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
>[   78.363330] d-con r0: conn( WFReportParams -> Disconnecting ) 
>[   78.363337] d-con r0: error receiving ReportState, e: -5 l: 0!
>[   78.363804] block drbd1000: helper command: /sbin/drbdadm split-brain minor-1000 exit code 0 (0x0)
>[   78.363814] d-con r0: asender terminated
>[   78.363823] d-con r0: Terminating asender thread
>[   78.363855] d-con r1: conn( NetworkFailure -> Disconnecting ) 
>[   78.363862] d-con r1: error receiving ReportState, e: -5 l: 0!
>[   78.364193] d-con r0: Connection closed
>[   78.364217] d-con r0: conn( Disconnecting -> StandAlone ) 
>[   78.364222] d-con r0: receiver terminated
>[   78.364225] d-con r0: Terminating receiver thread
>[   78.364420] d-con r1: Connection closed
>[   78.365199] d-con r1: conn( Disconnecting -> StandAlone ) 
>[   78.365207] d-con r1: receiver terminated
>[   78.365215] d-con r1: Terminating receiver thread
>
>
>The entry "Split-Brain detected but unresolved, dropping connection!" caught my attention, but I'm not sure how to address the split-brain situation if there's no connection between the two nodes. (I checked both nodes' DRBD processes for open TCP sockets, and the sockets appear to be in different states at different times: sometimes the DRBD process on SAN1 has no sockets open, and other times it has one or both sockets open.) For example:
>
>
>> san1:~ # netstat -tan | grep 77
>> tcp        0      0 192.168.1.1:7789        0.0.0.0:*               LISTEN
>
>
>
>Thoughts? FWIW: There is no data on the resources/volumes so I'm not concerned about preserving the resource/volumes *but* I would like to treat this situation as if there were data so that I can resolve these kinds of errors in the future.
>
>
>
>Eric Pretorious
>Truckee, CA
>
>
>
>
>>________________________________
>> From: Dan Barker <dbarker at visioncomm.net>
>>To: "drbd-user at lists.linbit.com" <drbd-user at lists.linbit.com> 
>>Sent: Monday, January 21, 2013 6:40 AM
>>Subject: Re: [DRBD-user] Diagnosing a Failed Resource
>> 
>>
>> 
>>The errors in connecting are logged. If you can’t find them, attempt to connect a resource (drbdadm connect r1, for example) to create the errors again, and then look at the logs for the reason the connection was not established. The “status” will continue to show waiting for connection (WFC) but there will be a reason in the log files. If the logs are unclear, post the relevant portions back here and we’ll help.
>> 
>>Something like ‘dmesg | grep drbd’ should do. You may want to check the logs on both DRBD servers. You can run the connect command on either.
>> 
>>hth
>> 
>>Dan
>> 
>>From:drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Eric
>>Sent: Monday, January 21, 2013 1:24 AM
>>To: drbd-user at lists.linbit.com
>>Subject: [DRBD-user] Diagnosing a Failed Resource
>> 
>>I've configured corosync+pacemaker to manage a simple two-resource DRBD cluster:
>> 
>>> san1:~ # crm configure show | cat -
>>> node san1 \
>>>     attributes standby="off"
>>> node san2 \
>>>     attributes standby="off"
>>> primitive p_DRBD-r0 ocf:linbit:drbd \
>>>     params drbd_resource="r0" \
>>>     op monitor interval="60s"
>>> primitive p_DRBD-r1 ocf:linbit:drbd \
>>>     params drbd_resource="r1" \
>>>     op monitor interval="60s"
>>> primitive p_IP-1_253 ocf:heartbeat:IPaddr2 \
>>>     params ip="192.168.1.253" cidr_netmask="24" \
>>>     op monitor interval="30s"
>>> primitive p_IP-1_254 ocf:heartbeat:IPaddr2 \
>>>     params ip="192.168.1.254" cidr_netmask="24" \
>>>     op monitor interval="30s"
>>> primitive p_iSCSI-san1 ocf:heartbeat:iSCSITarget \
>>>     params iqn="iqn.2012-11.com.example.san1:sda" \
>>>     op monitor interval="10s"
>>> primitive p_iSCSI-san1_0 ocf:heartbeat:iSCSILogicalUnit \
>>>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="0" path="/dev/drbd0" \
>>>     op monitor interval="10s"
>>> primitive p_iSCSI-san1_1 ocf:heartbeat:iSCSILogicalUnit \
>>>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="1" path="/dev/drbd1" \
>>>     op monitor interval="10s"
>>> primitive p_iSCSI-san1_2 ocf:heartbeat:iSCSILogicalUnit \
>>>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="2" path="/dev/drbd2" \
>>>     op monitor interval="10s"
>>> primitive p_iSCSI-san1_3 ocf:heartbeat:iSCSILogicalUnit \
>>>     params target_iqn="iqn.2012-11.com.example.san1:sda" lun="3" path="/dev/drbd3" \
>>>     op monitor interval="10s"
>>> primitive p_iSCSI-san2 ocf:heartbeat:iSCSITarget \
>>>     params iqn="iqn.2012-11.com.example.san2:sda" \
>>>     op monitor interval="10s"
>>> primitive p_iSCSI-san2_0 ocf:heartbeat:iSCSILogicalUnit \
>>>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="0" path="/dev/drbd1000" \
>>>     op monitor interval="10s"
>>> primitive p_iSCSI-san2_1 ocf:heartbeat:iSCSILogicalUnit \
>>>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="1" path="/dev/drbd1001" \
>>>     op monitor interval="10s"
>>> primitive p_iSCSI-san2_2 ocf:heartbeat:iSCSILogicalUnit \
>>>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="2" path="/dev/drbd1002" \
>>>     op monitor interval="10s"
>>> primitive p_iSCSI-san2_3 ocf:heartbeat:iSCSILogicalUnit \
>>>     params target_iqn="iqn.2012-11.com.example.san2:sda" lun="3" path="/dev/drbd1003" \
>>>     op monitor interval="10s"
>>> group g_iSCSI-san1 p_iSCSI-san1 p_iSCSI-san1_0 p_iSCSI-san1_1 p_iSCSI-san1_2 p_iSCSI-san1_3 p_IP-1_254
>>> group g_iSCSI-san2 p_iSCSI-san2 p_iSCSI-san2_0 p_iSCSI-san2_1 p_iSCSI-san2_2 p_iSCSI-san2_3 p_IP-1_253
>>> ms ms_DRBD-r0 p_DRBD-r0 \
>>>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>>> ms ms_DRBD-r1 p_DRBD-r1 \
>>>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>>> location l_iSCSI-san1_and_DRBD-r0 p_IP-1_254 10240: san1
>>> location l_iSCSI-san2_and_DRBD-r1 p_IP-1_253 10240: san2
>>> colocation c_iSCSI_with_DRBD-r0 inf: g_iSCSI-san1 ms_DRBD-r0:Master
>>> colocation c_iSCSI_with_DRBD-r1 inf: g_iSCSI-san2 ms_DRBD-r1:Master
>>> order o_DRBD-r0_before_iSCSI-san1 inf: ms_DRBD-r0:promote g_iSCSI-san1:start
>>> order o_DRBD-r1_before_iSCSI-san2 inf: ms_DRBD-r1:promote g_iSCSI-san2:start
>>> property $id="cib-bootstrap-options" \
>>>     dc-version="1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf" \
>>>     cluster-infrastructure="openais" \
>>>     expected-quorum-votes="2" \
>>>     stonith-enabled="false" \
>>>     no-quorum-policy="ignore"
>> 
>>The cluster appears to be functioning correctly:
>> 
>>> san1:~ # crm_mon -1
>>> ============
>>> Last updated: Sun Jan 20 22:20:17 2013
>>> Last change: Sun Jan 20 21:59:15 2013 by root via crm_attribute on san1
>>> Stack: openais
>>> Current DC: san1 - partition with quorum
>>> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
>>> 2 Nodes configured, 2 expected votes
>>> 16 Resources configured.
>>> ============
>>> 
>>> Online: [ san1 san2 ]
>>> 
>>>  Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
>>>      Masters: [ san1 ]
>>>      Slaves: [ san2 ]
>>>  Resource Group: g_iSCSI-san1
>>>      p_iSCSI-san1    (ocf::heartbeat:iSCSITarget):    Started san1
>>>      p_iSCSI-san1_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>>>      p_iSCSI-san1_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>>>      p_iSCSI-san1_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>>>      p_iSCSI-san1_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>>>      p_IP-1_254    (ocf::heartbeat:IPaddr2):    Started san1
>>>  Master/Slave Set: ms_DRBD-r1 [p_DRBD-r1]
>>>      Masters: [ san2 ]
>>>      Slaves: [ san1 ]
>>>  Resource Group: g_iSCSI-san2
>>>      p_iSCSI-san2    (ocf::heartbeat:iSCSITarget):    Started san2
>>>      p_iSCSI-san2_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>>      p_iSCSI-san2_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>>      p_iSCSI-san2_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>>      p_iSCSI-san2_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>>      p_IP-1_253    (ocf::heartbeat:IPaddr2):    Started san2
>>
>>> san2:~ # crm_mon -1
>>> ============
>>> Last updated: Sun Jan 20 22:20:17 2013
>>> Last change: Sun Jan 20 21:59:15 2013 by root via crm_attribute on san1
>>> Stack: openais
>>> Current DC: san1 - partition with quorum
>>> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
>>> 2 Nodes configured, 2 expected votes
>>> 16 Resources configured.
>>> ============
>>> 
>>> Online: [ san1 san2 ]
>>> 
>>>  Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
>>>      Masters: [ san1 ]
>>>      Slaves: [ san2 ]
>>>  Resource Group: g_iSCSI-san1
>>>      p_iSCSI-san1    (ocf::heartbeat:iSCSITarget):    Started san1
>>>      p_iSCSI-san1_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>>>      p_iSCSI-san1_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>>>      p_iSCSI-san1_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>>>      p_iSCSI-san1_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san1
>>>      p_IP-1_254    (ocf::heartbeat:IPaddr2):    Started san1
>>>  Master/Slave Set: ms_DRBD-r1 [p_DRBD-r1]
>>>      Masters: [ san2 ]
>>>      Slaves: [ san1 ]
>>>  Resource Group: g_iSCSI-san2
>>>      p_iSCSI-san2    (ocf::heartbeat:iSCSITarget):    Started san2
>>>      p_iSCSI-san2_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>>      p_iSCSI-san2_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>>      p_iSCSI-san2_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>>      p_iSCSI-san2_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>>      p_IP-1_253    (ocf::heartbeat:IPaddr2):    Started san2
>>However, the two DRBD resources do not appear to be communicating:
>> 
>>> san1:~ # cat /proc/drbd 
>>> version: 8.4.1 (api:1/proto:86-100)
>>> GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil at fat-tyre, 2011-12-20 12:43:15
>>>  0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:3259080
>>>  1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>>>  2: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>>>  3: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>>> 
>>> 1000: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
>>>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>>> 1001: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
>>>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>>> 1002: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
>>>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>>> 1003: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
>>>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>> 
>>> san2:~ # cat /proc/drbd 
>>> version: 8.4.1 (api:1/proto:86-100)
>>> GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil at fat-tyre, 2011-12-20 12:43:15
>>>  0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:140
>>>  1: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>>>  2: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>>>  3: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>>> 
>>> 1000: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>>> 1001: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>>> 1002: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>>> 1003: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>>How can I begin to troubleshoot this error?
>> 
>>Eric Pretorious
>>Truckee, CA
>>_______________________________________________
>>drbd-user mailing list
>>drbd-user at lists.linbit.com
>>http://lists.linbit.com/mailman/listinfo/drbd-user
>>
>>
>>
>
>