[DRBD-user] IO Error Logging

Andrew Eross eross at locatrix.com
Sat Oct 6 03:38:00 CEST 2012

Thanks, Dan. I missed that those messages were in dmesg.

Starting from two connected nodes (Secondary/Secondary), we promote one to
primary ("drbdadm primary drbd-sr1"). From my perspective, my SSH
connection drops and the machine locks up for about 5 minutes.

The behavior is the same on both nodes: either one freezes for 5 minutes when
it is promoted to primary, so it doesn't appear to be a hardware issue
specific to one of them.

Below is what I'm seeing in dmesg.

Note: the two nodes in question are connected by a crossover gigabit
cable.

Very weird behavior: after freezing for 5 minutes, the node came back
and everything seems to be OK.

Anyone have any ideas?

block drbd1: role( Secondary -> Primary )
d-con drbd-sr1: asender terminated
d-con drbd-sr1: Terminating asender thread
d-con drbd-sr1: Connection closed
block drbd1: new current UUID 5A99C51D68CDB447:188E44BA42FFFCF4:2460EA01C7EA7F96:245FEA01C7EA7F96
d-con drbd-sr1: conn( BrokenPipe -> Unconnected )
d-con drbd-sr1: receiver terminated
d-con drbd-sr1: Restarting receiver thread
d-con drbd-sr1: receiver (re)started
d-con drbd-sr1: conn( Unconnected -> WFConnection )
d-con drbd-sr1: initial packet S crossed
d-con drbd-sr1: Handshake successful: Agreed network protocol version 101
d-con drbd-sr1: conn( WFConnection -> WFReportParams )
d-con drbd-sr1: Starting asender thread (from drbd_r_drbd-sr1 [26469])
block drbd1: drbd_sync_handshake:
block drbd1: self 5A99C51D68CDB447:188E44BA42FFFCF4:2460EA01C7EA7F96:245FEA01C7EA7F96 bits:0 flags:0
block drbd1: peer 188E44BA42FFFCF4:0000000000000000:2460EA01C7EA7F96:245FEA01C7EA7F96 bits:0 flags:0
block drbd1: uuid_compare()=1 by rule 70
block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
block drbd1: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
block drbd1: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
block drbd1: helper command: /sbin/drbdadm before-resync-source minor-1
block drbd1: helper command: /sbin/drbdadm before-resync-source minor-1 exit code 0 (0x0)
block drbd1: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
block drbd1: Began resync as SyncSource (will sync 0 KB [0 bits set]).
block drbd1: updated sync UUID 5A99C51D68CDB447:188F44BA42FFFCF4:188E44BA42FFFCF4:2460EA01C7EA7F96
block drbd1: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
block drbd1: updated UUIDs 5A99C51D68CDB447:0000000000000000:188F44BA42FFFCF4:188E44BA42FFFCF4
block drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
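
For anyone reading along: the log above mixes two prefixes, "d-con drbd-sr1:" for connection-level messages and "block drbd1:" for device-level ones. None of the "block drbd1:" lines in this excerpt contain "sr1", so "dmesg | grep sr1" on its own would drop them. Below is a minimal Python sketch, assuming unwrapped dmesg lines in the format shown above, that keeps both kinds of line and pulls out the "field( Old -> New )" state changes so the role/conn/pdsk sequence is easier to follow. The helpers drbd_lines and transitions are hypothetical names for illustration, not part of DRBD or drbdadm.

```python
import re
from typing import Iterable, List, Tuple

# Matches DRBD state changes such as "conn( WFConnection -> WFReportParams )".
# Assumption: the "field( Old -> New )" format seen in the dmesg excerpt.
TRANSITION = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def drbd_lines(lines: Iterable[str], resource: str, minor: int) -> List[str]:
    """Keep kernel log lines that belong to one DRBD resource.

    Connection-level messages carry a "d-con <resource>:" prefix and
    device-level messages a "block drbd<minor>:" prefix, so both are
    matched. Substring matching tolerates a leading dmesg timestamp.
    """
    prefixes = (f"d-con {resource}:", f"block drbd{minor}:")
    return [line for line in lines if any(p in line for p in prefixes)]


def transitions(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (field, old, new) for every state change, in log order."""
    found: List[Tuple[str, str, str]] = []
    for line in lines:
        found.extend(TRANSITION.findall(line))
    return found


if __name__ == "__main__":
    # Lines copied from the log above, plus one made-up line for another minor
    # (drbd2) to show that it is filtered out.
    sample = [
        "block drbd1: role( Secondary -> Primary )",
        "d-con drbd-sr1: conn( BrokenPipe -> Unconnected )",
        "block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )",
        "block drbd2: role( Secondary -> Primary )",
    ]
    for field, old, new in transitions(drbd_lines(sample, "drbd-sr1", 1)):
        print(f"{field}: {old} -> {new}")
```

To use it against a live system, the sample list would be replaced with lines read from sys.stdin (e.g. piping dmesg into the script); the resource name "drbd-sr1" and minor 1 are taken from this thread and will differ elsewhere.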

On Fri, Oct 5, 2012 at 6:39 PM, Dan Barker <dbarker at visioncomm.net> wrote:

> dmesg | grep sr1 should show you all you need to know.
>
> Dan (there’s that word “should” again <g>)
>
> From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Andrew Eross
> Sent: Friday, October 05, 2012 2:17 PM
> To: drbd-user at lists.linbit.com
> Subject: [DRBD-user] IO Error Logging
>
> Hi guys,
>
> I'm trying to debug an SSD that's the backing device for my secondary
> node.
>
> The primary and secondary are synced (protocol C), and everything goes fine
> until I test failover, e.g. on the primary "drbdadm secondary
> drbd-sr1", and on the secondary "drbdadm primary drbd-sr1".
>
> When I do this, the secondary locks up for about 5 minutes (the SSH session
> drops); then it starts responding again and I see DRBD has dropped into
> diskless mode.
>
> I'm thinking there might be I/O errors occurring on the underlying disk
> and perhaps DRBD is automatically detaching it.
>
> Right now I'm running badblocks on the backing device to see if it can
> find any problems.
>
> In the meantime, I've been trying to figure out how to get more information
> about I/O errors from DRBD.
>
> My devices are configured with "detach" as recommended
> (http://www.drbd.org/users-guide/s-configure-io-error-behavior.html);
> however, I'm not sure how to find out more about when this
> event occurs.
>
> Are there any debugging options I can enable that would show me the I/O
> error details that caused a detach?
>
> Thanks!
>
> Andrew