[DRBD-user] recovering from "Local IO failed. Detaching..."

Gianluca Cecchi gianluca.cecchi at gmail.com
Wed Sep 16 16:20:12 CEST 2009


With DRBD from current git installed on the peer (virtfedbis) and the peer
rebooted, while the sync source (virtfed) stays on my earlier 8.3.3rc2 build,
synchronization now succeeds.

I have
[root@virtfedbis ~]# cat /proc/drbd
version: 8.3.3rc2 (api:88/proto:86-91)
GIT-hash: 0acb7c07a61225ba880fde2a32b8f5f8fa49c8cc build by root@virtfedbis, 2009-09-16 16:01:09
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----
    ns:0 nr:1234556 dw:1234556 dr:56 al:0 bm:345 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0

[root@virtfed ~]# cat /proc/drbd
version: 8.3.3rc2 (api:88/proto:86-91)
GIT-hash: 04b2f175d7076ef2e0dd7d5ba6f6843357a041ed build by root@virtfedbis, 2009-09-11 10:06:20
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----
    ns:2277548 nr:0 dw:22925304 dr:5310672 al:404 bm:794 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

I'm going to update virtfed too.

Messages on virtfed:
Sep 16 16:08:04 virtfed kernel: block drbd0: Handshake successful: Agreed network protocol version 91
Sep 16 16:08:04 virtfed kernel: block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Sep 16 16:08:04 virtfed kernel: block drbd0: conn( WFConnection -> WFReportParams )
Sep 16 16:08:04 virtfed kernel: block drbd0: Starting asender thread (from drbd0_receiver [2450])
Sep 16 16:08:04 virtfed kernel: block drbd0: data-integrity-alg: <not-used>
Sep 16 16:08:04 virtfed kernel: block drbd0: drbd_sync_handshake:
Sep 16 16:08:04 virtfed kernel: block drbd0: self 81BAD3F384A6F3C7:7FC155C9F5183159:13247E4B98A2B256:71E806A9BE572C29 bits:308176 flags:0
Sep 16 16:08:04 virtfed kernel: block drbd0: peer 7FC155C9F5183158:0000000000000000:9EB0CCB7634CBCDC:A0332E51B243BEE1 bits:235520 flags:2
Sep 16 16:08:04 virtfed kernel: block drbd0: uuid_compare()=1 by rule 70
Sep 16 16:08:04 virtfed kernel: block drbd0: Becoming sync source due to disk states.
Sep 16 16:08:04 virtfed kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> Inconsistent )
Sep 16 16:08:04 virtfed kernel: block drbd0: peer( Secondary -> Primary )
Sep 16 16:08:04 virtfed kernel: block drbd0: conn( WFBitMapS -> SyncSource )
Sep 16 16:08:04 virtfed kernel: block drbd0: Began resync as SyncSource (will sync 1232704 KB [308176 bits set]).
Sep 16 16:08:22 virtfed kernel: block drbd0: Resync done (total 18 sec; paused 0 sec; 68480 K/sec)
Sep 16 16:08:22 virtfed kernel: block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )

dmesg on virtfedbis:
drbd: initialized. Version: 8.3.3rc2 (api:88/proto:86-91)
drbd: GIT-hash: 0acb7c07a61225ba880fde2a32b8f5f8fa49c8cc build by root@virtfedbis, 2009-09-16 16:01:09
drbd: registered as block device major 147
drbd: minor_table @ 0xffff880824d16800
block drbd0: Starting worker thread (from cqueue [1903])
block drbd0: disk( Diskless -> Attaching )
block drbd0: Found 6 transactions (244 active extents) in activity log.
block drbd0: Method to ensure write ordering: barrier
block drbd0: max_segment_size ( = BIO size ) = 32768
block drbd0: drbd_bm_resize called with capacity == 109317376
block drbd0: resync bitmap: bits=13664672 words=213511
block drbd0: size = 52 GB (54658688 KB)
block drbd0: recounting of set bits took additional 2 jiffies
block drbd0: 920 MB (235520 bits) marked out-of-sync by on disk bit-map.
block drbd0: Marked additional 0 KB as out-of-sync based on AL.
end_request: I/O error, dev cciss/c0d0, sector 0
block drbd0: meta data flush failed with status -95, disabling md-flushes
block drbd0: disk( Attaching -> Inconsistent )
block drbd0: conn( StandAlone -> Unconnected )
block drbd0: Starting receiver thread (from drbd0_worker [1911])
block drbd0: receiver (re)started
block drbd0: conn( Unconnected -> WFConnection )
block drbd0: Handshake successful: Agreed network protocol version 91
block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd0: conn( WFConnection -> WFReportParams )
block drbd0: Starting asender thread (from drbd0_receiver [1931])
block drbd0: data-integrity-alg: <not-used>
block drbd0: drbd_sync_handshake:
block drbd0: self 7FC155C9F5183158:0000000000000000:9EB0CCB7634CBCDC:A0332E51B243BEE1 bits:235520 flags:0
block drbd0: peer 81BAD3F384A6F3C7:7FC155C9F5183159:13247E4B98A2B256:71E806A9BE572C29 bits:308176 flags:0
block drbd0: uuid_compare()=-1 by rule 50
block drbd0: Becoming sync target due to disk states.
block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
block drbd0: role( Secondary -> Primary )
block drbd0: conn( WFBitMapT -> WFSyncUUID )
block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
block drbd0: conn( WFSyncUUID -> SyncTarget )
block drbd0: Began resync as SyncTarget (will sync 1232704 KB [308176 bits set]).
block drbd0: write: error=-95 s=39967568s
block drbd0: Method to ensure write ordering: flush
end_request: I/O error, dev cciss/c0d0, sector 0
block drbd0: local disk flush failed with status -95
block drbd0: Method to ensure write ordering: drain
block drbd0: Resync done (total 18 sec; paused 0 sec; 68480 K/sec)
block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)

Do the messages about "local disk flush failed with status -95" suggest that
I should apply all of these options anyway?
no-disk-barrier;
no-disk-flushes;
no-md-flushes;
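
For reference, a minimal sketch of how those three options would sit in the
disk {} section that Lars mentions, in DRBD 8.3 configuration syntax. The
resource name r0 is only a placeholder, and the rest of the resource
definition is omitted:

```
resource r0 {
  disk {
    # work-around suggested on the list: stop DRBD from issuing
    # barriers/flushes to the backing device and to the meta data device
    no-disk-barrier;
    no-disk-flushes;
    no-md-flushes;
  }
  # net {}, syncer {} and the "on <host>" sections are omitted here
}
```

The dmesg above shows DRBD already stepping down on its own from barrier to
flush to drain after the -95 errors, so my understanding is that setting these
explicitly would mainly select that behaviour up front.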

Are there any drawbacks to these?
Let's see if it is stable now.

Gianluca

On Wed, Sep 16, 2009 at 2:20 PM, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:

> On Tue, Sep 15, 2009 at 03:35:46PM +0200, Gianluca Cecchi wrote:
> > > please try current git, if you can.
> > > http://git.drbd.org/?p=drbd-8.3.git;a=summary
> > > there has been one regression in this area
> > > somewhere between 8.3.2 and 8.3.3rc1,
> > > which now is fixed again.
> > >
> >
> > I would like to, but I'm behind a proxy.
> > I tried some proxy configurations, searching for how to use git through a
> > proxy; I can, for example, fetch the git repository for wine, but not the
> > one for drbd.
> > Do you serve your git repository over http too?
> > Attempting this:
> > git clone http://git.drbd.org/drbd-8.3
>
> git clone http://git.drbd.org/drbd-8.3.git
>
> might work. a bit many git in there, I know,
> but we are dealing with redundancy anyways, after all.
>
> though the git:// protocol is faster and preferred.
>
> in case said regression should be the reason for your trouble, of course
> you could also go back to 8.3.2 (which does not contain that
> regression), or wait for 8.3.3 final.
>
> or, as I suggested earlier, add "no-disk-barrier; no-disk-flushes;
> no-md-flushes;" to your disk {} section, which would be a valid
> work-around for said regression.
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list   --   I'm subscribed