[DRBD-user] recovering from "Local IO failed. Detaching..."

Gianluca Cecchi gianluca.cecchi at gmail.com
Fri Sep 11 10:59:13 CEST 2009


OK,
it seemed to me that the simplest and most useful approach, which also helps
with testing drbd, was to update drbd to 8.3.3rc2 while keeping the same drbd
configuration and the 2.6.30 kernel (F11 was only recently updated to the
2.6.30 stream, as I wrote before).
And that WAS the right approach (at least for now).

After starting the Primary/UpToDate node, it was in this state:
...
Starting DRBD resources: [ d(r0) s(r0) n(r0) ]...
[root@virtfed x86_64]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.3rc2 (api:88/proto:86-91)
GIT-hash: 04b2f175d7076ef2e0dd7d5ba6f6843357a041ed build by root@virtfedbis, 2009-09-11 10:06:20
m:res  cs            ro               ds                 p  mounted  fstype
0:r0   WFConnection  Primary/Unknown  UpToDate/Outdated  C

After running "service drbd start" on the peer, I get this in the messages
log of the first node (virtfed):
Sep 11 10:44:03 virtfed kernel: block drbd0: Handshake successful: Agreed network protocol version 91
Sep 11 10:44:03 virtfed kernel: block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Sep 11 10:44:03 virtfed kernel: block drbd0: conn( WFConnection -> WFReportParams )
Sep 11 10:44:03 virtfed kernel: block drbd0: Starting asender thread (from drbd0_receiver [11115])
Sep 11 10:44:03 virtfed kernel: block drbd0: data-integrity-alg: <not-used>
Sep 11 10:44:03 virtfed kernel: block drbd0: drbd_sync_handshake:
Sep 11 10:44:03 virtfed kernel: block drbd0: self A0332E51B243BEE1:7C12A37C6FB9B1CB:DB97F5F6C5FBB26C:FAFACA8496A4ED9D bits:79098 flags:0
Sep 11 10:44:03 virtfed kernel: block drbd0: peer 7C12A37C6FB9B1CA:0000000000000000:0DB564243F5AA9A3:377245292BBD1112 bits:235520 flags:2
Sep 11 10:44:03 virtfed kernel: block drbd0: uuid_compare()=1 by rule 70
Sep 11 10:44:03 virtfed kernel: block drbd0: Becoming sync source due to disk states.
Sep 11 10:44:03 virtfed kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> Inconsistent )
Sep 11 10:44:03 virtfed kernel: block drbd0: peer( Secondary -> Primary )
Sep 11 10:44:03 virtfed kernel: block drbd0: conn( WFBitMapS -> SyncSource )
Sep 11 10:44:03 virtfed kernel: block drbd0: Began resync as SyncSource (will sync 1258472 KB [314618 bits set]).
Sep 11 10:44:23 virtfed kernel: block drbd0: Resync done (total 19 sec; paused 0 sec; 66232 K/sec)
Sep 11 10:44:23 virtfed kernel: block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
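
As a sanity check, the resync figures in this log are consistent with each
other. The sketch below only re-does the arithmetic on numbers copied from the
log; the 4 KB-per-bit granularity is inferred from "1258472 KB [314618 bits
set]", and the rate formula (integer-divide the bits by the seconds, then
scale) is my assumption, chosen because it reproduces the logged 66232 K/sec.

```python
# Numbers copied from the kernel log above.
SELF_BITS = 79098      # out-of-sync bits reported by "self"
PEER_BITS = 235520     # out-of-sync bits reported by "peer"
RESYNC_BITS = 314618   # "[314618 bits set]"
RESYNC_KB = 1258472    # "will sync 1258472 KB"
RESYNC_SECONDS = 19    # "total 19 sec"

KB_PER_BIT = 4  # inferred from the log: 1258472 KB / 314618 bits


def bits_to_kb(bits):
    """Convert a count of bitmap bits to KB at the inferred granularity."""
    return bits * KB_PER_BIT


def rate_kb_per_sec(bits, seconds):
    """Assumed formula: whole bits per second, scaled to KB."""
    return (bits // seconds) * KB_PER_BIT


# The resync covers the sum of both nodes' out-of-sync bits.
assert SELF_BITS + PEER_BITS == RESYNC_BITS
assert bits_to_kb(RESYNC_BITS) == RESYNC_KB

print(rate_kb_per_sec(RESYNC_BITS, RESYNC_SECONDS))  # prints 66232
```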

There are no messages in the log on the peer node, because I started it in
single-user mode and manually started the network, the sshd daemon and then
drbd, so the messages file was not populated. I think dmesg gives the same
information, though:

drbd: initialized. Version: 8.3.3rc2 (api:88/proto:86-91)
drbd: GIT-hash: 04b2f175d7076ef2e0dd7d5ba6f6843357a041ed build by root@virtfedbis, 2009-09-11 10:06:20
drbd: registered as block device major 147
drbd: minor_table @ 0xffff880826f09b00
block drbd0: Starting worker thread (from cqueue [1833])
block drbd0: disk( Diskless -> Attaching )
block drbd0: Found 6 transactions (244 active extents) in activity log.
block drbd0: Method to ensure write ordering: barrier
block drbd0: max_segment_size ( = BIO size ) = 32768
block drbd0: drbd_bm_resize called with capacity == 109317376
block drbd0: resync bitmap: bits=13664672 words=213511
block drbd0: size = 52 GB (54658688 KB)
block drbd0: recounting of set bits took additional 1 jiffies
block drbd0: 920 MB (235520 bits) marked out-of-sync by on disk bit-map.
block drbd0: Marked additional 0 KB as out-of-sync based on AL.
end_request: I/O error, dev cciss/c0d0, sector 0
block drbd0: meta data flush failed with status -95, disabling md-flushes
block drbd0: disk( Attaching -> Inconsistent )
block drbd0: conn( StandAlone -> Unconnected )
block drbd0: Starting receiver thread (from drbd0_worker [1835])
block drbd0: receiver (re)started
block drbd0: conn( Unconnected -> WFConnection )
block drbd0: Handshake successful: Agreed network protocol version 91
block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd0: conn( WFConnection -> WFReportParams )
block drbd0: Starting asender thread (from drbd0_receiver [1861])
block drbd0: data-integrity-alg: <not-used>
block drbd0: drbd_sync_handshake:
block drbd0: self 7C12A37C6FB9B1CA:0000000000000000:0DB564243F5AA9A3:377245292BBD1112 bits:235520 flags:0
block drbd0: peer A0332E51B243BEE1:7C12A37C6FB9B1CB:DB97F5F6C5FBB26C:FAFACA8496A4ED9D bits:79098 flags:0
block drbd0: uuid_compare()=-1 by rule 50
block drbd0: Becoming sync target due to disk states.
block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
block drbd0: role( Secondary -> Primary )
block drbd0: conn( WFBitMapT -> WFSyncUUID )
block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
block drbd0: conn( WFSyncUUID -> SyncTarget )
block drbd0: Began resync as SyncTarget (will sync 1258472 KB [314618 bits set]).
block drbd0: Resync done (total 19 sec; paused 0 sec; 66232 K/sec)
block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
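
The sizes in this dmesg output also fit together. Here is a small sketch that
rebuilds them from the reported capacity; it assumes the capacity is counted
in 512-byte sectors, that one bitmap bit covers 4 KB, and that bitmap words
are 64 bits wide. None of this is stated in the log, it is just what makes the
numbers match.

```python
# Numbers copied from the dmesg output above.
CAPACITY_SECTORS = 109317376  # "drbd_bm_resize called with capacity"
OOS_BITS = 235520             # bits marked out-of-sync by the on-disk bitmap

SECTOR_BYTES = 512  # assumption
KB_PER_BIT = 4      # assumption: one bitmap bit per 4 KB
WORD_BITS = 64      # assumption: x86_64 bitmap word size


def device_size_kb(sectors):
    """Device size in KB for a given sector count."""
    return sectors * SECTOR_BYTES // 1024


def bitmap_bits(size_kb):
    """Number of bitmap bits needed to cover size_kb."""
    return size_kb // KB_PER_BIT


def bitmap_words(bits):
    """Number of words needed to hold the bits, rounded up."""
    return (bits + WORD_BITS - 1) // WORD_BITS


size_kb = device_size_kb(CAPACITY_SECTORS)
bits = bitmap_bits(size_kb)

assert size_kb == 54658688                   # "size = 52 GB (54658688 KB)"
assert size_kb // (1024 * 1024) == 52
assert bits == 13664672                      # "resync bitmap: bits=13664672"
assert bitmap_words(bits) == 213511          # "words=213511"
assert OOS_BITS * KB_PER_BIT // 1024 == 920  # "920 MB (235520 bits)"
```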

Now the state is correct:

[root@virtfedbis ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.3rc2 (api:88/proto:86-91)
GIT-hash: 04b2f175d7076ef2e0dd7d5ba6f6843357a041ed build by root@virtfedbis, 2009-09-11 10:06:20
m:res  cs         ro               ds                 p  mounted  fstype
0:r0   Connected  Primary/Primary  UpToDate/UpToDate  C

Earlier, during the sync phase (which lasted only a few seconds), /proc/drbd
showed:
[root@virtfedbis ~]# cat /proc/drbd
version: 8.3.3rc2 (api:88/proto:86-91)
GIT-hash: 04b2f175d7076ef2e0dd7d5ba6f6843357a041ed build by root@virtfedbis, 2009-09-11 10:06:20
 0: cs:SyncTarget ro:Primary/Primary ds:Inconsistent/UpToDate C r----
    ns:0 nr:1186920 dw:1186824 dr:56 al:0 bm:383 lo:4 pe:2236 ua:3 ap:0 ep:1 wo:b oos:71648
    [=================>..] sync'ed: 94.5% (71648/1258472)K
    finish: 0:00:01 speed: 64,736 (65,932) K/sec
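
For anyone who wants to watch for this transition from a script, here is a
minimal sketch that pulls the connection state, roles and disk states out of
the device line of /proc/drbd. It is written against the 8.3-style line shown
above only; the function names are mine, and other DRBD versions may format
the line differently.

```python
import re

# Matches a device line such as:
#  0: cs:SyncTarget ro:Primary/Primary ds:Inconsistent/UpToDate C r----
DEVICE_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"\s+ro:(?P<ro_self>[^/\s]+)/(?P<ro_peer>\S+)"
    r"\s+ds:(?P<ds_self>[^/\s]+)/(?P<ds_peer>\S+)"
)


def parse_device_line(line):
    """Return a dict of the state fields, or None if the line does not match."""
    match = DEVICE_LINE.match(line)
    return match.groupdict() if match else None


def fully_synced(state):
    """True when the peers are connected and both disks are UpToDate."""
    return (
        state["cs"] == "Connected"
        and state["ds_self"] == "UpToDate"
        and state["ds_peer"] == "UpToDate"
    )


sample = " 0: cs:SyncTarget ro:Primary/Primary ds:Inconsistent/UpToDate C r----"
state = parse_device_line(sample)
print(state["cs"], state["ds_self"], fully_synced(state))
```

In practice one would read the lines from /proc/drbd instead of the hard-coded
sample and poll until fully_synced() returns True.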

Note that I had rebooted both nodes, so while the peer was starting and
syncing, the network interfaces were in their original state:

[root@virtfedbis ~]# ethtool -k eth3
Offload parameters for eth3:
Cannot get device flags: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off
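
To compare the offload settings of the two nodes, the output above can be
turned into a dictionary. This is only a rough sketch for the "name: on/off"
layout shown here; the function name is hypothetical, and lines that do not
end in on or off (the header and the "Cannot get device flags" message) are
simply skipped.

```python
def parse_offloads(text):
    """Map each offload name to True (on) or False (off)."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value in ("on", "off"):
            settings[name.strip()] = value == "on"
    return settings


# Output of "ethtool -k eth3" as pasted above.
SAMPLE = """Offload parameters for eth3:
Cannot get device flags: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off
"""

offloads = parse_offloads(SAMPLE)
enabled = sorted(name for name, is_on in offloads.items() if is_on)
print(enabled)
```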

Thanks for the answers and support!
Gianluca