[DRBD-user] error re-syncing after raid failure

Desjardins, Kristian kdesjard at NRCan.gc.ca
Thu Oct 11 13:45:49 CEST 2007


I lost a disk in a RAID5 array; however, the RAID controller or driver
(megasas) did not handle it properly, so a reboot was required.  After the
reboot, DRBD failed to re-synchronize.  After running
"drbdadm -- --discard-my-data connect all", I noticed that it starts to
sync and then fails again.  Am I going to have to do a full re-sync?

Thanks. 


The relevant log messages:

BAD HOST:

Oct 11 07:26:52 host2 kernel: drbd0: disk( Inconsistent -> Diskless )
Oct 11 07:26:52 host2 kernel: drbd0: drbd_bm_resize called with capacity == 0
Oct 11 07:26:52 host2 kernel: drbd0: worker terminated
Oct 11 07:26:53 host2 kernel: drbd0: disk( Diskless -> Attaching )
Oct 11 07:26:53 host2 kernel: drbd0: Found 6 transactions (324 active extents) in activity log.
Oct 11 07:26:53 host2 kernel: drbd0: max_segment_size ( = BIO size ) = 32768
Oct 11 07:26:53 host2 kernel: drbd0: drbd_bm_resize called with capacity == 19032384136
Oct 11 07:26:53 host2 kernel: drbd0: resync bitmap: bits=2379048017 words=37172626
Oct 11 07:26:53 host2 kernel: drbd0: size = 9075 GB (9516192068 KB)
Oct 11 07:26:57 host2 kernel: drbd0: reading of bitmap took 3422 jiffies
Oct 11 07:26:57 host2 kernel: drbd0: recounting of set bits took additional 205 jiffies
Oct 11 07:26:57 host2 kernel: drbd0: 1536 MB marked out-of-sync by on disk bit-map.
Oct 11 07:26:57 host2 kernel: drbd0: disk( Attaching -> Inconsistent )
Oct 11 07:26:57 host2 kernel: drbd0: Writing meta data super block now.
Oct 11 07:26:57 host2 kernel: drbd0: conn( StandAlone -> Unconnected )
Oct 11 07:26:57 host2 kernel: drbd0: receiver (re)started
Oct 11 07:26:57 host2 kernel: drbd0: conn( Unconnected -> WFConnection )
Oct 11 07:26:57 host2 kernel: drbd0: conn( WFConnection -> WFReportParams )
Oct 11 07:26:57 host2 kernel: drbd0: Handshake successful: DRBD Network Protocol version 86
Oct 11 07:26:57 host2 kernel: drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Oct 11 07:26:57 host2 kernel: drbd0: Becoming sync target due to disk states.
Oct 11 07:26:57 host2 kernel: drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 11 07:26:57 host2 kernel: drbd0: Writing meta data super block now.
Oct 11 07:27:02 host2 kernel: drbd0: conn( WFBitMapT -> WFSyncUUID )
Oct 11 07:27:02 host2 kernel: drbd0: conn( WFSyncUUID -> SyncTarget )
Oct 11 07:27:02 host2 kernel: drbd0: Began resync as SyncTarget (will sync 1572932 KB [393233 bits set]).
Oct 11 07:27:02 host2 kernel: drbd0: Writing meta data super block now.
Oct 11 07:27:02 host2 kernel: drbd0: BAD! sector=18811994048s enr=574096 rs_left=-4 rs_failed=0 count=8
Oct 11 07:27:02 host2 kernel:
Oct 11 07:27:02 host2 kernel: Call Trace:
Oct 11 07:27:02 host2 kernel: [<ffffffff88417ee5>] :drbd:drbd_try_clear_on_disk_bm+0xb1/0x24d
Oct 11 07:27:02 host2 kernel: [<ffffffff88419f90>] :drbd:__drbd_set_in_sync+0x205/0x2a9
Oct 11 07:27:02 host2 kernel: [<ffffffff88412ba8>] :drbd:e_end_resync_block+0x5a/0xdc
Oct 11 07:27:02 host2 kernel: [<ffffffff884119fd>] :drbd:drbd_process_done_ee+0xc8/0x11e
Oct 11 07:27:02 host2 kernel: [<ffffffff8841352f>] :drbd:drbd_asender+0xc9/0x551
Oct 11 07:27:02 host2 kernel: [<ffffffff88420e40>] :drbd:drbd_thread_setup+0x99/0xde
Oct 11 07:27:02 host2 kernel: [<ffffffff8005ed55>] child_rip+0xa/0x11
Oct 11 07:27:02 host2 kernel: [<ffffffff88420da7>] :drbd:drbd_thread_setup+0x0/0xde
Oct 11 07:27:02 host2 kernel: [<ffffffff8005ed4b>] child_rip+0x0/0x11
Oct 11 07:27:02 host2 kernel:
Oct 11 07:27:02 host2 kernel: drbd0: peer( Primary -> Unknown ) conn( SyncTarget -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Oct 11 07:27:02 host2 kernel: drbd0: process_done_ee() = NOT_OK
Oct 11 07:27:02 host2 kernel: drbd0: asender terminated
Oct 11 07:27:02 host2 kernel: drbd0: short read expecting header on sock: r=-512
Oct 11 07:27:02 host2 kernel: drbd0: tl_clear()
Oct 11 07:27:02 host2 kernel: drbd0: Connection closed
Oct 11 07:27:02 host2 kernel: drbd0: Writing meta data super block now.
Oct 11 07:27:02 host2 kernel: drbd0: conn( Disconnecting -> StandAlone )
Oct 11 07:27:02 host2 kernel: drbd0: receiver terminated
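
For what it is worth, the figures in the host2 log above can be
cross-checked against each other.  The following is only a rough sketch:
the variable and function names are made up for illustration, and it
assumes the "capacity" value is in 512-byte sectors.  It derives the
bitmap granularity and the extent size purely from the logged numbers,
not from the DRBD source.

```python
# Figures copied verbatim from the host2 log.
capacity_sectors = 19032384136   # "drbd_bm_resize called with capacity =="
bitmap_bits = 2379048017         # "resync bitmap: bits="
bitmap_words = 37172626          # "resync bitmap: ... words="
size_kb = 9516192068             # "size = 9075 GB (9516192068 KB)"
resync_kb = 1572932              # "will sync 1572932 KB"
resync_bits = 393233             # "[393233 bits set]"
bad_sector = 18811994048         # "BAD! sector=18811994048s"
bad_enr = 574096                 # "enr=574096"

SECTOR_BYTES = 512               # assumption: capacity is in 512-byte sectors


def exact_ratio(total, parts):
    """Return total // parts if it divides evenly, else None."""
    quotient, remainder = divmod(total, parts)
    return quotient if remainder == 0 else None


def extent_shift(sector, enr):
    """Smallest shift s with (sector >> s) == enr, or None if there is none."""
    for s in range(64):
        if sector >> s == enr:
            return s
    return None


# Device size implied by the sector count.
size_from_capacity_kb = capacity_sectors * SECTOR_BYTES // 1024

# KB covered by one bitmap bit, for the whole device and for the resync.
kb_per_bit_device = exact_ratio(size_kb, bitmap_bits)
kb_per_bit_resync = exact_ratio(resync_kb, resync_bits)

# Bits per bitmap word, rounding the word count up.
words_if_64bit = -(-bitmap_bits // 64)

# Extent size implied by the sector/enr pair on the "BAD!" line.
shift = extent_shift(bad_sector, bad_enr)
extent_mib = (1 << shift) * SECTOR_BYTES // (1024 * 1024)

print("size from capacity (KB):", size_from_capacity_kb)
print("KB per bit (device, resync):", kb_per_bit_device, kb_per_bit_resync)
print("64-bit words needed:", words_if_64bit)
print("out-of-sync MB (rounded down):", resync_kb // 1024)
print("extent shift / size (MiB):", shift, extent_mib)
```

Under those assumptions the numbers agree with one another: each bitmap
bit covers 4 KB, the 393233 set bits match the "1536 MB marked
out-of-sync" line, and the sector on the "BAD!" line does fall inside
extent 574096 if extents are 16 MiB.  So the logged figures look
self-consistent; the sketch does not say anything about why rs_left went
negative.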


GOOD HOST:

Oct 11 07:37:12 host1 kernel: drbd0: conn( WFConnection -> WFReportParams )
Oct 11 07:37:12 host1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 86
Oct 11 07:37:12 host1 kernel: drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Oct 11 07:37:12 host1 kernel: drbd0: Becoming sync source due to disk states.
Oct 11 07:37:12 host1 kernel: drbd0: bm_set was 389156, corrected to 393233. drivers/block/drbd/drbd_receiver.c:2069
Oct 11 07:37:12 host1 kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
Oct 11 07:37:15 host1 kernel: drbd0: Writing meta data super block now.
Oct 11 07:37:17 host1 kernel: drbd0: conn( WFBitMapS -> SyncSource )
Oct 11 07:37:17 host1 kernel: drbd0: Began resync as SyncSource (will sync 1572932 KB [393233 bits set]).
Oct 11 07:37:17 host1 kernel: drbd0: Writing meta data super block now.
Oct 11 07:37:28 host1 kernel: drbd0: sock was shut down by peer
Oct 11 07:37:28 host1 kernel: drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> BrokenPipe )
Oct 11 07:37:28 host1 kernel: drbd0: short read expecting header on sock: r=0
Oct 11 07:37:28 host1 kernel: drbd0: asender terminated
Oct 11 07:37:28 host1 kernel: drbd0: tl_clear()
Oct 11 07:37:28 host1 kernel: drbd0: Connection closed
Oct 11 07:37:28 host1 kernel: drbd0: Writing meta data super block now.
Oct 11 07:37:28 host1 kernel: drbd0: conn( BrokenPipe -> Unconnected )
Oct 11 07:37:28 host1 kernel: drbd0: receiver terminated
Oct 11 07:37:28 host1 kernel: drbd0: receiver (re)started
Oct 11 07:37:28 host1 kernel: drbd0: conn( Unconnected -> WFConnection )