[DRBD-user] Failure to resync - resulted in data loss

Alexander Strachan astrachan at inter-systems.com.au
Tue Sep 25 04:14:30 CEST 2007

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Quoting Lars Ellenberg <lars.ellenberg at linbit.com>:

> On Mon, Sep 24, 2007 at 11:43:13AM +1000, Alex Strachan wrote:
> > I have experienced a failure of DRBD to reconnect after a forced system
> > power off/on due to a hung host.
> >
> > DRBD 0.7.21
> > Heartbeat 2.1.2
> >
> > DRBD log output
> >
> >   Sep 20 17:58:06 sinfids3a1 drbd: Stopping all DRBD resources
> >   Sep 20 17:58:06 sinfids3a1 drbd: ioctl(,SET_STATE,) failed:
> >
> > The host - on its way down.
> > At this point the filesystem was still active as Heartbeat didn't stop
> > successfully.
> >
> > Is there a way for drbd to handle this?
> >
> >   Sep 20 17:58:06 sinfids3a1 drbd: Device or resource busy
> >   Sep 20 17:58:06 sinfids3a1 drbd: Someone has opened the device for RW
> access!
> >   Sep 20 17:58:06 sinfids3a1 drbd: Command '/sbin/drbdsetup /dev/drbd0
> down' terminated with exit code 20
> >   Sep 20 17:58:06 sinfids3a1 drbd: drbdsetup exited with code 20
> >   Sep 20 17:58:06 sinfids3a1 drbd: ERROR: Module drbd is in use
> >   Sep 20 17:58:06 sinfids3a1 drbd: .
> >   Sep 20 17:58:06 sinfids3a1 rc: Stopping drbd:  succeeded
> >
> > The host - on its way up.
> >
> >   Sep 21 10:11:49 sinfids3a1 drbd: [
> >   Sep 21 10:11:49 sinfids3a1 drbd: d0
> >   Sep 21 10:11:49 sinfids3a1 kernel: drbd0: Adjusting my ra_pages to
> backing device's (32 -> 1024)
> >   Sep 21 10:11:49 sinfids3a1 kernel: klogd 1.4.1, ---------- state change
> ----------
> >   Sep 21 10:11:49 sinfids3a1 kernel: drbd0: resync bitmap: bits=43350764
> words=677356
> >   Sep 21 10:11:49 sinfids3a1 kernel: drbd0: size = 165 GB (173403056 KB)
> >   Sep 21 10:11:49 sinfids3a1 kernel: mptbase: ioc0: IOCStatus(0x0002): Busy
> >   Sep 21 10:11:49 sinfids3a1 kernel: mptbase: ioc1: IOCStatus(0x0002): Busy
> >   Sep 21 10:11:49 sinfids3a1 kernel: mptbase: ioc0: IOCStatus(0x0002): Busy
> >   Sep 21 10:11:49 sinfids3a1 kernel: mptbase: ioc1: IOCStatus(0x0002): Busy
> >   Sep 21 10:11:51 sinfids3a1 kernel: drbd0: 180 MB marked out-of-sync by on
> disk bit-map.
> >   Sep 21 10:11:51 sinfids3a1 kernel: drbd0: Found 6 transactions (324
> active extents) in activity log.
> >   Sep 21 10:11:51 sinfids3a1 kernel: drbd0: Marked additional 983 MB as
> out-of-sync based on AL.
> >   Sep 21 10:11:51 sinfids3a1 kernel: drbd0: drbdsetup [5217]: cstate
> Unconfigured --> StandAlone
> >   Sep 21 10:11:51 sinfids3a1 drbd: s0
> >   Sep 21 10:11:51 sinfids3a1 drbd: n0
> >   Sep 21 10:11:51 sinfids3a1 kernel: drbd0: drbdsetup [5318]: cstate
> StandAlone --> Unconnected
> >   Sep 21 10:11:51 sinfids3a1 kernel: drbd0: drbd0_receiver [5319]: cstate
> Unconnected --> WFConnection
> >   Sep 21 10:11:51 sinfids3a1 drbd: ]
> >   Sep 21 10:11:51 sinfids3a1 drbd: .
> >   Sep 21 10:11:51 sinfids3a1 drbd: WARN: stdin/stdout is not a TTY; using
> /dev/console
> >   Sep 21 10:11:51 sinfids3a1 kernel: drbd0: using degr_wfc_timeout=10
> seconds
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: drbd0_receiver [5319]: cstate
> WFConnection --> WFReportParams
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: Handshake successful: DRBD
> Network Protocol version 74
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: Connection established.
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: I am(S):
> 1:00000002:00000001:000000c5:00000033:10
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: Peer(P):
> 1:00000002:00000001:000000c3:00000034:10
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: drbd0_receiver [5319]: cstate
> WFReportParams --> WFBitMapS
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: sock_sendmsg returned -32
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: drbd0_receiver [5319]: cstate
> WFBitMapS --> BrokenPipe
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: short sent ReportBitMap
> size=4096 sent=0
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: Secondary/Unknown -->
> Secondary/Primary
> >
> > It has detected its partner but then breaks the connection.
> > This has resulted in serious data loss.
>
> no. that _prevented_ data loss. see [*] below.
>
> > Is there a way to prevent this?
>
> you monitor your raid for degraded mode.
> then please monitor drbd connection status as well, and fix it in time.
>
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: meta connection shut down by
> peer.
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: asender terminated
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: sock was shut down by peer
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: drbd0_receiver [5319]: cstate
> BrokenPipe --> BrokenPipe
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: short read expecting header on
> sock: r=0
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: worker terminated
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: drbd0_receiver [5319]: cstate
> BrokenPipe --> Unconnected
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: Connection lost.
> >   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: drbd0_receiver [5319]: cstate
> Unconnected --> WFConnection
> >   Sep 21 10:12:01 sinfids3a1 rc: Starting drbd:  succeeded
> >
> > The scenario:
> >
> > sinfids3a1 running as master
> > host sinfids3a1 hung
> > reboot sinfids3a1
>
>  apparently here or
>
> > sinfids3a2 running as master
> > updated filesystem (nagios config)
> > powered off/on sinfids3a1
>
>   there, drbd did not re-establish the connection,
>   reason given below.
>   so it no longer replicated,
>   sinfids3a1 humming along just nursing its stale data.
>
> > [.....]
> > failed the system over to sinfids3a1
>
> and there you decided to go online with that stale data.
>
> > updates to the filesystem were LOST !!!
>
> right. you asked for it.
>
> [*]
> because of the way the generation counters work in drbd 0.7,
> after the reboot of sinfids3a1, it would have liked to become sync source.
> luckily sinfids3a2 detected that this was nonsense,
> and refused to become sync target, otherwise you'd have synced the stale
> data to the live node right here and now.
> that's why I say it _prevented_ data loss.
>
> in drbd 8, you'd get the infamous "split brain detected" message.
>
> this all only means that there has been a time where both nodes
> have modified their data independently.
>
> sinfids3a1 during the time of its shutdown,
> when it was still primary but had already closed the network.
>
> sinfids3a2 during the time when it took over.
>
> >   Sep 21 10:10:57 sinfids3a2 kernel: bcm5700: eth1 NIC Link is Down
> >   Sep 21 10:11:00 sinfids3a2 kernel: bcm5700: eth1 NIC Link is Up, 1000
> Mbps full duplex, receive & transmit flow control ON
> >   Sep 21 10:11:53 sinfids3a2 kernel: drbd0: drbd0_receiver [5318]: cstate
> WFConnection --> WFReportParams
> >   Sep 21 10:11:53 sinfids3a2 kernel: drbd0: Handshake successful: DRBD
> Network Protocol version 74
> >   Sep 21 10:11:53 sinfids3a2 kernel: drbd0: Connection established.
> >   Sep 21 10:11:53 sinfids3a2 kernel: drbd0: I am(P):
> 1:00000002:00000001:000000c3:00000034:10
> >   Sep 21 10:11:53 sinfids3a2 kernel: drbd0: Peer(S):
> 1:00000002:00000001:000000c5:00000033:10
> >   Sep 21 10:11:53 sinfids3a2 kernel: drbd0: Current Primary shall become
> sync TARGET! Aborting to prevent data corruption.
>
> right there, you should have been alerted to the degraded

I will add some alert checks, but that will not stop HB from starting the DRBD
resource when the device is Secondary/Unknown but Consistent.
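
Something along these lines is what I have in mind for the alert check - only an
untested sketch, run from cron on both nodes, and it assumes the 0.7 /proc/drbd
output carries a "cs:" field (e.g. "cs:Connected") for each device:

  #!/bin/sh
  # crude DRBD connection check (sketch only)
  # assumes a healthy, replicating device shows "cs:Connected" in /proc/drbd
  if ! grep -q 'cs:Connected' /proc/drbd ; then
      # not replicating - raise an alert so someone can run "drbdadm connect" in time
      logger -p daemon.crit -t drbd-check "drbd not Connected: $(grep 'cs:' /proc/drbd)"
  fi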

> (not connected, not replicating) state of drbd,
> and should have done "drbdadm connect" as appropriate
> to reestablish connections.
>


I am using HB2 with drbddisk; when HB ran 'drbddisk start' it succeeded, and the
system came up using the stale filesystem.

I didn't check the status of DRBD after the reboot of sinfids3a1.  Later on I
instructed HB to fail over by putting the current node into standby.

In summary:
It is possible to end up with the following (after a failed host reboots
following either a software or a hardware error):

sinfids3a1
  DRBD: Secondary/Unknown, Consistent
  HB: node status online
  HB: 'drbddisk status' reports OK

sinfids3a2
  DRBD: Primary/Unknown, Consistent
  HB: node status online
  HB: 'drbddisk status' reports OK

If host sinfids3a2 is then put into HB standby, HB will fail over and mount the
stale filesystem on sinfids3a1.
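
So before any manual failover I will at least check the connection state first,
e.g. (same assumption about the "cs:" field in /proc/drbd):

  # verify we are actually replicating before failing over by hand
  grep 'cs:' /proc/drbd
  # only proceed with the standby/failover if this shows cs:Connected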

When sinfids3a1 initially connected with sinfids3a2 it did:

>   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: Secondary/Unknown --> Secondary/Primary
>   Sep 21 10:11:53 sinfids3a1 kernel: drbd0: meta connection shut down by peer.

Would you have expected the status of DRBD on sinfids3a1 to have changed to
Secondary/Unknown, Inconsistent?  That would have prevented HB from mounting
the stale filesystem.

You indicated that maybe the network was stopped before drbd; these are the
relevant S/K init scripts - is the order okay?

[root@sinfids3b1 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 3)

[root at sinfids3b1 ~]# find /etc -type l -name "*drbd*" | sort
/etc/rc.d/rc0.d/K08drbd
/etc/rc.d/rc1.d/K08drbd
/etc/rc.d/rc2.d/K08drbd
/etc/rc.d/rc3.d/S99drbd
/etc/rc.d/rc4.d/S99drbd
/etc/rc.d/rc5.d/S99drbd
/etc/rc.d/rc6.d/K08drbd

[root at sinfids3b1 ~]# find /etc -type l -name "*network*" | sort
/etc/rc.d/rc0.d/K90network
/etc/rc.d/rc1.d/K90network
/etc/rc.d/rc2.d/S10network
/etc/rc.d/rc3.d/S10network
/etc/rc.d/rc4.d/S10network
/etc/rc.d/rc5.d/S10network
/etc/rc.d/rc6.d/K90network

[root at sinfids3b1 ~]# find /etc -type l -name "*heartbeat*" | sort
/etc/rc.d/rc0.d/K01heartbeat
/etc/rc.d/rc1.d/K01heartbeat
/etc/rc.d/rc2.d/S99heartbeat
/etc/rc.d/rc3.d/S99heartbeat
/etc/rc.d/rc4.d/S99heartbeat
/etc/rc.d/rc5.d/S99heartbeat
/etc/rc.d/rc6.d/K01heartbeat

My concern now is that the following sequence is possible:
  sinfids3a1 (currently Primary) fails due to a software/hardware fault and is
    reset by the hardware watchdog
  sinfids3a2 becomes Primary
  sinfids3a1 finishes rebooting; HB is happy, but DRBD has not reconnected
  sinfids3a2 (now Primary) fails due to a software/hardware fault and is reset
    by the hardware watchdog
  sinfids3a1 becomes Primary - but using a stale filesystem.

How can this be prevented?  Is there a way to modify the drbddisk script so that
it can tell "the partner is there but there is a connectivity problem" apart
from "the partner is not there at all"?  For the first scenario, when HB issues
'drbddisk start' it should fail.
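
The kind of guard I am imagining - just a sketch, not the real drbddisk code;
"peer-node" is a placeholder hostname, and it again assumes the 0.7 /proc/drbd
"cs:" field - would sit in front of the promotion to Primary:

  # hypothetical guard before "drbdadm primary" in drbddisk's start action
  if ! grep -q 'cs:Connected' /proc/drbd ; then
      # Not replicating.  If the peer is really dead, taking over is fine,
      # but if we can still reach it on the network then our local data is
      # probably stale, so refuse and let 'drbddisk start' fail.
      if ping -c 2 -w 5 peer-node >/dev/null 2>&1 ; then
          echo "peer reachable but drbd not connected - refusing to become Primary" >&2
          exit 1
      fi
  fi
  drbdadm primary all

Though the peer could still answer pings while the replication link itself is
down, so I am not sure how reliable such a test would be.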



> --
> : Lars Ellenberg                           http://www.linbit.com :
> : DRBD/HA support and consulting             sales at linbit.com :
> : LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
> : Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
> __
> please use the "List-Reply" function of your email client.
>


