[Drbd-dev] DRBD-8: failure to complete resync when connection lost and resyncing to primary

Graham, Simon Simon.Graham at stratus.com
Sun Sep 24 14:35:42 CEST 2006


While testing the panic removal code, I came across a failure to
complete a resync when the primary side is the target of the resync.
Right now, unless the disk state is Negotiating, the code rejects any
attempt to resync with the primary side's disk as the target, even if
the primary's disk is inconsistent. I hit this with the following test
case:

1. Set one side primary
2. Do detach/attach on the primary side - this starts a full resync with
the secondary side as the source
3. Forcibly disconnect the network (you can actually do 'drbdadm
disconnect' on the secondary side!)
4. Reconnect the network - at this point, the resync is rejected.

Attached is some sample trace output showing the failure on the primary
side.

I'm thinking that this should be allowed IF the primary side disk is
inconsistent or diskless or otherwise bad; this means that the test in
drbd_sync_handshake:

	if (hg < 0 &&
	    mdev->state.role == Primary && mdev->state.disk != Negotiating) {
		ERR("I shall become SyncTarget, but I am primary!\n");
		drbd_force_state(mdev,NS(conn,StandAlone));
		drbd_thread_stop_nowait(&mdev->receiver);
		return conn_mask;
	}

should instead be:

	if (hg < 0 &&
	    mdev->state.role == Primary && mdev->state.disk >= Consistent) {
		ERR("I shall become SyncTarget, but I am primary!\n");
		drbd_force_state(mdev,NS(conn,StandAlone));
		drbd_thread_stop_nowait(&mdev->receiver);
		return conn_mask;
	}
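
To make the difference between the two tests concrete, here is a small
standalone sketch (not the driver code) that evaluates both conditions
for a given handshake result, role and disk state. The SK_ names and
the two helper functions are made up for illustration, and the ordering
of the disk states in the enum is an assumption; all that matters here
is that Diskless, Negotiating and Inconsistent sort below Consistent.

```c
#include <stdbool.h>

/* Assumed ordering of the disk states, lowest to highest. The names
 * are hypothetical stand-ins for the driver's own enum. */
enum sketch_disk_state {
	SK_Diskless,
	SK_Attaching,
	SK_Failed,
	SK_Negotiating,
	SK_Inconsistent,
	SK_Outdated,
	SK_DUnknown,
	SK_Consistent,
	SK_UpToDate
};

enum sketch_role { SK_Secondary, SK_Primary };

/* Current test: a primary that would become SyncTarget (hg < 0) is
 * refused unless its disk is still Negotiating. */
static bool refuse_current(int hg, enum sketch_role role,
			   enum sketch_disk_state disk)
{
	return hg < 0 && role == SK_Primary && disk != SK_Negotiating;
}

/* Proposed test: refuse only when the primary's disk is at least
 * Consistent, i.e. when it holds data worth protecting. */
static bool refuse_proposed(int hg, enum sketch_role role,
			    enum sketch_disk_state disk)
{
	return hg < 0 && role == SK_Primary && disk >= SK_Consistent;
}
```

For the state in the trace below (hg = -1, role Primary, disk
Inconsistent), refuse_current() returns true and refuse_proposed()
returns false, so the resync would be allowed to restart. A primary
with an UpToDate disk is still refused by both. Note that under the
assumed ordering the proposed test would also accept an Outdated
primary disk as a sync target.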

Does that make sense?
Simon

---extract of messages from Primary---

Sep 24 08:17:03 snoopy kernel: drbd0: Forcing state change from bad state. Error would be: 'Refusing to be Primary without at least one UpToDate disk'
Sep 24 08:17:03 snoopy kernel: drbd0:  old = { cs:WFConnection st:Primary/Unknown ds:Inconsistent/DUnknown r--- }
Sep 24 08:17:03 snoopy kernel: drbd0:  new = { cs:WFReportParams st:Primary/Unknown ds:Inconsistent/DUnknown r--- }
Sep 24 08:17:03 snoopy kernel: drbd0: conn( WFConnection -> WFReportParams )
Sep 24 08:17:03 snoopy kernel: drbd0: data >>> HandShake (protocol 82)
Sep 24 08:17:03 snoopy kernel: drbd0: data <<< HandShake (protocol 82)
Sep 24 08:17:03 snoopy kernel: drbd0: Handshake successful: DRBD Network Protocol version 82
Sep 24 08:17:03 snoopy kernel: drbd0: data >>> ReportProtocol (11)
Sep 24 08:17:03 snoopy kernel: drbd0: data >>> SyncParam (10)
Sep 24 08:17:03 snoopy kernel: drbd0: data >>> ReportSizes (d 15007MiB, u 0MiB, c 15007MiB, max bio 1000, q order 0)
Sep 24 08:17:03 snoopy kernel: drbd0: data >>> ReportUUIDs Curr:ABFA2C5E8469C059, Bitmap:0000000000000001, HisSt:980DF0A708466D72, HisEnd:882E2BDEC183E244
Sep 24 08:17:03 snoopy kernel: drbd0: data >>> ReportState (s c861 { role( Primary ) peer( Unknown ) conn( WFReportParams ) disk( Inconsistent ) pdsk( DUnknown )})
Sep 24 08:17:03 snoopy kernel: drbd0: data <<< ReportProtocol (11)
Sep 24 08:17:03 snoopy kernel: drbd0: data <<< SyncParam (10)
Sep 24 08:17:03 snoopy kernel: drbd0: data <<< ReportSizes (d 15007MiB, u 0MiB, c 15007MiB, max bio 1000, q order 0)
Sep 24 08:17:03 snoopy kernel: drbd0: data <<< ReportUUIDs Curr:DC3A3BB9EB892584, Bitmap:ABFA2C5E8469C058, HisSt:980DF0A708466D72, HisEnd:882E2BDEC183E244
Sep 24 08:17:03 snoopy kernel: drbd0: drbd_sync_handshake:
Sep 24 08:17:03 snoopy kernel: drbd0: self ABFA2C5E8469C059:0000000000000001:980DF0A708466D72:882E2BDEC183E244
Sep 24 08:17:03 snoopy kernel: drbd0: peer DC3A3BB9EB892584:ABFA2C5E8469C058:980DF0A708466D72:882E2BDEC183E244
Sep 24 08:17:03 snoopy kernel: drbd0: uuid_compare()=-1
Sep 24 08:17:03 snoopy kernel: drbd0: I shall become SyncTarget, but I am primary!
Sep 24 08:17:03 snoopy kernel: drbd0: conn( WFReportParams -> StandAlone )
Sep 24 08:17:03 snoopy kernel: drbd0: error receiving ReportState, l: 4!
Sep 24 08:17:03 snoopy kernel: drbd0: asender starting cleanup



