[DRBD-user] cs:WFReportParams/cs:WFBitMapS status remains, no sync

Lars Ellenberg lars.ellenberg at linbit.com
Wed Jun 10 13:20:43 CEST 2009


On Tue, Jun 09, 2009 at 03:37:31PM +0200, Reinier Haasjes wrote:
> Hi,
> 
> Problem:
>     cs:WFReportParams/cs:WFBitMapS status remains after changes in network. (Not syncing) (More detailed description below)

>    Configuration
>        ha1:
> ---------------------------------------- 
> global {
>     usage-count no;
> }
> common {
>   syncer { rate 10M; }
> }
> resource shared {
>   protocol C;
>   handlers {
>   }
>   startup {
>     degr-wfc-timeout 120;    # 2 minutes.
>     become-primary-on both;
>   }
>   disk {
>     on-io-error   pass_on;
>   }
>   net {
>     allow-two-primaries;
>     cram-hmac-alg "sha256";
>     shared-secret "DRBD2KVM";
>     after-sb-0pri discard-least-changes;
>     after-sb-1pri call-pri-lost-after-sb;
>     after-sb-2pri call-pri-lost-after-sb;

You configured these to "call-pri-lost-after-sb",
but you did not configure a matching handler.
You need to add a pri-lost-after-sb handler to the handlers section,
and that handler is expected to hard reboot the box.
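
A minimal sketch of such a handler, assuming the sysrq-based approach
used in the sample drbd.conf shipped with DRBD 8.x. The exact command
is an illustration, not a tested setup, so adapt it to your environment:

```
resource shared {
  handlers {
    # Run when this node is Primary but lost the split-brain
    # auto-recovery. "echo b" triggers an immediate reboot
    # without syncing or unmounting filesystems.
    pri-lost-after-sb "echo b > /proc/sysrq-trigger ; reboot -f";
  }
  # rest of the resource section stays as it is
}
```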

>     rr-conflict disconnect;
>   }
> Jun  9 14:59:31 ha1 kernel: [18306.057311] drbd0: drbd_sync_handshake:
> Jun  9 14:59:31 ha1 kernel: [18306.057336] drbd0: self 1078AA65592ABF21:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
> Jun  9 14:59:31 ha1 kernel: [18306.057372] drbd0: peer 06A5AF91DACA3D7F:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
> Jun  9 14:59:31 ha1 kernel: [18306.057408] drbd0: uuid_compare()=100 by rule 9


> Jun  9 14:59:31 ha2 kernel: [18325.129642] drbd0: drbd_sync_handshake:
> Jun  9 14:59:31 ha2 kernel: [18325.129667] drbd0: self 06A5AF91DACA3D7F:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
> Jun  9 14:59:31 ha2 kernel: [18325.129717] drbd0: peer 1078AA65592ABF21:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
> Jun  9 14:59:31 ha2 kernel: [18325.129752] drbd0: uuid_compare()=100 by rule 9
> Jun  9 14:59:31 ha2 kernel: [18325.129774] drbd0: Split-Brain detected, 2 primaries, automatically solved. Sync from this node
> Jun  9 14:59:31 ha2 kernel: [18325.129845] drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate ) 

Hmmm, too bad.
The combination of after-sb- settings you chose
leads to a deadlock within the DRBD state handling on ha1 :(
We'll fix that.

What was intended for this combination of after-sb- settings:
ha1 tries to become Secondary.
If that succeeds, it becomes a normal SyncTarget.
If that does not succeed, it calls the pri-lost-after-sb handler,
which you need to define, and which is supposed to hard reboot the box.

Because that handler is not configured, and rr-conflict is at its
default of disconnect, ha1 would then report "I shall become SyncTarget,
but I am primary!" and go StandAlone.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


