[DRBD-user] cs:WFReportParams/cs:WFBitMapS status remains, no sync

Reinier Haasjes rhaasjes at jvc.nl
Wed Jun 10 13:58:06 CEST 2009



Lars Ellenberg wrote:
> On Tue, Jun 09, 2009 at 03:37:31PM +0200, Reinier Haasjes wrote:
>> Hi,
>>
>> Problem:
>>     cs:WFReportParams/cs:WFBitMapS status remains after changes in network. (Not syncing) (More detailed description below)
> 
>>    Configuration
>>        ha1:
>> ---------------------------------------- 
>> global {
>>     usage-count no;
>> }
>> common {
>>   syncer { rate 10M; }
>> }
>> resource shared {
>>   protocol C;
>>   handlers {
>>   }
>>   startup {
>>     degr-wfc-timeout 120;    # 2 minutes.
>>     become-primary-on both;
>>   }
>>   disk {
>>     on-io-error   pass_on;
>>   }
>>   net {
>>     allow-two-primaries;
>>     cram-hmac-alg "sha256";
>>     shared-secret "DRBD2KVM";
>>     after-sb-0pri discard-least-changes;
>>     after-sb-1pri call-pri-lost-after-sb;
>>     after-sb-2pri call-pri-lost-after-sb;
> 
> you configure it to "call-pri-lost-after-sb;".
> but you did not configure such a handler.
> you need to add such a handler,
> and that handler is expected to hard reboot the box.
> 
>>     rr-conflict disconnect;
>>   }
>> Jun  9 14:59:31 ha1 kernel: [18306.057311] drbd0: drbd_sync_handshake:
>> Jun  9 14:59:31 ha1 kernel: [18306.057336] drbd0: self 1078AA65592ABF21:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
>> Jun  9 14:59:31 ha1 kernel: [18306.057372] drbd0: peer 06A5AF91DACA3D7F:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
>> Jun  9 14:59:31 ha1 kernel: [18306.057408] drbd0: uuid_compare()=100 by rule 9
> 
> 
>> Jun  9 14:59:31 ha2 kernel: [18325.129642] drbd0: drbd_sync_handshake:
>> Jun  9 14:59:31 ha2 kernel: [18325.129667] drbd0: self 06A5AF91DACA3D7F:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
>> Jun  9 14:59:31 ha2 kernel: [18325.129717] drbd0: peer 1078AA65592ABF21:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
>> Jun  9 14:59:31 ha2 kernel: [18325.129752] drbd0: uuid_compare()=100 by rule 9
>> Jun  9 14:59:31 ha2 kernel: [18325.129774] drbd0: Split-Brain detected, 2 primaries, automatically solved. Sync from this node
>> Jun  9 14:59:31 ha2 kernel: [18325.129845] drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate ) 
> 
> hmmm. too bad.
> the combination of after-sb- configuration you chose
> leads into a deadlock within the drbd state handling on ha1 :(
> we'll fix that.

Which after-sb- settings would you advise for a two-box multipath 
setup (both boxes are Primary and both are actively used)?

> 
> what would have been intended for this combination of after-sb-
> configuration is ha1 tries to become Secondary,
> if that succeeds it becomes normal SyncTarget.
> if that does not succeed, it would call the lost-after-sb handler,
> which you need to define, and which is supposed to hard reboot the box.
> 
> because that handler is not configured, but the rr-conflict is on the
> default disconnect, you'd then get on ha1 "I shall become SyncTarget,
> but I am primary!", and ha1 would go StandAlone.
> 

I changed my config a little (set rr-conflict to call-pri-lost and added 
a pri-lost-after-sb handler) and repeated the test, with the same result:

---------------------------------------- 
root@ha2:/etc# drbdadm dump
# /etc/drbd.conf
common {
    syncer {
        rate             10M;
    }
}

# resource shared on ha2: not ignored, not stacked
resource shared {
    protocol               C;
    on ha2 {
        device           /dev/drbd0;
        disk             /dev/data/shared;
        address          ipv4 192.168.0.2:7788;
        meta-disk        internal;
    }
    on ha1 {
        device           /dev/drbd0;
        disk             /dev/data/shared;
        address          ipv4 192.168.0.1:7788;
        meta-disk        internal;
    }
    net {
        allow-two-primaries;
        cram-hmac-alg    sha256;
        shared-secret    DRBD2KVM;
        after-sb-0pri    discard-least-changes;
        after-sb-1pri    call-pri-lost-after-sb;
        after-sb-2pri    call-pri-lost-after-sb;
        rr-conflict      call-pri-lost;
    }
    disk {
        on-io-error      pass_on;
    }
    syncer {
        rate             300M;
        al-extents       257;
    }
    startup {
        degr-wfc-timeout 120;
        become-primary-on both;
    }
    handlers {
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    }
}
---------------------------------------- 
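
Lars's point that a "call-..." policy needs a matching handler can be
checked mechanically. The following is only a rough sketch, not a real
drbd.conf parser: missing_handlers is a made-up helper name, it strips
comments and reads the handlers section with simple regular expressions,
and it assumes (per the drbd.conf documentation, not this thread) that
"rr-conflict call-pri-lost" invokes the pri-lost handler.

```python
import re


def missing_handlers(conf_text):
    """Return handler names that a policy calls but handlers {} lacks."""
    # Drop comments; a '#' inside a quoted command would confuse this sketch.
    text = re.sub(r"#.*", "", conf_text)

    needed = set()
    if re.search(r"after-sb-[012]pri\s+call-pri-lost-after-sb\s*;", text):
        needed.add("pri-lost-after-sb")
    # Assumption: this policy invokes the "pri-lost" handler.
    if re.search(r"rr-conflict\s+call-pri-lost\s*;", text):
        needed.add("pri-lost")

    # Collect the names defined as: name "command";
    block = re.search(r"handlers\s*\{([^}]*)\}", text)
    body = block.group(1) if block else ""
    defined = set(re.findall(r'([a-z][a-z-]*)\s+"[^"]*"\s*;', body))

    return sorted(needed - defined)


if __name__ == "__main__":
    import sys

    # Usage: drbdadm dump | python check_handlers.py
    print(missing_handlers(sys.stdin.read()))
```

Fed the dump above, this sketch would list pri-lost, because only
pri-lost-after-sb is defined there. Whether that matters for the hang
is a separate question.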

  /var/log/syslog
---------------------------------------- 
Jun 10 13:51:04 ha1 kernel: [100598.802853] drbd0: Handshake successful: Agreed network protocol version 89
Jun 10 13:51:04 ha1 kernel: [100598.802853] drbd0: Peer authenticated using 32 bytes of 'sha256' HMAC
Jun 10 13:51:04 ha1 kernel: [100598.802853] drbd0: conn( WFConnection -> WFReportParams ) 
Jun 10 13:51:04 ha1 kernel: [100598.802853] drbd0: Starting asender thread (from drbd0_receiver [21285])
Jun 10 13:51:04 ha1 kernel: [100598.804151] drbd0: data-integrity-alg: <not-used>
Jun 10 13:51:04 ha1 kernel: [100598.805695] drbd0: drbd_sync_handshake:
Jun 10 13:51:04 ha1 kernel: [100598.805736] drbd0: self E918DF4E7E8DF7EF:AD94E6470886C27D:B9557645A64BEF85:2D044E354B958893
Jun 10 13:51:04 ha1 kernel: [100598.805773] drbd0: peer A34845B054C4CA3D:AD94E6470886C27D:B9557645A64BEF84:2D044E354B958893
Jun 10 13:51:04 ha1 kernel: [100598.805846] drbd0: uuid_compare()=100 by rule 9
---------------------------------------- 
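
To see at a glance where the two UUID sets in a drbd_sync_handshake
differ, a small script along these lines may help. It is only a sketch:
the field names (current, bitmap, history1, history2) are my assumption
about the order of the four values, and the helper names are made up.

```python
import re

# Matches the "self"/"peer" lines logged after drbd_sync_handshake.
UUID_LINE = re.compile(
    r"drbd\d+: (self|peer) ((?:[0-9A-F]{16}:){3}[0-9A-F]{16})"
)

# Assumed order of the four colon-separated values (not stated in the log).
FIELDS = ("current", "bitmap", "history1", "history2")


def parse_uuids(log_text):
    """Return {'self': {...}, 'peer': {...}} mapping field name to UUID."""
    result = {}
    for role, uuids in UUID_LINE.findall(log_text):
        result[role] = dict(zip(FIELDS, uuids.split(":")))
    return result


def differing_fields(log_text):
    """List the fields whose values differ between self and peer."""
    parsed = parse_uuids(log_text)
    return [f for f in FIELDS if parsed["self"][f] != parsed["peer"][f]]


if __name__ == "__main__":
    sample = (
        "drbd0: self E918DF4E7E8DF7EF:AD94E6470886C27D:"
        "B9557645A64BEF85:2D044E354B958893\n"
        "drbd0: peer A34845B054C4CA3D:AD94E6470886C27D:"
        "B9557645A64BEF84:2D044E354B958893\n"
    )
    print(differing_fields(sample))  # prints ['current', 'history1']
```

It only compares the values position by position; it does not try to
reproduce what uuid_compare() does.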

   /proc/drbd
	ha1:
---------------------------------------- 
version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by phil@fat-tyre, 2008-12-18 15:26:13
 0: cs:WFReportParams ro:Primary/Unknown ds:UpToDate/DUnknown C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:1 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
---------------------------------------- 
	ha2:
---------------------------------------- 
version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by phil@fat-tyre, 2008-12-18 15:26:13
 0: cs:WFBitMapS ro:Primary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
---------------------------------------- 
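
For anyone who wants to watch for this condition, the status lines above
can be picked apart with a few lines of Python. This is a minimal sketch
that only knows the 8.3-style /proc/drbd layout shown here; the function
names and the HANDSHAKE_STATES set (the two states from this thread) are
my own choices.

```python
import re

# One resource line, e.g.
# " 0: cs:WFReportParams ro:Primary/Unknown ds:UpToDate/DUnknown C r---"
STATUS_LINE = re.compile(
    r"^\s*(?P<minor>\d+): cs:(?P<cs>\S+) ro:(?P<ro>\S+) ds:(?P<ds>\S+)"
)

# The two connection states reported as hanging in this thread.
HANDSHAKE_STATES = {"WFReportParams", "WFBitMapS"}


def parse_status(proc_text):
    """Return one dict per resource line found in /proc/drbd text."""
    states = []
    for line in proc_text.splitlines():
        m = STATUS_LINE.match(line)
        if m:
            local_role, peer_role = m.group("ro").split("/")
            states.append({
                "minor": int(m.group("minor")),
                "cs": m.group("cs"),
                "local_role": local_role,
                "peer_role": peer_role,
            })
    return states


def handshake_minors(proc_text):
    """Return (minor, cs) pairs for devices in a handshake state."""
    return [(s["minor"], s["cs"]) for s in parse_status(proc_text)
            if s["cs"] in HANDSHAKE_STATES]


if __name__ == "__main__":
    with open("/proc/drbd") as f:
        print(handshake_minors(f.read()))
```

A single read cannot tell a normal, brief handshake from a hang; the
state only counts as stuck if it is still reported on repeated reads.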

Thanks,

Reinier