Lars Ellenberg wrote:
> On Tue, Jun 09, 2009 at 03:37:31PM +0200, Reinier Haasjes wrote:
>> Hi,
>>
>> Problem:
>> cs:WFReportParams/cs:WFBitMapS status remains after changes in network. (Not syncing) (More detailed description below)
>
>> Configuration
>> ha1:
>> ----------------------------------------
>> global {
>> usage-count no;
>> }
>> common {
>> syncer { rate 10M; }
>> }
>> resource shared {
>> protocol C;
>> handlers {
>> }
>> startup {
>> degr-wfc-timeout 120; # 2 minutes.
>> become-primary-on both;
>> }
>> disk {
>> on-io-error pass_on;
>> }
>> net {
>> allow-two-primaries;
>> cram-hmac-alg "sha256";
>> shared-secret "DRBD2KVM";
>> after-sb-0pri discard-least-changes;
>> after-sb-1pri call-pri-lost-after-sb;
>> after-sb-2pri call-pri-lost-after-sb;
>
> you configure it to "call-pri-lost-after-sb;".
> but you did not configure such a handler.
> you need to add such a handler,
> and that handler is expected to hard reboot the box.
>
>> rr-conflict disconnect;
>> }
>> Jun 9 14:59:31 ha1 kernel: [18306.057311] drbd0: drbd_sync_handshake:
>> Jun 9 14:59:31 ha1 kernel: [18306.057336] drbd0: self 1078AA65592ABF21:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
>> Jun 9 14:59:31 ha1 kernel: [18306.057372] drbd0: peer 06A5AF91DACA3D7F:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
>> Jun 9 14:59:31 ha1 kernel: [18306.057408] drbd0: uuid_compare()=100 by rule 9
>
>
>> Jun 9 14:59:31 ha2 kernel: [18325.129642] drbd0: drbd_sync_handshake:
>> Jun 9 14:59:31 ha2 kernel: [18325.129667] drbd0: self 06A5AF91DACA3D7F:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
>> Jun 9 14:59:31 ha2 kernel: [18325.129717] drbd0: peer 1078AA65592ABF21:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
>> Jun 9 14:59:31 ha2 kernel: [18325.129752] drbd0: uuid_compare()=100 by rule 9
>> Jun 9 14:59:31 ha2 kernel: [18325.129774] drbd0: Split-Brain detected, 2 primaries, automatically solved. Sync from this node
>> Jun 9 14:59:31 ha2 kernel: [18325.129845] drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
>
> hmmm. too bad.
> the combination of after-sb- configuration you chose
> leads into a deadlock within the drbd state handling on ha1 :(
> we'll fix that.
Which after-sb- configuration would you advise for a two-box multipath
setup (both boxes are Primary and actively used as such)?
>
> what would have been intended for this combination of after-sb-
> configuration is ha1 tries to become Secondary,
> if that succeeds it becomes normal SyncTarget.
> if that does not succeed, it would call the lost-after-sb handler,
> which you need to define, and which is supposed to hard reboot the box.
>
> because that handler is not configured, but the rr-conflict is on the
> default disconnect, you'd then get on ha1 "I shall become SyncTarget,
> but I am primary!", and ha1 would go StandAlone.
>
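To check my reading of the intended behaviour described above, here it is as a small decision function. This is only a paraphrase of the quoted explanation, not DRBD's actual state-handling code; the function name, parameter names, and returned strings are made up for illustration.

```python
def intended_outcome(demote_succeeds: bool,
                     handler_configured: bool,
                     rr_conflict: str = "disconnect") -> str:
    """Paraphrase of the intended after-sb behaviour for the node that
    should become sync target (ha1 in this thread). Hypothetical helper,
    not part of DRBD."""
    # Step 1: the node tries to become Secondary.
    if demote_succeeds:
        return "SyncTarget"
    # Step 2: demotion failed, so the pri-lost-after-sb handler is called.
    # That handler is expected to hard reboot the box.
    if handler_configured:
        return "pri-lost-after-sb handler runs"
    # Step 3: no handler, and rr-conflict is left on the default
    # "disconnect": the node refuses to be SyncTarget while Primary.
    if rr_conflict == "disconnect":
        return "StandAlone"
    # Other rr-conflict settings are not covered by the explanation above.
    return "not described in this thread"


print(intended_outcome(demote_succeeds=False, handler_configured=False))
```

With my first config (no handler, rr-conflict on the default) this would give StandAlone, which is not what I observe: ha1 stays in WFReportParams.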
I changed my config slightly (set rr-conflict to call-pri-lost and added
a pri-lost-after-sb handler) and repeated the test, with the same result:
----------------------------------------
root@ha2:/etc# drbdadm dump
# /etc/drbd.conf
common {
    syncer {
        rate 10M;
    }
}
# resource shared on ha2: not ignored, not stacked
resource shared {
    protocol C;
    on ha2 {
        device /dev/drbd0;
        disk /dev/data/shared;
        address ipv4 192.168.0.2:7788;
        meta-disk internal;
    }
    on ha1 {
        device /dev/drbd0;
        disk /dev/data/shared;
        address ipv4 192.168.0.1:7788;
        meta-disk internal;
    }
    net {
        allow-two-primaries;
        cram-hmac-alg sha256;
        shared-secret DRBD2KVM;
        after-sb-0pri discard-least-changes;
        after-sb-1pri call-pri-lost-after-sb;
        after-sb-2pri call-pri-lost-after-sb;
        rr-conflict call-pri-lost;
    }
    disk {
        on-io-error pass_on;
    }
    syncer {
        rate 300M;
        al-extents 257;
    }
    startup {
        degr-wfc-timeout 120;
        become-primary-on both;
    }
    handlers {
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    }
}
----------------------------------------
/var/log/syslog
----------------------------------------
Jun 10 13:51:04 ha1 kernel: [100598.802853] drbd0: Handshake successful: Agreed network protocol version 89
Jun 10 13:51:04 ha1 kernel: [100598.802853] drbd0: Peer authenticated using 32 bytes of 'sha256' HMAC
Jun 10 13:51:04 ha1 kernel: [100598.802853] drbd0: conn( WFConnection -> WFReportParams )
Jun 10 13:51:04 ha1 kernel: [100598.802853] drbd0: Starting asender thread (from drbd0_receiver [21285])
Jun 10 13:51:04 ha1 kernel: [100598.804151] drbd0: data-integrity-alg: <not-used>
Jun 10 13:51:04 ha1 kernel: [100598.805695] drbd0: drbd_sync_handshake:
Jun 10 13:51:04 ha1 kernel: [100598.805736] drbd0: self E918DF4E7E8DF7EF:AD94E6470886C27D:B9557645A64BEF85:2D044E354B958893
Jun 10 13:51:04 ha1 kernel: [100598.805773] drbd0: peer A34845B054C4CA3D:AD94E6470886C27D:B9557645A64BEF84:2D044E354B958893
Jun 10 13:51:04 ha1 kernel: [100598.805846] drbd0: uuid_compare()=100 by rule 9
----------------------------------------
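To make the two UUID sets from the handshake easier to compare, a minimal Python sketch that pulls the "self" and "peer" lines out of the log and reports which positions differ. The regular expression and the function names are my own; it assumes the log line format shown above and says nothing about how DRBD itself evaluates the UUIDs.

```python
import re

# Matches the "self"/"peer" lines printed after drbd_sync_handshake.
UUID_LINE = re.compile(r"drbd\d+: (self|peer) ([0-9A-F:]+)$")


def parse_uuids(log_lines):
    """Return {'self': [...], 'peer': [...]} with one list entry per UUID."""
    found = {}
    for line in log_lines:
        match = UUID_LINE.search(line.strip())
        if match:
            found[match.group(1)] = match.group(2).split(":")
    return found


def differing_slots(uuids):
    """Return the 0-based positions where self and peer UUIDs differ."""
    pairs = zip(uuids["self"], uuids["peer"])
    return [i for i, (mine, theirs) in enumerate(pairs) if mine != theirs]


sample = [
    "Jun 10 13:51:04 ha1 kernel: [100598.805736] drbd0: self "
    "E918DF4E7E8DF7EF:AD94E6470886C27D:B9557645A64BEF85:2D044E354B958893",
    "Jun 10 13:51:04 ha1 kernel: [100598.805773] drbd0: peer "
    "A34845B054C4CA3D:AD94E6470886C27D:B9557645A64BEF84:2D044E354B958893",
]

print(differing_slots(parse_uuids(sample)))  # prints [0, 2]
```

For the lines above, the first and third UUIDs differ (the third only in its last hex digit), while the second and fourth are identical on both nodes.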
/proc/drbd
ha1:
----------------------------------------
version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by phil@fat-tyre, 2008-12-18 15:26:13
0: cs:WFReportParams ro:Primary/Unknown ds:UpToDate/DUnknown C r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:1 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
----------------------------------------
ha2:
----------------------------------------
version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by phil@fat-tyre, 2008-12-18 15:26:13
0: cs:WFBitMapS ro:Primary/Primary ds:UpToDate/UpToDate C r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
----------------------------------------
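For anyone who wants to watch for this condition, a rough Python sketch that parses the status line of /proc/drbd in the 8.3.0 format shown above and flags the two connection states from this report. The names are hypothetical, and a single snapshot cannot prove a hang: both states are normally passed through during connect, so the check would have to return the same state over repeated reads before it means anything.

```python
import re

# Status line as printed by 8.3.0 above, e.g.
# " 0: cs:WFReportParams ro:Primary/Unknown ds:UpToDate/DUnknown C r---"
STATUS = re.compile(r"^\s*(\d+): cs:(\S+) ro:(\S+)/(\S+) ds:(\S+)/(\S+)")

# Only the two states observed in this report; not an exhaustive list.
HANDSHAKE_STATES = {"WFReportParams", "WFBitMapS"}


def parse_status(text):
    """Return one dict per device status line found in /proc/drbd text."""
    devices = []
    for line in text.splitlines():
        match = STATUS.match(line)
        if match:
            minor, cs, ro_self, ro_peer, ds_self, ds_peer = match.groups()
            devices.append({
                "minor": int(minor),
                "cs": cs,
                "ro": (ro_self, ro_peer),
                "ds": (ds_self, ds_peer),
            })
    return devices


def in_handshake_state(device):
    """True if the device is in one of the states seen in this report."""
    return device["cs"] in HANDSHAKE_STATES


ha1 = "0: cs:WFReportParams ro:Primary/Unknown ds:UpToDate/DUnknown C r---"
ha2 = "0: cs:WFBitMapS ro:Primary/Primary ds:UpToDate/UpToDate C r---"

for name, text in (("ha1", ha1), ("ha2", ha2)):
    for dev in parse_status(text):
        print(name, dev["minor"], dev["cs"], in_handshake_state(dev))
```

On a live node the text would come from reading /proc/drbd instead of the two sample strings.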
Thanks,
Reinier