Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
Lars Ellenberg wrote:
> On Tue, Jun 09, 2009 at 03:37:31PM +0200, Reinier Haasjes wrote:
>> Hi,
>>
>> Problem:
>> cs:WFReportParams/cs:WFBitMapS status remains after changes in network. (Not syncing)
>> (More detailed description below)
>
>> Configuration
>> ha1:
>> ----------------------------------------
>> global {
>>   usage-count no;
>> }
>> common {
>>   syncer { rate 10M; }
>> }
>> resource shared {
>>   protocol C;
>>   handlers {
>>   }
>>   startup {
>>     degr-wfc-timeout 120; # 2 minutes.
>>     become-primary-on both;
>>   }
>>   disk {
>>     on-io-error pass_on;
>>   }
>>   net {
>>     allow-two-primaries;
>>     cram-hmac-alg "sha256";
>>     shared-secret "DRBD2KVM";
>>     after-sb-0pri discard-least-changes;
>>     after-sb-1pri call-pri-lost-after-sb;
>>     after-sb-2pri call-pri-lost-after-sb;
>
> you configure it to "call-pri-lost-after-sb;".
> but you did not configure such a handler.
> you need to add such a handler,
> and that handler is expected to hard reboot the box.
>
>>     rr-conflict disconnect;
>>   }
>> Jun 9 14:59:31 ha1 kernel: [18306.057311] drbd0: drbd_sync_handshake:
>> Jun 9 14:59:31 ha1 kernel: [18306.057336] drbd0: self 1078AA65592ABF21:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
>> Jun 9 14:59:31 ha1 kernel: [18306.057372] drbd0: peer 06A5AF91DACA3D7F:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
>> Jun 9 14:59:31 ha1 kernel: [18306.057408] drbd0: uuid_compare()=100 by rule 9
>
>
>> Jun 9 14:59:31 ha2 kernel: [18325.129642] drbd0: drbd_sync_handshake:
>> Jun 9 14:59:31 ha2 kernel: [18325.129667] drbd0: self 06A5AF91DACA3D7F:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
>> Jun 9 14:59:31 ha2 kernel: [18325.129717] drbd0: peer 1078AA65592ABF21:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
>> Jun 9 14:59:31 ha2 kernel: [18325.129752] drbd0: uuid_compare()=100 by rule 9
>> Jun 9 14:59:31 ha2 kernel: [18325.129774] drbd0: Split-Brain detected, 2 primaries, automatically solved. Sync from this node
>> Jun 9 14:59:31 ha2 kernel: [18325.129845] drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
>
> hmmm. too bad.
> the combination of after-sb- configuration you chose
> leads into a deadlock within the drbd state handling on ha1 :(
> we'll fix that.

What after-sb- configuration do you advise on a 2 box, multipath
configuration (both boxes are primary and used as a primary)?

>
> what would have been intended for this combination of after-sb-
> configuration is ha1 tries to become Secondary,
> if that succeeds it becomes normal SyncTarget.
> if that does not succeed, it would call the lost-after-sb handler,
> which you need to define, and which is supposed to hard reboot the box.
>
> because that handler is not configured, but the rr-conflict is on the
> default disconnect, you'd then get on ha1 "I shall become SyncTarget,
> but I am primary!", and ha1 would go StandAlone.
>

I changed my config a little (rr-conflict and pri-lost-after-sb) and did
the test again, same results:

----------------------------------------
root@ha2:/etc# drbdadm dump
# /etc/drbd.conf
common {
    syncer {
        rate 10M;
    }
}

# resource shared on ha2: not ignored, not stacked
resource shared {
    protocol C;
    on ha2 {
        device /dev/drbd0;
        disk /dev/data/shared;
        address ipv4 192.168.0.2:7788;
        meta-disk internal;
    }
    on ha1 {
        device /dev/drbd0;
        disk /dev/data/shared;
        address ipv4 192.168.0.1:7788;
        meta-disk internal;
    }
    net {
        allow-two-primaries;
        cram-hmac-alg sha256;
        shared-secret DRBD2KVM;
        after-sb-0pri discard-least-changes;
        after-sb-1pri call-pri-lost-after-sb;
        after-sb-2pri call-pri-lost-after-sb;
        rr-conflict call-pri-lost;
    }
    disk {
        on-io-error pass_on;
    }
    syncer {
        rate 300M;
        al-extents 257;
    }
    startup {
        degr-wfc-timeout 120;
        become-primary-on both;
    }
    handlers {
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    }
}
----------------------------------------

/var/log/syslog
----------------------------------------
Jun 10 13:51:04 ha1 kernel: [100598.802853] drbd0: Handshake successful: Agreed network protocol version 89
Jun 10 13:51:04 ha1 kernel: [100598.802853] drbd0: Peer authenticated using 32 bytes of 'sha256' HMAC
Jun 10 13:51:04 ha1 kernel: [100598.802853] drbd0: conn( WFConnection -> WFReportParams )
Jun 10 13:51:04 ha1 kernel: [100598.802853] drbd0: Starting asender thread (from drbd0_receiver [21285])
Jun 10 13:51:04 ha1 kernel: [100598.804151] drbd0: data-integrity-alg: <not-used>
Jun 10 13:51:04 ha1 kernel: [100598.805695] drbd0: drbd_sync_handshake:
Jun 10 13:51:04 ha1 kernel: [100598.805736] drbd0: self E918DF4E7E8DF7EF:AD94E6470886C27D:B9557645A64BEF85:2D044E354B958893
Jun 10 13:51:04 ha1 kernel: [100598.805773] drbd0: peer A34845B054C4CA3D:AD94E6470886C27D:B9557645A64BEF84:2D044E354B958893
Jun 10 13:51:04 ha1 kernel: [100598.805846] drbd0: uuid_compare()=100 by rule 9
----------------------------------------

/proc/drbd
ha1:
----------------------------------------
version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by phil@fat-tyre, 2008-12-18 15:26:13
 0: cs:WFReportParams ro:Primary/Unknown ds:UpToDate/DUnknown C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:1 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
----------------------------------------
ha2:
----------------------------------------
version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by phil@fat-tyre, 2008-12-18 15:26:13
 0: cs:WFBitMapS ro:Primary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
----------------------------------------

Thanks,
Reinier
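
For anyone who wants to watch for this condition from a script: the /proc/drbd output quoted in this message has ha1 in cs:WFReportParams and ha2 in cs:WFBitMapS. Below is a minimal sketch in Python (not part of DRBD) that pulls the cs, ro and ds fields out of /proc/drbd-style text. The function names, the sample strings and the set of states treated as "transient" are assumptions made for illustration; the field layout is taken only from the 8.3.0 output shown in this thread and may differ in other versions.

```python
import re

# Matches the per-device line of /proc/drbd as quoted in this thread, e.g.
# " 0: cs:WFBitMapS ro:Primary/Primary ds:UpToDate/UpToDate C r---"
DEVICE_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)

# Assumption for this sketch: connection states that should only be
# visible briefly while two nodes are still negotiating.
TRANSIENT_STATES = {"WFReportParams", "WFBitMapS", "WFBitMapT"}


def parse_status(text):
    """Return one dict per device line found in /proc/drbd-style text."""
    devices = []
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match is None:
            continue  # version, GIT-hash and counter lines are skipped
        local_role, _, peer_role = match.group("ro").partition("/")
        local_disk, _, peer_disk = match.group("ds").partition("/")
        devices.append({
            "minor": int(match.group("minor")),
            "cs": match.group("cs"),
            "local_role": local_role,
            "peer_role": peer_role,
            "local_disk": local_disk,
            "peer_disk": peer_disk,
        })
    return devices


def in_transient_state(device):
    """True if the connection state is one this sketch treats as transient."""
    return device["cs"] in TRANSIENT_STATES


# Sample input copied from the ha1 and ha2 output in this message.
HA1 = """version: 8.3.0 (api:88/proto:86-89)
 0: cs:WFReportParams ro:Primary/Unknown ds:UpToDate/DUnknown C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:1 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
"""
HA2 = """version: 8.3.0 (api:88/proto:86-89)
 0: cs:WFBitMapS ro:Primary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
"""

if __name__ == "__main__":
    for name, text in (("ha1", HA1), ("ha2", HA2)):
        for dev in parse_status(text):
            print(name, dev["minor"], dev["cs"], in_transient_state(dev))
```

A single reading in one of these states is normal during a reconnect. The symptom described here would be a node that still reports the same state on repeated polls, so a real monitor would read /proc/drbd several times over an interval before raising an alarm.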