Hi, all

On both machines, I stopped drbd. On machine A, I issued the commands below:

linux-10:~ # modprobe drbd
linux-10:~ # drbdadm attach r2
linux-10:~ # drbdadm syncer r2
linux-10:~ # drbdadm connect r2
linux-10:~ #

Then on machine B, I issued the same commands. The logs on machine B are attached. They show "Split-Brain detected".

How can I avoid this problem? And how can I restore it to a proper status?

Thanks for your help.

Best regards,
Edward

+++++++++++++++++++++++++++++++++++++++++++++++
linux-10:~ # cat /proc/drbd
version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by root at linux-10, 2009-02-18 16:33:17
 2: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:8
linux-10:~ #

+++++++++++++++/var/log/messages+++++++++++++++++++++++++
Apr 11 10:33:05 linux-10 kernel: drbd0: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Apr 11 10:33:05 linux-10 kernel: drbd0: short read expecting header on sock: r=-512
Apr 11 10:33:05 linux-10 kernel: drbd0: asender terminated
Apr 11 10:33:05 linux-10 kernel: drbd0: Terminating asender thread
Apr 11 10:33:05 linux-10 kernel: drbd0: Connection closed
Apr 11 10:33:05 linux-10 kernel: drbd0: conn( Disconnecting -> StandAlone )
Apr 11 10:33:05 linux-10 kernel: drbd0: receiver terminated
Apr 11 10:33:05 linux-10 kernel: drbd0: Terminating receiver thread
Apr 11 10:33:05 linux-10 kernel: drbd0: disk( UpToDate -> Diskless )
Apr 11 10:33:05 linux-10 kernel: drbd0: drbd_bm_resize called with capacity == 0
Apr 11 10:33:05 linux-10 kernel: drbd0: worker terminated
Apr 11 10:33:05 linux-10 kernel: drbd0: Terminating worker thread
Apr 11 10:33:05 linux-10 kernel: drbd1: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Apr 11 10:33:05 linux-10 kernel: drbd1: short read expecting header on sock: r=-512
Apr 11 10:33:05 linux-10 kernel: drbd1: asender terminated
Apr 11 10:33:05 linux-10 kernel: drbd1: Terminating asender thread
Apr 11 10:33:05 linux-10 kernel: drbd1: Connection closed
Apr 11 10:33:05 linux-10 kernel: drbd1: conn( Disconnecting -> StandAlone )
Apr 11 10:33:05 linux-10 kernel: drbd1: receiver terminated
Apr 11 10:33:05 linux-10 kernel: drbd1: Terminating receiver thread
Apr 11 10:33:05 linux-10 kernel: drbd1: disk( UpToDate -> Diskless )
Apr 11 10:33:05 linux-10 kernel: drbd1: drbd_bm_resize called with capacity == 0
Apr 11 10:33:05 linux-10 kernel: drbd1: worker terminated
Apr 11 10:33:05 linux-10 kernel: drbd1: Terminating worker thread
Apr 11 10:33:05 linux-10 kernel: drbd2: role( Primary -> Secondary )
Apr 11 10:33:05 linux-10 kernel: drbd2: disk( UpToDate -> Diskless )
Apr 11 10:33:05 linux-10 kernel: drbd2: drbd_bm_resize called with capacity == 0
Apr 11 10:33:05 linux-10 kernel: drbd2: worker terminated
Apr 11 10:33:05 linux-10 kernel: drbd2: Terminating worker thread
Apr 11 10:33:05 linux-10 kernel: drbd: module cleanup done.
Apr 11 10:34:48 linux-10 kernel: drbd: module not supported by Novell, setting U taint flag.
Apr 11 10:34:48 linux-10 kernel: drbd: initialised. Version: 8.3.0 (api:88/proto:86-89)
Apr 11 10:34:48 linux-10 kernel: drbd: GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by root at linux-10, 2009-02-18 16:33:17
Apr 11 10:34:48 linux-10 kernel: drbd: registered as block device major 147
Apr 11 10:34:48 linux-10 kernel: drbd: minor_table @ 0xffff8100740720c0
Apr 11 10:35:13 linux-10 kernel: drbd2: disk( Diskless -> Attaching )
Apr 11 10:35:13 linux-10 kernel: drbd2: Starting worker thread (from cqueue/5 [470])
Apr 11 10:35:13 linux-10 kernel: drbd2: No usable activity log found.
Apr 11 10:35:13 linux-10 kernel: drbd2: Method to ensure write ordering: barrier
Apr 11 10:35:13 linux-10 kernel: drbd2: max_segment_size ( = BIO size ) = 32768
Apr 11 10:35:13 linux-10 kernel: drbd2: drbd_bm_resize called with capacity == 1011928
Apr 11 10:35:13 linux-10 kernel: drbd2: resync bitmap: bits=126491 words=1977
Apr 11 10:35:13 linux-10 kernel: drbd2: size = 494 MB (505964 KB)
Apr 11 10:35:13 linux-10 kernel: drbd2: recounting of set bits took additional 0 jiffies
Apr 11 10:35:13 linux-10 kernel: drbd2: 8 KB (2 bits) marked out-of-sync by on disk bit-map.
Apr 11 10:35:13 linux-10 kernel: drbd2: disk( Attaching -> UpToDate )
Apr 11 10:35:21 linux-10 kernel: drbd2: conn( StandAlone -> Unconnected )
Apr 11 10:35:21 linux-10 kernel: drbd2: Starting receiver thread (from drbd2_worker [27928])
Apr 11 10:35:21 linux-10 kernel: drbd2: receiver (re)started
Apr 11 10:35:21 linux-10 kernel: drbd2: conn( Unconnected -> WFConnection )
Apr 11 10:35:21 linux-10 kernel: drbd2: Handshake successful: Agreed network protocol version 89
Apr 11 10:35:21 linux-10 kernel: drbd2: conn( WFConnection -> WFReportParams )
Apr 11 10:35:21 linux-10 kernel: drbd2: Starting asender thread (from drbd2_receiver [27938])
Apr 11 10:35:21 linux-10 kernel: drbd2: data-integrity-alg: <not-used>
Apr 11 10:35:21 linux-10 kernel: drbd2: drbd_sync_handshake:
Apr 11 10:35:21 linux-10 kernel: drbd2: self 5EB8F04153EED616:96A1102B3AB64E7E:9B3B0DF6A3761D4B:7009CE72C95D4780
Apr 11 10:35:21 linux-10 kernel: drbd2: peer 9AF90754B0369F26:96A1102B3AB64E7F:9B3B0DF6A3761D4A:7009CE72C95D4780
Apr 11 10:35:21 linux-10 kernel: drbd2: uuid_compare()=100 by rule 9
Apr 11 10:35:21 linux-10 kernel: drbd2: Split-Brain detected, dropping connection!
Apr 11 10:35:21 linux-10 kernel: drbd2: self 5EB8F04153EED616:96A1102B3AB64E7E:9B3B0DF6A3761D4B:7009CE72C95D4780
Apr 11 10:35:21 linux-10 kernel: drbd2: peer 9AF90754B0369F26:96A1102B3AB64E7F:9B3B0DF6A3761D4A:7009CE72C95D4780
Apr 11 10:35:21 linux-10 kernel: drbd2: helper command: /sbin/drbdadm split-brain minor-2
Apr 11 10:35:21 linux-10 kernel: drbd2: helper command: /sbin/drbdadm split-brain minor-2 exit code 0 (0x0)
Apr 11 10:35:21 linux-10 kernel: drbd2: conn( WFReportParams -> Disconnecting )
Apr 11 10:35:21 linux-10 kernel: drbd2: error receiving ReportState, l: 4!
Apr 11 10:35:21 linux-10 kernel: drbd2: asender terminated
Apr 11 10:35:21 linux-10 kernel: drbd2: Terminating asender thread
Apr 11 10:35:21 linux-10 kernel: drbd2: Connection closed
Apr 11 10:35:21 linux-10 kernel: drbd2: conn( Disconnecting -> StandAlone )
Apr 11 10:35:21 linux-10 kernel: drbd2: receiver terminated
Apr 11 10:35:21 linux-10 kernel: drbd2: Terminating receiver thread

++++++++++++++ below is the result of "dmesg" +++++++++++++++++++++
drbd1: self 694BC1146C7A0476:37A938321F1078C5:8E8AEF3010DC95A9:527AC0DE800282AB
drbd1: peer 37A938321F1078C4:0000000000000000:8E8AEF3010DC95A8:527AC0DE800282AB
drbd0: drbd_sync_handshake:
drbd0: self 6DB905A9822E817A:B777D655F75A1ABF:AE2C3E2935A148AF:3E19EFA99A907A2F
drbd1: uuid_compare()=1 by rule 7
drbd0: peer B777D655F75A1ABE:0000000000000000:AE2C3E2935A148AE:3E19EFA99A907A2F
drbd0: uuid_compare()=1 by rule 7
drbd2: drbd_sync_handshake:
drbd2: self 5EB8F04153EED616:96A1102B3AB64E7E:9B3B0DF6A3761D4B:7009CE72C95D4780
drbd2: peer 9AF90754B0369F26:96A1102B3AB64E7F:9B3B0DF6A3761D4A:7009CE72C95D4780
drbd2: uuid_compare()=100 by rule 9
drbd2: Split-Brain detected, dropping connection!
drbd2: self 5EB8F04153EED616:96A1102B3AB64E7E:9B3B0DF6A3761D4B:7009CE72C95D4780
drbd2: peer 9AF90754B0369F26:96A1102B3AB64E7F:9B3B0DF6A3761D4A:7009CE72C95D4780
drbd2: helper command: /sbin/drbdadm split-brain minor-2
drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
drbd2: meta connection shut down by peer.
drbd2: conn( WFReportParams -> NetworkFailure )
drbd2: asender terminated
drbd2: Terminating asender thread
drbd2: helper command: /sbin/drbdadm split-brain minor-2 exit code 0 (0x0)
drbd2: conn( NetworkFailure -> Disconnecting )
drbd2: error receiving ReportState, l: 4!
drbd2: Connection closed
drbd2: conn( Disconnecting -> StandAlone )
drbd2: receiver terminated
drbd2: Terminating receiver thread
drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
drbd1: Began resync as SyncSource (will sync 4 KB [1 bits set]).
drbd1: Resync done (total 1 sec; paused 0 sec; 4 K/sec)
drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
drbd0: Began resync as SyncSource (will sync 4 KB [1 bits set]).
drbd0: Resync done (total 1 sec; paused 0 sec; 4 K/sec)
drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
drbd2: role( Secondary -> Primary )
drbd0: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
drbd0: short read expecting header on sock: r=-512
drbd0: asender terminated
drbd0: Terminating asender thread
drbd0: Connection closed
drbd0: conn( Disconnecting -> StandAlone )
drbd0: receiver terminated
drbd0: Terminating receiver thread
drbd0: disk( UpToDate -> Diskless )
drbd0: drbd_bm_resize called with capacity == 0
drbd0: worker terminated
drbd0: Terminating worker thread
drbd1: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
drbd1: short read expecting header on sock: r=-512
drbd1: asender terminated
drbd1: Terminating asender thread
drbd1: Connection closed
drbd1: conn( Disconnecting -> StandAlone )
drbd1: receiver terminated
drbd1: Terminating receiver thread
drbd1: disk( UpToDate -> Diskless )
drbd1: drbd_bm_resize called with capacity == 0
drbd1: worker terminated
drbd1: Terminating worker thread
drbd2: role( Primary -> Secondary )
drbd2: disk( UpToDate -> Diskless )
drbd2: drbd_bm_resize called with capacity == 0
drbd2: worker terminated
drbd2: Terminating worker thread
drbd: module cleanup done.
drbd: module not supported by Novell, setting U taint flag.
drbd: initialised. Version: 8.3.0 (api:88/proto:86-89)
drbd: GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by root at linux-10, 2009-02-18 16:33:17
drbd: registered as block device major 147
drbd: minor_table @ 0xffff8100740720c0
drbd2: disk( Diskless -> Attaching )
drbd2: Starting worker thread (from cqueue/5 [470])
drbd2: No usable activity log found.
drbd2: Method to ensure write ordering: barrier
drbd2: max_segment_size ( = BIO size ) = 32768
drbd2: drbd_bm_resize called with capacity == 1011928
drbd2: resync bitmap: bits=126491 words=1977
drbd2: size = 494 MB (505964 KB)
drbd2: recounting of set bits took additional 0 jiffies
drbd2: 8 KB (2 bits) marked out-of-sync by on disk bit-map.
drbd2: disk( Attaching -> UpToDate )
drbd2: conn( StandAlone -> Unconnected )
drbd2: Starting receiver thread (from drbd2_worker [27928])
drbd2: receiver (re)started
drbd2: conn( Unconnected -> WFConnection )
drbd2: Handshake successful: Agreed network protocol version 89
drbd2: conn( WFConnection -> WFReportParams )
drbd2: Starting asender thread (from drbd2_receiver [27938])
drbd2: data-integrity-alg: <not-used>
drbd2: drbd_sync_handshake:
drbd2: self 5EB8F04153EED616:96A1102B3AB64E7E:9B3B0DF6A3761D4B:7009CE72C95D4780
drbd2: peer 9AF90754B0369F26:96A1102B3AB64E7F:9B3B0DF6A3761D4A:7009CE72C95D4780
drbd2: uuid_compare()=100 by rule 9
drbd2: Split-Brain detected, dropping connection!
drbd2: self 5EB8F04153EED616:96A1102B3AB64E7E:9B3B0DF6A3761D4B:7009CE72C95D4780
drbd2: peer 9AF90754B0369F26:96A1102B3AB64E7F:9B3B0DF6A3761D4A:7009CE72C95D4780
drbd2: helper command: /sbin/drbdadm split-brain minor-2
drbd2: helper command: /sbin/drbdadm split-brain minor-2 exit code 0 (0x0)
drbd2: conn( WFReportParams -> Disconnecting )
drbd2: error receiving ReportState, l: 4!
drbd2: asender terminated
drbd2: Terminating asender thread
drbd2: Connection closed
drbd2: conn( Disconnecting -> StandAlone )
drbd2: receiver terminated
drbd2: Terminating receiver thread
linux-10:~ #

> Date: Sat, 11 Apr 2009 01:52:22 +0200
> From: r.bhatia at ipax.at
> To: guohuai_li at hotmail.com
> CC: drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] Power off caused "Unknown" status.
>
> On 10.04.2009 02:30, guohuai li wrote:
>
> > On machine B:
> >
> > 2: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r---
> > ns:0 nr:0 dw:4 dr:100 al:2 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:12
> >
> > On machine A:
> >
> > 2: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r---
> > ns:9 nr:4 dw:10 dr:200 al:1 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:8
>
> did you try "drbdadm connect r2" on both nodes? what does dmesg say?
> what do the logs say?
>
> cheers,
> raoul
> --
> ____________________________________________________________________
> DI (FH) Raoul Bhatia M.Sc. email. r.bhatia at ipax.at
> Technischer Leiter
>
> IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
> Barawitzkagasse 10/2/2/11 email. office at ipax.at
> 1190 Wien tel. +43 1 3670030
> FN 277995t HG Wien fax. +43 1 3670030 15
> ____________________________________________________________________
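
A note on reading the status lines quoted in this thread: the first line of each /proc/drbd entry packs the connection state (cs), the local/peer roles (ro) and the local/peer disk states (ds) into one string. Below is a minimal Python sketch, not part of the original posts, that splits such a line into named fields so the two nodes can be compared side by side. The regular expression assumes the 8.3-style layout shown in this thread, and the names STATUS_RE and parse_status are hypothetical, chosen only for illustration.

```python
import re

# Assumption: the line follows the DRBD 8.3 /proc/drbd layout quoted in this
# thread, e.g. " 2: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r---".
# STATUS_RE and parse_status are hypothetical names used only for this sketch.
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+"
    r"cs:(?P<cs>\S+)\s+"
    r"ro:(?P<ro_self>[^/\s]+)/(?P<ro_peer>\S+)\s+"
    r"ds:(?P<ds_self>[^/\s]+)/(?P<ds_peer>\S+)"
)


def parse_status(line):
    """Return the fields of one /proc/drbd status line as a dict.

    Returns None if the line is not a status line (for example the
    "version:" header or the counter line starting with "ns:").
    """
    match = STATUS_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["minor"] = int(fields["minor"])  # device minor number, e.g. 2 for drbd2
    return fields


if __name__ == "__main__":
    # Status lines copied from the messages in this thread.
    machine_a = " 2: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r---"
    machine_b = " 2: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r---"
    for name, line in (("A", machine_a), ("B", machine_b)):
        print(name, parse_status(line))
```

Run on the two lines quoted above, both nodes report cs as StandAlone with an Unknown peer role, which matches the dropped connection recorded in the logs.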