[DRBD-user] Split Brain, help!

Ben Clewett ben at roadrunner.uk.com
Thu Aug 16 13:10:38 CEST 2007



Dear DRBD,

I have never had a problem with DRBD, 8.0.3, until now.

This happened at the same time as I upgraded linux-ha from 2.0.8 to 2.1.2.

The full log is attached; this is an excerpt:

Aug 16 11:52:18 drbd1: Handshake successful: DRBD Network Protocol version 86
Aug 16 11:52:18 drbd1: Split-Brain detected, dropping connection!
Aug 16 11:52:18 drbd1: self BEDB966CA82D9A19:48FB01456BF4AD01:808C51F368210CB6:32670C5EB0E5BA83
Aug 16 11:52:18 drbd1: peer 0E52E3C9B428750B:48FB01456BF4AD00:808C51F368210CB6:32670C5EB0E5BA83
Aug 16 11:52:18 drbd1: conn( WFReportParams -> Disconnecting )
Aug 16 11:52:18 drbd1: error receiving ReportState, l: 4!
Aug 16 11:52:18 drbd1: meta connection shut down by peer.
Aug 16 11:52:18 drbd1: asender terminated
Aug 16 11:52:18 drbd1: tl_clear()
Aug 16 11:52:18 drbd1: Connection closed
Aug 16 11:52:18 drbd1: conn( Disconnecting -> StandAlone )
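
To see at a glance where the "self" and "peer" lines in the excerpt diverge, a small shell sketch like the one below may help. It is only an illustration: the function names compare_uuids and strip_low_bit are made up, and ignoring the lowest bit of each value is an assumption (that DRBD uses this bit as a flag rather than as part of the identifier), not something the log itself states.

```shell
#!/bin/sh
# Hypothetical helper: compare the colon-separated identifier fields printed
# after "self" and "peer" in a "Split-Brain detected" log excerpt.

# Clear the lowest bit of a hex value by rewriting its last digit.
# ASSUMPTION: the lowest bit is a flag and not part of the identifier.
strip_low_bit() {
    head=${1%?}            # everything except the last hex digit
    last=${1#"$head"}      # the last hex digit
    case $last in
        1) last=0 ;; 3) last=2 ;; 5) last=4 ;; 7) last=6 ;;
        9) last=8 ;; [Bb]) last=A ;; [Dd]) last=C ;; [Ff]) last=E ;;
    esac
    printf '%s%s\n' "$head" "$last"
}

# Print one line per field: its position and whether self and peer agree.
compare_uuids() {
    self=$1 peer=$2 i=1
    while [ -n "$self" ]; do
        s=${self%%:*}; p=${peer%%:*}
        if [ "$(strip_low_bit "$s")" = "$(strip_low_bit "$p")" ]; then
            echo "field $i: same"
        else
            echo "field $i: differs"
        fi
        # Drop the field just handled; stop when no colon is left.
        case $self in
            *:*) self=${self#*:}; peer=${peer#*:} ;;
            *)   self= ;;
        esac
        i=$((i + 1))
    done
}

# Values copied from the drbd1 lines in the excerpt.
compare_uuids \
  BEDB966CA82D9A19:48FB01456BF4AD01:808C51F368210CB6:32670C5EB0E5BA83 \
  0E52E3C9B428750B:48FB01456BF4AD00:808C51F368210CB6:32670C5EB0E5BA83
```

For the drbd1 values this reports that only the first field differs once the lowest bit is ignored. That would be consistent with each node having started a new data generation on its own, although the log alone does not prove it.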

----------

I understand from other postings to this mailing list that I have to 
sacrifice a disk using:

root@bad-data# drbdadm -- --discard-my-data connect all
root@good-data# drbdadm connect all
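
Because the first of these commands throws data away, it may be safer to print the whole sequence and read it through before typing anything. The sketch below does only that: it echoes the commands and runs nothing. The function name print_recovery_steps is made up, and the extra "drbdadm secondary" step is an assumption (that the node giving up its data should be demoted before it reconnects); it is not part of the two commands quoted above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the commands for each node instead of
# running them, so that nothing is discarded by accident.
# ASSUMPTION: the node giving up its data is demoted to secondary first.
print_recovery_steps() {
    res=${1:-all}    # resource name; "all" as in the commands above
    echo "# on the node whose changes will be thrown away:"
    echo "drbdadm secondary $res"
    echo "drbdadm -- --discard-my-data connect $res"
    echo "# on the node whose data is kept:"
    echo "drbdadm connect $res"
}

print_recovery_steps all
```

Passing a single resource name instead of "all" would limit the printed commands to that resource, which is the more cautious choice when only one device is affected.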

But this is very worrying.  I would lose my customers' data.  I want to 
understand how this happens, and how I can prevent it from ever happening again.

Also worrying was that linux-ha, using the heartbeat-compliant script 
supplied with DRBD, reported DRBD as up.  As a result the bad disk was 
mounted and data was allowed to be written to it.
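
One way to guard against this might be a check that runs before anything is mounted and refuses to continue while a device is StandAlone, the state the log shows after the split brain. The sketch below is only an illustration: the function name standalone_devices is made up, and the line format it expects from /proc/drbd (" 1: cs:StandAlone st:... ds:...") is an assumption about this DRBD version, not taken from the attached log.

```shell
#!/bin/sh
# Hypothetical pre-mount check: list DRBD devices whose connection state is
# StandAlone, reading /proc/drbd-style text from standard input.
# ASSUMPTION: each device line looks like " 1: cs:StandAlone st:... ds:...".
# Exit status is 1 if any such device was found, 0 otherwise.
standalone_devices() {
    awk '
        $1 ~ /^[0-9]+:$/ && $2 == "cs:StandAlone" {
            sub(":", "", $1)      # "1:" becomes "1"
            print "drbd" $1
            found = 1
        }
        END { exit found + 0 }
    '
}

# Example: complain if any device has dropped to StandAlone.
if [ -r /proc/drbd ]; then
    standalone_devices < /proc/drbd ||
        echo "refusing to mount: the devices listed above are StandAlone" >&2
fi
```

This only looks at the connection state. It does not tell which node holds the good data, so it could stop a mount but not choose the survivor.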

Any suggestions are very welcome!

Regards,

Ben






-------------- next part --------------
Aug 16 11:52:18 hp-tm-02 kernel: drbd: module not supported by Novell, setting U taint flag.
Aug 16 11:52:18 hp-tm-02 kernel: drbd: initialised. Version: 8.0.3 (api:86/proto:86)
Aug 16 11:52:18 hp-tm-02 kernel: drbd: SVN Revision: 2881 build by root at hp-tm-02, 2007-05-15 10:38:31
Aug 16 11:52:18 hp-tm-02 kernel: drbd: registered as block device major 147
Aug 16 11:52:18 hp-tm-02 kernel: drbd: minor_table @ 0xcdbbf4a0
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: disk( Diskless -> Attaching ) 
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: Found 4 transactions (116 active extents) in activity log.
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: max_segment_size ( = BIO size ) = 32768
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: drbd_bm_resize called with capacity == 20980827
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: resync bitmap: bits=2622604 words=81958
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: size = 10 GB (10490413 KB)
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: reading of bitmap took 4 jiffies
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: recounting of set bits took additional 1 jiffies
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: 436 MB marked out-of-sync by on disk bit-map.
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: Marked additional 0 KB as out-of-sync based on AL.
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: disk( Attaching -> UpToDate ) 
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: Writing meta data super block now.
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: disk( Diskless -> Attaching ) 
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: Found 4 transactions (192 active extents) in activity log.
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: max_segment_size ( = BIO size ) = 32768
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: drbd_bm_resize called with capacity == 20980827
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: resync bitmap: bits=2622604 words=81958
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: size = 10 GB (10490413 KB)
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: reading of bitmap took 3 jiffies
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: recounting of set bits took additional 1 jiffies
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: 508 MB marked out-of-sync by on disk bit-map.
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: Marked additional 0 KB as out-of-sync based on AL.
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: disk( Attaching -> UpToDate ) 
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: Writing meta data super block now.
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: conn( StandAlone -> Unconnected ) 
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: receiver (re)started
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: conn( Unconnected -> WFConnection ) 
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: conn( WFConnection -> WFReportParams ) 
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: Handshake successful: DRBD Network Protocol version 86
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: Split-Brain detected, dropping connection!
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: self 58C8564200DBEEFF:4AB30FD676FC4813:778EEC9DC4FF4883:465A92C897CE7AFD
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: peer 3594ADC0AE200F29:4AB30FD676FC4812:778EEC9DC4FF4882:465A92C897CE7AFD
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: conn( WFReportParams -> Disconnecting ) 
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: error receiving ReportState, l: 4!
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: meta connection shut down by peer.
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: asender terminated
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: tl_clear()
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: Connection closed
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: conn( Disconnecting -> StandAlone ) 
Aug 16 11:52:18 hp-tm-02 kernel: drbd0: receiver terminated
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: conn( StandAlone -> Unconnected ) 
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: receiver (re)started
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: conn( Unconnected -> WFConnection ) 
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: conn( WFConnection -> WFReportParams ) 
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: Handshake successful: DRBD Network Protocol version 86
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: Split-Brain detected, dropping connection!
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: self BEDB966CA82D9A19:48FB01456BF4AD01:808C51F368210CB6:32670C5EB0E5BA83
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: peer 0E52E3C9B428750B:48FB01456BF4AD00:808C51F368210CB6:32670C5EB0E5BA83
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: conn( WFReportParams -> Disconnecting ) 
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: error receiving ReportState, l: 4!
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: meta connection shut down by peer.
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: asender terminated
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: tl_clear()
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: Connection closed
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: conn( Disconnecting -> StandAlone ) 
Aug 16 11:52:18 hp-tm-02 kernel: drbd1: receiver terminated

