[Drbd-dev] DRBD8: Node state stuck in :WFBitMapT on both node.

Montrose, Ernest Ernest.Montrose at stratus.com
Fri Jan 5 21:33:31 CET 2007


Under a Xen migration test with moderate I/O we easily get into a
state where both nodes enter WFBitMapT.
We can recover from this by rebooting one node.  This was broken by some
recent changes: it worked before the changes listed below and is broken
with them:
------------------------------------------------------------------------
r8476 | emontros | 2007-01-03 11:01:31 -0500 (Wed, 03 Jan 2007) | 62 lines

Merged revisions 8360-8456 via svnmerge from
http://svn/vendor/drbd-8/trunk

........
r8408 | sgraham | 2006-12-28 13:53:59 -0500 (Thu, 28 Dec 2006) | 38 lines

[DRBD-8 @ 2648]
With this patch I try to make DRBD send more big packets
and fewer small packets. Sometimes I experienced that enabling
jumbo frames decreased DRBD's performance. Although I could
not reproduce this at the time of writing the patch, I managed to
measure the distribution of the packet sizes. This indicates
that the patch has the intended effect:

Distribution of packet sizes when writing 512MB of zeros:

     pkt sizes        old-code        new-code
                    send    recv    send    recv
  --------------  ------  ------  ------  ------
     1 to  450:    42047   42259   38629   38784
   451 to  900:     4057    4230    2023    2176
   901 to 1350:      112    2919      54    2289
  1351 to 1800:     6697    7439    6698    7368
  1801 to 2250:       11      43       4       8
  2251 to 2700:     1643    1684    1838    2307
  2701 to 3150:        5      43       3      18
  3151 to 3600:        2      70       0      51
  3601 to 4050:       95     218       2     251
  4051 to 4500:     3159    3296       1      58
  4501 to 4950:       45     270       6     176
  4951 to 5400:       75     717      60     453
  5401 to 5850:      867     901    1376    1383
  5851 to 6300:      683     696     245     251
  6301 to 6750:      129     214      44      47
  6751 to 7200:       24      39     266     274
  7201 to 7650:      101     118     256     263
  7651 to 8100:       30    1829      70    2109
  8101 to 8550:       36      48      49      56
  8551 to 9000+:   14971   53657   15755   54992

-Phil

Original author: phil
Date: 2006-12-28 12:07:12.682853+00:00
........
r8409 | sgraham | 2006-12-28 13:54:04 -0500 (Thu, 28 Dec 2006) | 5 lines

[DRBD-8 @ 2649]
Make DRBD compile even on SLES9 (Kernel 2.6.5)

Original author: phil
Date: 2006-12-28 15:54:28.853546+00:00
........
r8455 | sgraham | 2007-01-02 09:04:11 -0500 (Tue, 02 Jan 2007) | 5 lines

[DRBD-8 @ 2650]
Fixed a few error strings...

Original author: phil
Date: 2007-01-02 13:47:28.435731+00:00
........

------------------------------------------------------------------------
 
Here are the state traces.
 
Here is the log for jerry:
drbd2: disk( Diskless -> Attaching )
drbd2: Found 6 transactions (267 active extents) in activity log.
drbd2: max_segment_size ( = BIO size ) = 32768
drbd2: drbd_bm_resize called with capacity == 30735368
drbd2: resync bitmap: bits=3841921 words=120062
drbd2: size = 14 GB (15367684 KB)
drbd2: reading of bitmap took 18 jiffies
drbd2: recounting of set bits took additional 1 jiffies
drbd2: 0 KB marked out-of-sync by on disk bit-map.
drbd2: disk( Attaching -> UpToDate )
drbd2: Writing meta data super block now.
drbd2: conn( StandAlone -> Unconnected )
drbd2: receiver (re)started
drbd2: conn( Unconnected -> WFConnection )
drbd2: conn( WFConnection -> WFReportParams )
drbd2: Handshake successful: DRBD Network Protocol version 85
drbd2: drbd_sync_handshake:
drbd2: self 1999C2A90658DA42:0000000000000000:30A5702D832E1702:0000000000000004
drbd2: peer 1999C2A90658DA43:0000000000000000:30A5702D832E1702:0000000000000004
drbd2: uuid_compare()=-1
drbd2:  uuid[History_start] now 1999C2A90658DA42
drbd2:  uuid[Current] now 0000000000000000
drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
drbd2: Writing meta data super block now.
drbd2: aftr_isp( 0 -> 1 )
drbd2: conn( WFBitMapT -> WFSyncUUID )
drbd2: peer_isp( 0 -> 1 )
drbd2:  uuid[Current] now FC968953A6746EEC
drbd2:  uuid[Bitmap] now 0000000000000000
drbd2: conn( WFSyncUUID -> PausedSyncT ) disk( UpToDate -> Inconsistent )
drbd2: Began resync as PausedSyncT (will sync 1052672 KB [263168 bits set]).
drbd2: Writing meta data super block now.
drbd2: peer( Secondary -> Primary )
drbd2: peer_isp( 1 -> 0 )
drbd2: conn( PausedSyncT -> SyncTarget ) aftr_isp( 1 -> 0 )
drbd2: Syncer continues.
drbd2: Resync done (total 225 sec; paused 61 sec; 6416 K/sec)
drbd2:  uuid[Bitmap] now 0000000000000000
drbd2:  uuid[History_start] now 30A5702D832E1702
drbd2:  uuid[History_end] now 0000000000000004
drbd2:  uuid[Bitmap] now FC968953A6746EEC
drbd2:  uuid[Current] now 1999C2A90658DA42
drbd2:  uuid[History_start] now FC968953A6746EEC
drbd2:  uuid[Bitmap] now 0000000000000000
drbd2: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
drbd2: Writing meta data super block now.
drbd2: drbd_open: Allowing open in Secondary state
drbd2: drbd_sync_handshake:
drbd2: self 1999C2A90658DA42:0000000000000000:FC968953A6746EEC:30A5702D832E1702
drbd2: peer 1999C2A90658DA42:0000000000000000:FC968953A6746EEC:30A5702D832E1702
drbd2: uuid_compare()=-1
drbd2:  uuid[History_start] now 1999C2A90658DA42
drbd2:  uuid[Current] now 0000000000000000
drbd2: peer( Primary -> Secondary ) conn( Connected -> WFBitMapT )
drbd2: Writing meta data super block now.
drbd2: role( Secondary -> Primary )
drbd2: Writing meta data super block now.


On ben, the other node, we have:

drbd2: disk( Diskless -> Attaching )
drbd2: Found 6 transactions (324 active extents) in activity log.
drbd2: max_segment_size ( = BIO size ) = 32768
drbd2: drbd_bm_resize called with capacity == 30735368
drbd2: resync bitmap: bits=3841921 words=120062
drbd2: size = 14 GB (15367684 KB)
drbd2: reading of bitmap took 26 jiffies
drbd2: recounting of set bits took additional 1 jiffies
drbd2: 0 KB marked out-of-sync by on disk bit-map.
drbd2: Marked additional 1028 MB as out-of-sync based on AL.
drbd2: disk( Attaching -> UpToDate )
drbd2: Writing meta data super block now.
drbd2: conn( StandAlone -> Unconnected )
drbd2: receiver (re)started
drbd2: conn( Unconnected -> WFConnection )
drbd2: conn( WFConnection -> WFReportParams )
drbd2: Handshake successful: DRBD Network Protocol version 85
drbd2: drbd_sync_handshake:
drbd2: self 1999C2A90658DA43:0000000000000000:30A5702D832E1702:0000000000000004
drbd2: peer 1999C2A90658DA42:0000000000000000:30A5702D832E1702:0000000000000004
drbd2: uuid_compare()=1
drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
drbd2: aftr_isp( 0 -> 1 )
drbd2: Writing meta data super block now.
drbd2: peer_isp( 0 -> 1 )
drbd2:  uuid[Bitmap] now FC968953A6746EEC
drbd2: conn( WFBitMapS -> PausedSyncS ) pdsk( UpToDate -> Inconsistent )
drbd2: Began resync as PausedSyncS (will sync 1052672 KB [263168 bits set]).
drbd2: Writing meta data super block now.
drbd2: role( Secondary -> Primary )
drbd2: Writing meta data super block now.
drbd2: aftr_isp( 1 -> 0 )
drbd2: conn( PausedSyncS -> SyncSource ) peer_isp( 1 -> 0 )
drbd2: Syncer continues.
drbd2: Resync done (total 225 sec; paused 61 sec; 6416 K/sec)
drbd2:  uuid[History_start] now FC968953A6746EEC
drbd2:  uuid[Bitmap] now 0000000000000000
drbd2: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
drbd2: Writing meta data super block now.
drbd2: role( Primary -> Secondary )
drbd2: Writing meta data super block now.
drbd2: drbd_sync_handshake:
drbd2: self 1999C2A90658DA43:0000000000000000:FC968953A6746EEC:30A5702D832E1702
drbd2: peer 0000000000000000:0000000000000000:1999C2A90658DA42:FC968953A6746EEC
drbd2: uuid_compare()=-2
drbd2: Writing meta data super block now.
drbd2: writing of bitmap took 59 jiffies
drbd2: 14 GB marked out-of-sync by on disk bit-map.
drbd2: 15367684 KB now marked out-of-sync by on disk bit-map.
drbd2: Writing meta data super block now.
drbd2:  uuid[History_start] now 1999C2A90658DA43
drbd2:  uuid[Current] now 0000000000000000
drbd2: conn( Connected -> WFBitMapT )
drbd2: Writing meta data super block now.
drbd2: peer( Secondary -> Primary )
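
To compare the two traces, the connection-state transitions can be pulled
out of the logs mechanically. Below is a minimal Python sketch, not part of
DRBD: the helper names (conn_transitions, final_conn_state) and the regular
expression are assumptions made for illustration, and the sample lines are
copied from the tails of the two logs above.

```python
import re

# Matches the connection-state part of a DRBD state-change line,
# e.g. "drbd2: conn( WFReportParams -> WFBitMapT )".
# The pattern is an assumption based only on the log format shown above.
CONN_RE = re.compile(r"\bconn\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def conn_transitions(log_text):
    """Return the (old, new) connection-state transitions in log order."""
    return CONN_RE.findall(log_text)


def final_conn_state(log_text):
    """Return the last connection state reached, or None if none is logged."""
    transitions = conn_transitions(log_text)
    return transitions[-1][1] if transitions else None


# Tail of each node's log, copied from the traces above.
jerry_tail = """\
drbd2: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
drbd2: peer( Primary -> Secondary ) conn( Connected -> WFBitMapT )
drbd2: role( Secondary -> Primary )
"""
ben_tail = """\
drbd2: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
drbd2: role( Primary -> Secondary )
drbd2: conn( Connected -> WFBitMapT )
drbd2: peer( Secondary -> Primary )
"""

for name, tail in (("jerry", jerry_tail), ("ben", ben_tail)):
    print(name, final_conn_state(tail))
```

Run against these excerpts, both nodes report WFBitMapT as their final
connection state, which is the stuck condition described at the top of
this mail.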
 