[Drbd-dev] DRBD8: Node state stuck in :WFBitMapT on both node.
Montrose, Ernest
Ernest.Montrose at stratus.com
Fri Jan 5 21:33:31 CET 2007
Under a Xen migration test with moderate I/O we easily get into a
state where both nodes enter WFBitMapT.
We can recover from this by rebooting one node. This was broken by some
recent changes. I can say that it worked before these changes and is
broken with them:
------------------------------------------------------------------------
r8476 | emontros | 2007-01-03 11:01:31 -0500 (Wed, 03 Jan 2007) | 62 lines
Merged revisions 8360-8456 via svnmerge from
http://svn/vendor/drbd-8/trunk
........
r8408 | sgraham | 2006-12-28 13:53:59 -0500 (Thu, 28 Dec 2006) | 38 lines
[DRBD-8 @ 2648]
With this patch I try to make DRBD send more big packets
and fewer small packets. Sometimes I experienced that enabling
jumbo frames decreased DRBD's performance. Although I could
not reproduce this at the time of writing the patch, I managed to
measure the distribution of packet sizes. This indicates
that the patch has the intended effect:
Distribution of packet sizes when writing 512 MB of zeros:
packet size          old-code          new-code
(bytes)            send    recv      send    recv
--------------------------------------------------
   1 to  450:     42047   42259     38629   38784
 451 to  900:      4057    4230      2023    2176
 901 to 1350:       112    2919        54    2289
1351 to 1800:      6697    7439      6698    7368
1801 to 2250:        11      43         4       8
2251 to 2700:      1643    1684      1838    2307
2701 to 3150:         5      43         3      18
3151 to 3600:         2      70         0      51
3601 to 4050:        95     218         2     251
4051 to 4500:      3159    3296         1      58
4501 to 4950:        45     270         6     176
4951 to 5400:        75     717        60     453
5401 to 5850:       867     901      1376    1383
5851 to 6300:       683     696       245     251
6301 to 6750:       129     214        44      47
6751 to 7200:        24      39       266     274
7201 to 7650:       101     118       256     263
7651 to 8100:        30    1829        70    2109
8101 to 8550:        36      48        49      56
8551 to 9000+:    14971   53657     15755   54992
-Phil
Original author: phil
Date: 2006-12-28 12:07:12.682853+00:00
........
r8409 | sgraham | 2006-12-28 13:54:04 -0500 (Thu, 28 Dec 2006) | 5 lines
[DRBD-8 @ 2649]
Make DRBD compile even on SLES9 (Kernel 2.6.5)
Original author: phil
Date: 2006-12-28 15:54:28.853546+00:00
........
r8455 | sgraham | 2007-01-02 09:04:11 -0500 (Tue, 02 Jan 2007) | 5 lines
[DRBD-8 @ 2650]
Fixed a few error strings...
Original author: phil
Date: 2007-01-02 13:47:28.435731+00:00
........
------------------------------------------------------------------------
Here are the state traces.
Here is the log for Jerry:
drbd2: disk( Diskless -> Attaching )
drbd2: Found 6 transactions (267 active extents) in activity log.
drbd2: max_segment_size ( = BIO size ) = 32768
drbd2: drbd_bm_resize called with capacity == 30735368
drbd2: resync bitmap: bits=3841921 words=120062
drbd2: size = 14 GB (15367684 KB)
drbd2: reading of bitmap took 18 jiffies
drbd2: recounting of set bits took additional 1 jiffies
drbd2: 0 KB marked out-of-sync by on disk bit-map.
drbd2: disk( Attaching -> UpToDate )
drbd2: Writing meta data super block now.
drbd2: conn( StandAlone -> Unconnected )
drbd2: receiver (re)started
drbd2: conn( Unconnected -> WFConnection )
drbd2: conn( WFConnection -> WFReportParams )
drbd2: Handshake successful: DRBD Network Protocol version 85
drbd2: drbd_sync_handshake:
drbd2: self 1999C2A90658DA42:0000000000000000:30A5702D832E1702:0000000000000004
drbd2: peer 1999C2A90658DA43:0000000000000000:30A5702D832E1702:0000000000000004
drbd2: uuid_compare()=-1
drbd2: uuid[History_start] now 1999C2A90658DA42
drbd2: uuid[Current] now 0000000000000000
drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
drbd2: Writing meta data super block now.
drbd2: aftr_isp( 0 -> 1 )
drbd2: conn( WFBitMapT -> WFSyncUUID )
drbd2: peer_isp( 0 -> 1 )
drbd2: uuid[Current] now FC968953A6746EEC
drbd2: uuid[Bitmap] now 0000000000000000
drbd2: conn( WFSyncUUID -> PausedSyncT ) disk( UpToDate -> Inconsistent )
drbd2: Began resync as PausedSyncT (will sync 1052672 KB [263168 bits set]).
drbd2: Writing meta data super block now.
drbd2: peer( Secondary -> Primary )
drbd2: peer_isp( 1 -> 0 )
drbd2: conn( PausedSyncT -> SyncTarget ) aftr_isp( 1 -> 0 )
drbd2: Syncer continues.
drbd2: Resync done (total 225 sec; paused 61 sec; 6416 K/sec)
drbd2: uuid[Bitmap] now 0000000000000000
drbd2: uuid[History_start] now 30A5702D832E1702
drbd2: uuid[History_end] now 0000000000000004
drbd2: uuid[Bitmap] now FC968953A6746EEC
drbd2: uuid[Current] now 1999C2A90658DA42
drbd2: uuid[History_start] now FC968953A6746EEC
drbd2: uuid[Bitmap] now 0000000000000000
drbd2: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
drbd2: Writing meta data super block now.
drbd2: drbd_open: Allowing open in Secondary state
drbd2: drbd_sync_handshake:
drbd2: self 1999C2A90658DA42:0000000000000000:FC968953A6746EEC:30A5702D832E1702
drbd2: peer 1999C2A90658DA42:0000000000000000:FC968953A6746EEC:30A5702D832E1702
drbd2: uuid_compare()=-1
drbd2: uuid[History_start] now 1999C2A90658DA42
drbd2: uuid[Current] now 0000000000000000
drbd2: peer( Primary -> Secondary ) conn( Connected -> WFBitMapT )
drbd2: Writing meta data super block now.
drbd2: role( Secondary -> Primary )
drbd2: Writing meta data super block now.
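
As a sanity check on the trace above, the size figures it reports are consistent with each other. The sketch below reproduces them under two assumptions that are inferred from the numbers and not stated anywhere in the log: one bitmap bit covers 4 KiB, and the word count is in 32-bit words rounded up to a 64-bit boundary. The function names are made up for illustration.

```python
SECTOR_BYTES = 512
BYTES_PER_BIT = 4096  # assumption: one bitmap bit per 4 KiB of storage


def bitmap_geometry(capacity_sectors):
    """Return (size_kb, bits, words) for a device of the given capacity."""
    size_kb = capacity_sectors * SECTOR_BYTES // 1024
    # Round up so a partial trailing block still gets a bit.
    bits = -(-size_kb * 1024 // BYTES_PER_BIT)
    # Assumption: 32-bit words, padded to a multiple of 64 bits.
    words = ((bits + 63) // 64) * 2
    return size_kb, bits, words


def resync_kb(bits_set):
    """Amount of data (in KB) covered by the given number of set bits."""
    return bits_set * BYTES_PER_BIT // 1024


if __name__ == "__main__":
    # Capacity from "drbd_bm_resize called with capacity == 30735368".
    print(bitmap_geometry(30735368))
    # Bit count from "Began resync ... [263168 bits set]".
    print(resync_kb(263168))
```

With the capacity from the log this gives 15367684 KB, 3841921 bits and 120062 words, and 263168 set bits correspond to 1052672 KB, matching the "resync bitmap", "size" and "Began resync" lines.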
On ben, the other node, we have:
drbd2: disk( Diskless -> Attaching )
drbd2: Found 6 transactions (324 active extents) in activity log.
drbd2: max_segment_size ( = BIO size ) = 32768
drbd2: drbd_bm_resize called with capacity == 30735368
drbd2: resync bitmap: bits=3841921 words=120062
drbd2: size = 14 GB (15367684 KB)
drbd2: reading of bitmap took 26 jiffies
drbd2: recounting of set bits took additional 1 jiffies
drbd2: 0 KB marked out-of-sync by on disk bit-map.
drbd2: Marked additional 1028 MB as out-of-sync based on AL.
drbd2: disk( Attaching -> UpToDate )
drbd2: Writing meta data super block now.
drbd2: conn( StandAlone -> Unconnected )
drbd2: receiver (re)started
drbd2: conn( Unconnected -> WFConnection )
drbd2: conn( WFConnection -> WFReportParams )
drbd2: Handshake successful: DRBD Network Protocol version 85
drbd2: drbd_sync_handshake:
drbd2: self 1999C2A90658DA43:0000000000000000:30A5702D832E1702:0000000000000004
drbd2: peer 1999C2A90658DA42:0000000000000000:30A5702D832E1702:0000000000000004
drbd2: uuid_compare()=1
drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
drbd2: aftr_isp( 0 -> 1 )
drbd2: Writing meta data super block now.
drbd2: peer_isp( 0 -> 1 )
drbd2: uuid[Bitmap] now FC968953A6746EEC
drbd2: conn( WFBitMapS -> PausedSyncS ) pdsk( UpToDate -> Inconsistent )
drbd2: Began resync as PausedSyncS (will sync 1052672 KB [263168 bits set]).
drbd2: Writing meta data super block now.
drbd2: role( Secondary -> Primary )
drbd2: Writing meta data super block now.
drbd2: aftr_isp( 1 -> 0 )
drbd2: conn( PausedSyncS -> SyncSource ) peer_isp( 1 -> 0 )
drbd2: Syncer continues.
drbd2: Resync done (total 225 sec; paused 61 sec; 6416 K/sec)
drbd2: uuid[History_start] now FC968953A6746EEC
drbd2: uuid[Bitmap] now 0000000000000000
drbd2: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
drbd2: Writing meta data super block now.
drbd2: role( Primary -> Secondary )
drbd2: Writing meta data super block now.
drbd2: drbd_sync_handshake:
drbd2: self 1999C2A90658DA43:0000000000000000:FC968953A6746EEC:30A5702D832E1702
drbd2: peer 0000000000000000:0000000000000000:1999C2A90658DA42:FC968953A6746EEC
drbd2: uuid_compare()=-2
drbd2: Writing meta data super block now.
drbd2: writing of bitmap took 59 jiffies
drbd2: 14 GB marked out-of-sync by on disk bit-map.
drbd2: 15367684 KB now marked out-of-sync by on disk bit-map.
drbd2: Writing meta data super block now.
drbd2: uuid[History_start] now 1999C2A90658DA43
drbd2: uuid[Current] now 0000000000000000
drbd2: conn( Connected -> WFBitMapT )
drbd2: Writing meta data super block now.
drbd2: peer( Secondary -> Primary )
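
To make the end state of the two traces easier to compare, here is a minimal sketch that pulls the state transitions out of log text like the above and reports the last value seen for each field. The regular expression and the helper names are my own and only cover the transition format visible in these logs. Both traces end with conn in WFBitMapT, i.e. each node appears to consider itself the sync target.

```python
import re

# Matches transitions such as "conn( Connected -> WFBitMapT )".
# Fields like peer_isp( ... ) and aftr_isp( ... ) are deliberately not matched.
TRANSITION = re.compile(
    r"\b(conn|role|peer|disk|pdsk)\(\s*(\w+)\s*->\s*(\w+)\s*\)"
)


def final_states(log_text):
    """Return the last value seen for each state field in the log text."""
    state = {}
    for field, _old, new in TRANSITION.findall(log_text):
        state[field] = new
    return state


def both_waiting_as_target(log_a, log_b):
    """True if both logs end with conn in WFBitMapT."""
    return (
        final_states(log_a).get("conn") == "WFBitMapT"
        and final_states(log_b).get("conn") == "WFBitMapT"
    )


# The last transition lines of each trace, copied from the logs above.
jerry_tail = """\
drbd2: peer( Primary -> Secondary ) conn( Connected -> WFBitMapT )
drbd2: role( Secondary -> Primary )
"""
ben_tail = """\
drbd2: conn( Connected -> WFBitMapT )
drbd2: peer( Secondary -> Primary )
"""

if __name__ == "__main__":
    print(final_states(jerry_tail))
    print(final_states(ben_tail))
    print(both_waiting_as_target(jerry_tail, ben_tail))
```

Feeding the complete logs instead of the tails works the same way, since only the last transition per field is kept.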