[DRBD-user] Data Rollback on DRBD 8.3.4

Lars Ellenberg lars.ellenberg at linbit.com
Wed Feb 23 10:51:29 CET 2011


On Wed, Feb 23, 2011 at 05:37:45PM +0800, jayson.cena at teamglac.com wrote:
> sorry about that. here is the config
> 
> ###############################
> global { usage-count no; }
> common { syncer { rate 500M; } }
> 
> #DRBD FOR SERVER START
> resource r0 {
> 	protocol C;
> 	startup {
> 		wfc-timeout 15;
> 		degr-wfc-timeout 60;
> 		become-primary-on both;	
> 	}
> 	net {
> 		cram-hmac-alg sha1;
> 		shared-secret "a4tech";
> 		allow-two-primaries;
> 		after-sb-0pri discard-zero-changes;
> 		after-sb-1pri discard-secondary;

Right there. With "after-sb-1pri discard-secondary" you are telling DRBD:
"Dear DRBD,
 if you detect data divergence during the connection handshake,
 and at that moment one side is in Primary role
 and the other side is in Secondary role, please just throw away
 the data of the side that is Secondary at that time.
"

That matches exactly what the logs below show.
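
To make that policy concrete, here is a minimal Python sketch of the
decision as described above. It is only a model for illustration, not
DRBD code: the function name and the plain-string role values are made
up for this example.

```python
def after_sb_1pri_discard_secondary(roles):
    """Return the node whose data is thrown away, or None if the rule does not apply.

    roles maps node name -> role ("Primary" or "Secondary") at handshake time.
    Illustrative model only; not taken from the DRBD sources.
    """
    primaries = [node for node, role in roles.items() if role == "Primary"]
    secondaries = [node for node, role in roles.items() if role == "Secondary"]
    # This policy only covers the case of exactly one Primary and one Secondary.
    if len(primaries) != 1 or len(secondaries) != 1:
        return None
    return secondaries[0]


# Roles at handshake time, as shown in the logs quoted further down.
victim = after_sb_1pri_discard_secondary(
    {"vmcluster01": "Primary", "vmcluster02": "Secondary"}
)
print(victim)  # vmcluster02
```

Whichever node is Secondary at the moment of the handshake loses its
changes, regardless of which side holds the newer or larger set of
modifications.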

> 		after-sb-2pri disconnect;
> 		allow-two-primaries;
> 	}
> 	on vmcluster01 {
> 		device /dev/drbd0;
> 		disk /dev/vmdiskvg/server_root_disk;
> 		address 172.16.0.1:7788;
> 		meta-disk internal;
> 	}
> 	on vmcluster02 {
> 		device /dev/drbd0;
> 		disk /dev/vmdiskvg/server_root_disk;
> 		address 172.16.0.2:7788;
> 		meta-disk internal;
> 	}
> }
> ###############################
> 
> here are the logs from vmcluster1 r0
> ###############################
> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: Starting worker thread (from cqueue [4385])
> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: disk( Diskless -> Attaching ) 
> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: Found 4 transactions (94 active extents) in activity log.
> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: Method to ensure write ordering: barrier
> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: Backing device's merge_bvec_fn() = ffffffff814445a0
> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: max_segment_size ( = BIO size ) = 4096
> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: drbd_bm_resize called with capacity == 10485368
> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: resync bitmap: bits=1310671 words=20480
> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: size = 5120 MB (5242684 KB)
> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: recounting of set bits took additional 0 jiffies
> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: Marked additional 200 MB as out-of-sync based on AL.
> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: disk( Attaching -> UpToDate ) 
> Feb 16 08:00:11 vmcluster01 kernel: block drbd0: conn( StandAlone -> Unconnected ) 
> Feb 16 08:00:11 vmcluster01 kernel: block drbd0: Starting receiver thread (from drbd0_worker [4403])
> Feb 16 08:00:11 vmcluster01 kernel: block drbd0: receiver (re)started
> Feb 16 08:00:11 vmcluster01 kernel: block drbd0: conn( Unconnected -> WFConnection ) 
> Feb 16 08:00:26 vmcluster01 kernel: block drbd0: role( Secondary -> Primary ) 
> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: Handshake successful: Agreed network protocol version 91
> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: conn( WFConnection -> WFReportParams ) 
> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: Starting asender thread (from drbd0_receiver [4795])
> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: data-integrity-alg: <not-used>
> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: drbd_sync_handshake:
> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: self 08EC53F109A4A4CF:5DB3199CA5DE92F4:DCE4E128E07FD84A:5995FC353101D3B7 bits:51200 flags:0
> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: peer 5453F9D5A1166380:5DB3199CA5DE92F5:DCE4E128E07FD84A:5995FC353101D3B7 bits:145101 flags:2

Diverging data sets: both sides have been modified independently.
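
A rough way to see the divergence from the two UUID lines above, as a
Python sketch. The field order (current, bitmap, then two history
entries) and the treatment of the lowest bit as a flag rather than part
of the identifier are assumptions made for this example, and the helper
name is invented.

```python
def parse_uuids(line):
    """Split 'A:B:C:D bits:N flags:F' into four ints (hypothetical helper)."""
    uuid_field = line.split()[0]
    return [int(part, 16) for part in uuid_field.split(":")]


self_line = "08EC53F109A4A4CF:5DB3199CA5DE92F4:DCE4E128E07FD84A:5995FC353101D3B7 bits:51200 flags:0"
peer_line = "5453F9D5A1166380:5DB3199CA5DE92F5:DCE4E128E07FD84A:5995FC353101D3B7 bits:145101 flags:2"

self_uuids = parse_uuids(self_line)
peer_uuids = parse_uuids(peer_line)

LOW_BIT = 1  # assumption: the lowest bit is a flag, not part of the generation id

# Current UUIDs differ: each side has started its own new data generation.
current_differs = self_uuids[0] != peer_uuids[0]

# Bitmap UUIDs agree once the low bit is masked: both branched off the same generation.
same_ancestor = (self_uuids[1] & ~LOW_BIT) == (peer_uuids[1] & ~LOW_BIT)

# The older history entries are identical on both sides.
same_history = self_uuids[2:] == peer_uuids[2:]

print(current_differs, same_ancestor, same_history)  # True True True
```

A common ancestor plus two different current UUIDs is the pattern one
would expect when both nodes wrote data while disconnected from each
other.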

> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: uuid_compare()=100 by rule 90
> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from this node

You configured DRBD to discard the data of whichever side is Secondary
during the handshake, and that is exactly what it did: vmcluster02 was
Secondary at that moment, so its changes were overwritten.

> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate ) 
> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent ) 
> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: Began resync as SyncSource (will sync 632904 KB [158226 bits set]).
> Feb 16 08:02:17 vmcluster01 kernel: block drbd0: peer( Secondary -> Primary ) 
> Feb 16 08:04:50 vmcluster01 kernel: block drbd0: Resync done (total 87 sec; paused 0 sec; 7272 K/sec)
> Feb 16 08:04:50 vmcluster01 kernel: block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate ) 
> ###############################
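
The sizes in these logs can be cross-checked with a little arithmetic.
The sketch below assumes each bitmap bit covers 4 KiB, which is what the
"resync bitmap: bits=1310671" and "size = 5120 MB (5242684 KB)" lines
imply. The constant and function names are made up for illustration.

```python
KB_PER_BIT = 4  # assumption: one bitmap bit tracks 4 KiB of the device


def bits_to_kb(bits):
    """Convert a count of bitmap bits to KB."""
    return bits * KB_PER_BIT


# Device size: 1310671 bits should cover the 5242684 KB device.
print(bits_to_kb(1310671))        # 5242684

# vmcluster01 reported bits:51200, logged as 200 MB out-of-sync.
print(bits_to_kb(51200) // 1024)  # 200

# Resync volume: 158226 bits set, logged as "will sync 632904 KB".
print(bits_to_kb(158226))         # 632904
```

So about 618 MB was resynced from vmcluster01 to vmcluster02, replacing
whatever vmcluster02 held in those regions.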

> here are the logs for vmcluster2 (where the file servers are hosted)
> ###############################
> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: Starting worker thread (from cqueue [3624])
> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: disk( Diskless -> Attaching ) 
> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: Found 4 transactions (136 active extents) in activity log.
> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: Method to ensure write ordering: barrier
> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: Backing device's merge_bvec_fn() = ffffffffa011ccb0
> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: max_segment_size ( = BIO size ) = 4096
> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: drbd_bm_resize called with capacity == 10485368
> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: resync bitmap: bits=1310671 words=20480
> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: size = 5120 MB (5242684 KB)
> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: recounting of set bits took additional 0 jiffies
> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: 87 MB (22299 bits) marked out-of-sync by on disk bit-map.
> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: Marked additional 480 MB as out-of-sync based on AL.
> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: disk( Attaching -> UpToDate ) 
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: conn( StandAlone -> Unconnected ) 
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: Starting receiver thread (from drbd0_worker [3647])
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: receiver (re)started
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: conn( Unconnected -> WFConnection ) 
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: Handshake successful: Agreed network protocol version 91
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: conn( WFConnection -> WFReportParams ) 
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: Starting asender thread (from drbd0_receiver [4269])
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: data-integrity-alg: <not-used>
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: drbd_sync_handshake:
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: self 5453F9D5A1166380:5DB3199CA5DE92F5:DCE4E128E07FD84A:5995FC353101D3B7 bits:145101 flags:0
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: peer 08EC53F109A4A4CF:5DB3199CA5DE92F4:DCE4E128E07FD84A:5995FC353101D3B7 bits:51200 flags:2
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: uuid_compare()=100 by rule 90
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from peer node

Exactly.

> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) 
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID ) 
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent ) 
> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: Began resync as SyncTarget (will sync 632904 KB [158226 bits set]).
> Feb 16 08:02:13 vmcluster02 kernel: block drbd0: role( Secondary -> Primary ) 
> Feb 16 08:03:17 vmcluster02 kernel: block drbd0: Resync done (total 87 sec; paused 0 sec; 7272 K/sec)
> Feb 16 08:03:17 vmcluster02 kernel: block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) 
> Feb 16 08:03:17 vmcluster02 kernel: block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
> Feb 16 08:03:17 vmcluster02 kernel: block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


