[DRBD-user] Data Rollback on DRBD 8.3.4

jayson.cena at teamglac.com
Wed Feb 23 19:24:16 CET 2011



>> Sorry about that. Here is the config:
>> 
>> ###############################
>> global { usage-count no; }
>> common { syncer { rate 500M; } }
>> 
>> #DRBD FOR SERVER START
>> resource r0 {
>> 	protocol C;
>> 	startup {
>> 		wfc-timeout 15;
>> 		degr-wfc-timeout 60;
>> 		become-primary-on both;
>> 	}
>> 	net {
>> 		cram-hmac-alg sha1;
>> 		shared-secret "a4tech";
>> 		allow-two-primaries;
>> 		after-sb-0pri discard-zero-changes;
>> 		after-sb-1pri discard-secondary;
> 
> Right there.
> "Dear DRBD,
>  if you happen to detect data divergence on connection handshake,
>  and during that handshake one side happens to be in Primary role,
>  the other side happens to be in Secondary role, please just throw away
>  the data of the side that is Secondary at that time.
> "
> 
> Which matches exactly what the log says, below.
> 
>> 		after-sb-2pri disconnect;
>> 		allow-two-primaries;
>> 	}
>> 	on vmcluster01 {
>> 		device /dev/drbd0;
>> 		disk /dev/vmdiskvg/server_root_disk;
>> 		address 172.16.0.1:7788;
>> 		meta-disk internal;
>> 	}
>> 	on vmcluster02 {
>> 		device /dev/drbd0;
>> 		disk /dev/vmdiskvg/server_root_disk;
>> 		address 172.16.0.2:7788;
>> 		meta-disk internal;
>> 	}
>> }
>> ###############################
>> 
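Lars's reading of `after-sb-1pri discard-secondary` can be sketched as a small decision function. This is a simplified, hypothetical model: the function name and return strings are made up for illustration, and other after-sb-1pri policies (such as `consensus` or `call-pri-lost-after-sb`) are not modelled. What it shows is that the decision depends only on the roles the two nodes hold at handshake time, not on which node actually wrote the data.

```python
# Hypothetical model of the "exactly one Primary" split-brain recovery case.
# Only the two policies relevant to this thread are modelled.

def after_sb_1pri(policy: str, self_role: str, peer_role: str) -> str:
    """Return 'sync-source', 'sync-target' or 'disconnect' for this node.

    self_role/peer_role are the roles held *at handshake time*.
    """
    if {self_role, peer_role} != {"Primary", "Secondary"}:
        raise ValueError("after-sb-1pri applies only when exactly one node is Primary")
    if policy == "disconnect":
        return "disconnect"
    if policy == "discard-secondary":
        # The Secondary's modifications are thrown away: it becomes SyncTarget.
        return "sync-source" if self_role == "Primary" else "sync-target"
    raise NotImplementedError(f"policy {policy!r} is not modelled in this sketch")


# Roles at the handshake in the logs below:
# vmcluster01 was already Primary, vmcluster02 was still Secondary.
print(after_sb_1pri("discard-secondary", "Primary", "Secondary"))  # vmcluster01
print(after_sb_1pri("discard-secondary", "Secondary", "Primary"))  # vmcluster02
```

This matches the logged outcome: "Sync from this node" on vmcluster01 and "Sync from peer node" on vmcluster02.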
>> Here are the logs from vmcluster01 for r0:
>> ###############################
>> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: Starting worker thread
>> (from cqueue [4385])
>> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: disk( Diskless ->
>> Attaching )
>> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: Found 4 transactions
>> (94 active extents) in activity log.
>> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: Method to ensure write
>> ordering: barrier
>> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: Backing device's
>> merge_bvec_fn() = ffffffff814445a0
>> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: max_segment_size ( =
>> BIO size ) = 4096
>> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: drbd_bm_resize called
>> with capacity == 10485368
>> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: resync bitmap:
>> bits=1310671 words=20480
>> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: size = 5120 MB
>> (5242684 KB)
>> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: recounting of set bits
>> took additional 0 jiffies
>> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: 0 KB (0 bits) marked
>> out-of-sync by on disk bit-map.
>> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: Marked additional 200
>> MB as out-of-sync based on AL.
>> Feb 16 08:00:09 vmcluster01 kernel: block drbd0: disk( Attaching ->
>> UpToDate )
>> Feb 16 08:00:11 vmcluster01 kernel: block drbd0: conn( StandAlone ->
>> Unconnected )
>> Feb 16 08:00:11 vmcluster01 kernel: block drbd0: Starting receiver
>> thread (from drbd0_worker [4403])
>> Feb 16 08:00:11 vmcluster01 kernel: block drbd0: receiver (re)started
>> Feb 16 08:00:11 vmcluster01 kernel: block drbd0: conn( Unconnected ->
>> WFConnection )
>> Feb 16 08:00:26 vmcluster01 kernel: block drbd0: role( Secondary ->
>> Primary )
>> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: Handshake successful:
>> Agreed network protocol version 91
>> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: Peer authenticated
>> using 20 bytes of 'sha1' HMAC
>> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: conn( WFConnection ->
>> WFReportParams )
>> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: Starting asender
>> thread (from drbd0_receiver [4795])
>> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: data-integrity-alg:
>> <not-used>
>> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: drbd_sync_handshake:
>> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: self
>> 08EC53F109A4A4CF:5DB3199CA5DE92F4:DCE4E128E07FD84A:5995FC353101D3B7
>> bits:51200 flags:0
>> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: peer
>> 5453F9D5A1166380:5DB3199CA5DE92F5:DCE4E128E07FD84A:5995FC353101D3B7
>> bits:145101 flags:2
> 
> Diverging data sets, both sides have been modified independently.
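The "self" and "peer" lines above list four UUIDs per node in the order current:bitmap:history1:history2. The current UUIDs differ, while the bitmap UUIDs (5DB3199CA5DE92F4 and 5DB3199CA5DE92F5) differ only in the lowest bit, which the comparison ignores. That combination is what the log reports as `uuid_compare()=100 by rule 90`. The following is a simplified, hypothetical Python model of only a few of DRBD 8.3's comparison rules, not DRBD's actual code; it is just enough to reproduce that verdict from the logged values.

```python
# Simplified, hypothetical model of a few of DRBD 8.3's uuid_compare() rules.
# Each log field is current:bitmap:history1:history2 (hex).

def parse_uuids(field: str) -> list[int]:
    """Split a 'current:bitmap:history1:history2' field into integers."""
    return [int(part, 16) for part in field.split(":")]


def strip_flag(uuid: int) -> int:
    """Ignore the lowest bit, which the comparison masks out."""
    return uuid & ~1


def uuid_compare(self_uuids: list[int], peer_uuids: list[int]) -> int:
    """>0: this node becomes SyncSource, <0: SyncTarget, 0: nothing to do,
    100: split brain. Only a subset of the real rules is modelled."""
    s_cur, s_bm = strip_flag(self_uuids[0]), strip_flag(self_uuids[1])
    p_cur, p_bm = strip_flag(peer_uuids[0]), strip_flag(peer_uuids[1])

    if s_cur == p_cur:
        return 0        # same data generation
    if s_cur == p_bm:
        return -1       # peer changed data on top of ours: sync from peer
    if s_bm == p_cur:
        return 1        # we changed data on top of peer's: sync to peer
    if s_bm == p_bm and s_bm != 0:
        return 100      # both moved on from the same bitmap UUID: split brain
    return -1000        # unrelated data (remaining rules not modelled)


vmcluster01 = parse_uuids("08EC53F109A4A4CF:5DB3199CA5DE92F4:DCE4E128E07FD84A:5995FC353101D3B7")
vmcluster02 = parse_uuids("5453F9D5A1166380:5DB3199CA5DE92F5:DCE4E128E07FD84A:5995FC353101D3B7")
print(uuid_compare(vmcluster01, vmcluster02))
```

Both nodes reach the same verdict from their own point of view, which is why both logs report "Split-Brain detected" before the after-sb policy is applied.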

But only vmcluster02 writes to r0: the VM that uses r0 is configured to start
on only one host. Is it possible that a split brain occurs when the host
writing to the resource crashes?

> 
>> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: uuid_compare()=100 by
>> rule 90
>> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: Split-Brain detected,
>> 1 primaries, automatically solved. Sync from this node
> 
> You configured it to discard the side that is Secondary during handshake.
> And that is what DRBD does.
> 
>> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: peer( Unknown ->
>> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown ->
>> UpToDate )
>> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: conn( WFBitMapS ->
>> SyncSource ) pdsk( UpToDate -> Inconsistent )
>> Feb 16 08:01:53 vmcluster01 kernel: block drbd0: Began resync as
>> SyncSource (will sync 632904 KB [158226 bits set]).
>> Feb 16 08:02:17 vmcluster01 kernel: block drbd0: peer( Secondary ->
>> Primary )
>> Feb 16 08:04:50 vmcluster01 kernel: block drbd0: Resync done (total 87
>> sec; paused 0 sec; 7272 K/sec)
>> Feb 16 08:04:50 vmcluster01 kernel: block drbd0: conn( SyncSource ->
>> Connected ) pdsk( Inconsistent -> UpToDate )
>> ###############################
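The size figures in the vmcluster01 log are internally consistent, and the arithmetic shows how big the discarded change set was. A minimal worked check, assuming 512-byte sectors, one bitmap bit per 4 KiB block and 64-bit bitmap words (the 64-bit word size is an assumption, consistent with the 64-bit kernel addresses in the log):

```python
# Worked check of the size figures in the vmcluster01 log above.
# Assumptions: 512-byte sectors, 1 bitmap bit per 4 KiB, 64-bit bitmap words.

SECTOR_BYTES = 512
BLOCK_KIB = 4
WORD_BITS = 64

capacity_sectors = 10485368                       # "drbd_bm_resize called with capacity == 10485368"
size_kib = capacity_sectors * SECTOR_BYTES // 1024
bitmap_bits = size_kib // BLOCK_KIB
bitmap_words = -(-bitmap_bits // WORD_BITS)       # ceiling division

resync_bits = 158226                              # "[158226 bits set]"
resync_kib = resync_bits * BLOCK_KIB

print(size_kib, bitmap_bits, bitmap_words, resync_kib)
```

The results match the logged "5242684 KB", "bits=1310671 words=20480" and "will sync 632904 KB", so roughly 618 MiB of vmcluster02's blocks were overwritten from vmcluster01 during the resync.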
> 
>> Here are the logs from vmcluster02 (where the file servers are hosted):
>> ###############################
>> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: Starting worker thread
>> (from cqueue [3624])
>> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: disk( Diskless ->
>> Attaching )
>> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: Found 4 transactions
>> (136 active extents) in activity log.
>> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: Method to ensure write
>> ordering: barrier
>> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: Backing device's
>> merge_bvec_fn() = ffffffffa011ccb0
>> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: max_segment_size ( =
>> BIO size ) = 4096
>> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: drbd_bm_resize called
>> with capacity == 10485368
>> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: resync bitmap:
>> bits=1310671 words=20480
>> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: size = 5120 MB
>> (5242684 KB)
>> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: recounting of set bits
>> took additional 0 jiffies
>> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: 87 MB (22299 bits)
>> marked out-of-sync by on disk bit-map.
>> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: Marked additional 480
>> MB as out-of-sync based on AL.
>> Feb 16 08:01:49 vmcluster02 kernel: block drbd0: disk( Attaching ->
>> UpToDate )
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: conn( StandAlone ->
>> Unconnected )
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: Starting receiver
>> thread (from drbd0_worker [3647])
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: receiver (re)started
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: conn( Unconnected ->
>> WFConnection )
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: Handshake successful:
>> Agreed network protocol version 91
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: Peer authenticated
>> using 20 bytes of 'sha1' HMAC
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: conn( WFConnection ->
>> WFReportParams )
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: Starting asender
>> thread (from drbd0_receiver [4269])
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: data-integrity-alg:
>> <not-used>
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: drbd_sync_handshake:
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: self
>> 5453F9D5A1166380:5DB3199CA5DE92F5:DCE4E128E07FD84A:5995FC353101D3B7
>> bits:145101 flags:0
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: peer
>> 08EC53F109A4A4CF:5DB3199CA5DE92F4:DCE4E128E07FD84A:5995FC353101D3B7
>> bits:51200 flags:2
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: uuid_compare()=100 by
>> rule 90
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: Split-Brain detected,
>> 1 primaries, automatically solved. Sync from peer node
> 
> Exactly.
> 
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: peer( Unknown ->
>> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown ->
>> UpToDate )
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: conn( WFBitMapT ->
>> WFSyncUUID )
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: helper command:
>> /sbin/drbdadm before-resync-target minor-0
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: helper command:
>> /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: conn( WFSyncUUID ->
>> SyncTarget ) disk( UpToDate -> Inconsistent )
>> Feb 16 08:01:50 vmcluster02 kernel: block drbd0: Began resync as
>> SyncTarget (will sync 632904 KB [158226 bits set]).
>> Feb 16 08:02:13 vmcluster02 kernel: block drbd0: role( Secondary ->
>> Primary )
>> Feb 16 08:03:17 vmcluster02 kernel: block drbd0: Resync done (total 87
>> sec; paused 0 sec; 7272 K/sec)
>> Feb 16 08:03:17 vmcluster02 kernel: block drbd0: conn( SyncTarget ->
>> Connected ) disk( Inconsistent -> UpToDate )
>> Feb 16 08:03:17 vmcluster02 kernel: block drbd0: helper command:
>> /sbin/drbdadm after-resync-target minor-0
>> Feb 16 08:03:17 vmcluster02 kernel: block drbd0: helper command:
>> /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
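Since `discard-secondary` looks only at roles during the handshake, the useful question for these logs is which role each node held when its "Handshake successful" line was written. A hypothetical helper (the function name and the default starting role of Secondary are assumptions for illustration) can replay a node's unwrapped log lines in order. On the excerpts above, vmcluster01 had already promoted itself at 08:00:26, 15 seconds after entering WFConnection at 08:00:11, which happens to equal the configured `wfc-timeout 15`; vmcluster02 only became Primary at 08:02:13, after the handshake.

```python
import re

# Matches local role changes such as "role( Secondary -> Primary )".
# Peer role changes are logged as "peer( ... )" and are deliberately ignored.
ROLE_RE = re.compile(r"role\( (\w+) -> (\w+) \)")


def role_at_handshake(lines: list[str], initial_role: str = "Secondary") -> str:
    """Replay log lines in order and return the local role at handshake time.

    Assumption: the device starts as Secondary after attaching.
    """
    role = initial_role
    for line in lines:
        if "Handshake successful" in line:
            return role
        match = ROLE_RE.search(line)
        if match:
            role = match.group(2)
    raise ValueError("no handshake found in the given lines")


vmcluster01_log = [
    "Feb 16 08:00:11 vmcluster01 kernel: block drbd0: conn( Unconnected -> WFConnection )",
    "Feb 16 08:00:26 vmcluster01 kernel: block drbd0: role( Secondary -> Primary )",
    "Feb 16 08:01:53 vmcluster01 kernel: block drbd0: Handshake successful: Agreed network protocol version 91",
]
vmcluster02_log = [
    "Feb 16 08:01:50 vmcluster02 kernel: block drbd0: conn( Unconnected -> WFConnection )",
    "Feb 16 08:01:50 vmcluster02 kernel: block drbd0: Handshake successful: Agreed network protocol version 91",
    "Feb 16 08:02:13 vmcluster02 kernel: block drbd0: role( Secondary -> Primary )",
]
print(role_at_handshake(vmcluster01_log), role_at_handshake(vmcluster02_log))
```

With these roles as input, `discard-secondary` selects vmcluster02's data for discarding, even though vmcluster02 is the node that runs the VM.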



