[DRBD-user] How the fencing works

marc genou marcgenou at gmail.com
Tue Jan 18 12:58:58 CET 2011


Hi guys,

I would like to understand how fencing works, so that I can tell whether I
have misunderstood it or whether it is not working as expected.
Here is my configuration.

The hardware nodes (HN) are xx4 and xx10, and I am using Heartbeat and
Pacemaker.
The resource is VZDataClone3 (drbd3).
My drbd.conf contains the following lines:

### Only when DRBD is under cluster ###
fencing resource-only;
### --- ###

and

### Only when DRBD is under cluster ###
handlers {
split-brain "/usr/lib/drbd/notify-split-brain.sh root";
fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}
### --- ###

At some point xx10 goes down (a halt, reboot, massive crash, etc.) and is
marked as dead.

xx4 log (under /var/log):

xx4 cibadmin: [31254]: info: Invoked: cibadmin -C -o constraints -X
<rsc_location rsc="VZDataClone3" id="drbd-fence-by-handler-VZDataClone3">
<rule role="Master" score="-INFINITY"
id="drbd-fence-by-handler-rule-VZDataClone3">     <expression
attribute="#uname" operation="ne" value="xx4"
id="drbd-fence-by-handler-expr-VZDataClone3"/>   </rule> </rsc_location>

Should I assume that this constraint prevents the resource from running on
xx4? I ask because the log lines that follow it show xx4 taking over the
resource (which is the behaviour I expected).
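
As a rough illustration of how a rule like the one logged above can be read:
it gives the Master role a score of -INFINITY on every node whose #uname is
not equal to xx4. The sketch below is a toy evaluator, not Pacemaker code.
The helper name master_banned_on is hypothetical, it models only a single
#uname expression, and the XML is the constraint from the log with its
whitespace reflowed.

```python
import xml.etree.ElementTree as ET

# The constraint from the cibadmin log line, reflowed for readability.
CONSTRAINT = """
<rsc_location rsc="VZDataClone3" id="drbd-fence-by-handler-VZDataClone3">
  <rule role="Master" score="-INFINITY"
        id="drbd-fence-by-handler-rule-VZDataClone3">
    <expression attribute="#uname" operation="ne" value="xx4"
                id="drbd-fence-by-handler-expr-VZDataClone3"/>
  </rule>
</rsc_location>
"""


def master_banned_on(constraint_xml, nodes):
    """Return the nodes on which the rule forbids the Master role.

    Toy model (an assumption for illustration): handles one '#uname'
    expression with operation 'eq' or 'ne' and a score of -INFINITY.
    """
    rule = ET.fromstring(constraint_xml).find("rule")
    expr = rule.find("expression")
    if rule.get("role") != "Master" or rule.get("score") != "-INFINITY":
        return []
    if expr.get("attribute") != "#uname":
        raise ValueError("only #uname expressions are modelled here")
    op, value = expr.get("operation"), expr.get("value")
    if op == "ne":
        # The rule matches every node whose name differs from `value`.
        return [n for n in nodes if n != value]
    if op == "eq":
        return [n for n in nodes if n == value]
    raise ValueError("unsupported operation: " + op)


banned = master_banned_on(CONSTRAINT, ["xx4", "xx10"])
print(banned)  # ['xx10']
```

Read this way, the constraint does not keep the resource off xx4. It keeps
the Master role off every other node, xx10 included, for as long as the
constraint exists. It is normally removed again by crm-unfence-peer.sh,
which is configured above as the after-resync-target handler.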

Five minutes later xx10 (the preferred location for that resource) comes
back to life, and xx4 appears to stop and unmount the resource successfully.
I expected xx4 to become secondary again, but it did not.

xx4 /var/log/messages:

Jan 16 11:17:45 xx4 kernel: block drbd3: conn( WFConnection -> Disconnecting
)
Jan 16 11:17:45 xx4 kernel: block drbd3: Discarding network configuration.
Jan 16 11:17:45 xx4 kernel: block drbd3: Connection closed
Jan 16 11:17:45 xx4 kernel: block drbd3: conn( Disconnecting -> StandAlone )
Jan 16 11:17:45 xx4 kernel: block drbd3: receiver terminated
Jan 16 11:17:45 xx4 kernel: block drbd3: Terminating receiver thread
Jan 16 11:17:45 xx4 kernel: block drbd3: disk( UpToDate -> Diskless ) pdsk(
Outdated -> DUnknown )
Jan 16 11:17:45 xx4 kernel: block drbd3: drbd_bm_resize called with capacity
== 0
Jan 16 11:17:45 xx4 kernel: block drbd3: worker terminated
Jan 16 11:17:45 xx4 kernel: block drbd3: Terminating worker thread
Jan 16 11:17:45 xx4 lrmd: [18448]: info: RA output: (VZData3:1:stop:stdout)
Jan 16 11:17:45 xx4 crm_attribute: [6762]: info: Invoked: crm_attribute -N
xx4 -n master-VZData3:1 -l reboot -D
Jan 16 11:17:45 xx4 attrd: [18450]: info: attrd_trigger_update: Sending
flush op to all hosts for: master-VZData3:1 (<null>)
Jan 16 11:17:45 xx4 lrmd: [18448]: info: RA output: (VZData3:1:stop:stdout)
Jan 16 11:17:45 xx4 attrd: [18450]: info: attrd_perform_update: Sent delete
454: node=e96b792b-46d9-479c-8295-d109b35bb184, attr=master-VZData3:1,
id=<n/a>, set=(null), section=status
Jan 16 11:17:45 xx4 crmd: [18451]: info: process_lrm_event: LRM operation
VZData3:1_stop_0 (call=80, rc=0, cib-update=91, confirmed=true) ok
Jan 16 11:17:45 xx4 attrd: [18450]: info: attrd_ha_callback: flush message
from xx4
Jan 16 11:17:45 xx4 attrd: [18450]: info: attrd_perform_update: Sent delete
456: node=e96b792b-46d9-479c-8295-d109b35bb184, attr=master-VZData3:1,
id=<n/a>, set=(null), section=status
Jan 16 11:17:56 xx4 lrmd: [18448]: info: RA output:
(VZData6:1:monitor:stderr) DRBD module version: 8.3.4    userland version:
8.3.8 preferably kernel and userland versions should match.
Jan 16 11:17:59 xx4 attrd: [18450]: info: attrd_ha_callback: flush message
from xx10
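
The DRBD state changes in kernel messages like those above can be pulled out
with a short script, which makes it easier to see where a node ends up. This
is only a sketch: the regular expression and the helper name transitions are
assumptions for illustration, and the sample lines are taken from the xx4
log above with the mail line wrapping rejoined.

```python
import re

# Matches one "field( Old -> New )" group in a DRBD kernel message.
TRANSITION = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def transitions(line):
    """Return (field, old, new) tuples found in one log line."""
    return TRANSITION.findall(line)


LOG = [
    "Jan 16 11:17:45 xx4 kernel: block drbd3: "
    "conn( WFConnection -> Disconnecting )",
    "Jan 16 11:17:45 xx4 kernel: block drbd3: "
    "conn( Disconnecting -> StandAlone )",
    "Jan 16 11:17:45 xx4 kernel: block drbd3: "
    "disk( UpToDate -> Diskless ) pdsk( Outdated -> DUnknown )",
]

# Keep only the most recent value seen for each field.
state = {}
for line in LOG:
    for field, old, new in transitions(line):
        state[field] = new

print(state)
# {'conn': 'StandAlone', 'disk': 'Diskless', 'pdsk': 'DUnknown'}
```

For these lines the final state on xx4 is StandAlone and Diskless, which
matches the stop operation reported by the resource agent.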

At this point I would expect the resource to be running safely on xx10 (even
without a secondary). Am I wrong?

However, xx10 never became primary for the resource again and is unable to
mount it.

xx10 /var/log/messages:

Jan 16 11:17:45 localhost attrd: [14477]: info: attrd_ha_callback: flush
message from xx4
Jan 16 11:17:48 localhost crmd: [14478]: info: do_lrm_rsc_op: Performing
key=139:2349:0:c25aba18-b3d1-4560-a131-aa35fdf24f6d op=VZData3:1_start_0 )
Jan 16 11:17:48 localhost lrmd: [14473]: info: rsc:VZData3:1:29: start
Jan 16 11:17:48 localhost kernel: drbd: initialized. Version: 8.3.4
(api:88/proto:86-91)
Jan 16 11:17:48 localhost kernel: drbd: GIT-hash:
70a645ae080411c87b4482a135847d69dc90a6a2 build by xemul at ovzcore.sw.ru,
2009-10-12 19:29:01
Jan 16 11:17:48 localhost kernel: drbd: registered as block device major 147
Jan 16 11:17:48 localhost kernel: drbd: minor_table @ 0xffff81041249ecc0
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output:
(VZData3:1:start:stdout)
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output:
(VZData3:1:start:stderr) DRBD module version: 8.3.4    userland version:
8.3.8 preferably kernel and userland versions should match.
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output:
(VZData3:1:start:stderr) open(/dev/mapper/VG_xx10_vz-vzpart3) failed: No
such file or directory
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output:
(VZData3:1:start:stderr) Command 'drbdmeta 3 v08
/dev/mapper/VG_xx10_vz-vzpart3 internal check-resize
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output:
(VZData3:1:start:stderr) ' terminated with exit code 20 drbdadm check-resize
vzpart3: exited with code 20
Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Called drbdadm -c
/etc/drbd.conf check-resize vzpart3
Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Exit code 20
Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Command output:
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output:
(VZData3:1:start:stdout)
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output:
(VZData3:1:start:stderr) DRBD module version: 8.3.4    userland version:
8.3.8 preferably kernel and userland versions should match.
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output:
(VZData3:1:start:stderr) Can not open device
'/dev/mapper/VG_xx10_vz-vzpart3': No such file or directory
Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Called drbdadm -c
/etc/drbd.conf --peer xx3 up vzpart3
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output:
(VZData3:1:start:stderr) Command 'drbdsetup 3 disk
/dev/mapper/VG_xx10_vz-vzpart3 /dev/mapper/VG_xx10_vz-vzpart3 internal
--set-defaults --create-device --fencing=resource-only --on-io-error=detach'
terminated with exit code 20 drbdadm attach vzpart3: exited with code 20
Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Exit code 1
Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Command output:
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output:
(VZData3:1:start:stdout)
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output:
(VZData3:1:start:stderr) DRBD module version: 8.3.4    userland version:
8.3.8 preferably kernel and userland versions should match. Can not open
device '/dev/mapper/VG_xx10_vz-vzpart3': No such file or directory Command
'drbdsetup 3 disk /dev/mapper/VG_xx10_vz-vzpart3
/dev/mapper/VG_xx10_vz-vzpart3 internal --set-defaults --create-device
--fencing=resource-only --on-io-error=detach' terminated with exit code 20
drbdadm attach vzpart3: exited with code 20
Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Called drbdadm -c
/etc/drbd.conf --peer xx3 up vzpart3
Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Exit code 1

After several attempts xx10 shows the following. Should this have happened
earlier?

xx10 /var/log/messages:

Jan 16 11:17:58 localhost kernel: block drbd3: Starting worker thread (from
cqueue/11 [387])
Jan 16 11:17:58 localhost kernel: block drbd3: disk( Diskless -> Attaching )
Jan 16 11:17:58 localhost kernel: block drbd3: Found 4 transactions (136
active extents) in activity log.
Jan 16 11:17:58 localhost kernel: block drbd3: Method to ensure write
ordering: barrier
Jan 16 11:17:58 localhost kernel: block drbd3: max_segment_size ( = BIO size
) = 32768
Jan 16 11:17:58 localhost kernel: block drbd3: drbd_bm_resize called with
capacity == 209708728
Jan 16 11:17:58 localhost kernel: block drbd3: resync bitmap: bits=26213591
words=409588
Jan 16 11:17:58 localhost kernel: block drbd3: size = 100 GB (104854364 KB)
Jan 16 11:17:58 localhost kernel: block drbd3: recounting of set bits took
additional 4 jiffies
Jan 16 11:17:58 localhost kernel: block drbd3: 0 KB (0 bits) marked
out-of-sync by on disk bit-map.
Jan 16 11:17:58 localhost kernel: block drbd3: Marked additional 508 MB as
out-of-sync based on AL.
Jan 16 11:17:58 localhost kernel: block drbd3: disk( Attaching -> Consistent
)
Jan 16 11:17:58 localhost kernel: block drbd3: Barriers not supported on
meta data device - disabling
Jan 16 11:17:58 localhost kernel: block drbd3: conn( StandAlone ->
Unconnected )
Jan 16 11:17:58 localhost kernel: block drbd3: Starting receiver thread
(from drbd3_worker [16650])
Jan 16 11:17:58 localhost lrmd: [14473]: info: RA output:
(VZData3:1:start:stdout)
Jan 16 11:17:58 localhost kernel: block drbd3: receiver (re)started
Jan 16 11:17:58 localhost kernel: block drbd3: conn( Unconnected ->
WFConnection )


I am testing and learning to build things with all this great software, and
I would be very glad to understand how these pieces work.
Thank you, guys.