Hi all, <div><br></div><div>I would like to understand how fencing works, so I can tell whether I am misunderstanding it or it is not working as expected.</div><div>Here is my configuration:</div><div><br></div><div>The hardware nodes are xx4 and xx10, and I am using Heartbeat and Pacemaker.</div>
<div>The resource is VZDataClone3 (drbd3)</div><div>My drbd.conf has the following lines:</div><div><br></div><div>### Only when DRBD is under cluster ###</div><div>fencing resource-only;</div><div>### --- ###</div><div><br>
</div><div>and </div><div><br></div><div><div>### Only when DRBD is under cluster ###</div><div>handlers {</div><div>split-brain "/usr/lib/drbd/notify-split-brain.sh root";</div><div>fence-peer "/usr/lib/drbd/crm-fence-peer.sh";</div>
<div>after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";</div><div>}</div><div>### --- ###</div></div><div><br></div><div>At some point, xx10 is marked as dead (after a halt, a reboot, a massive crash, etc.).</div>
<div><br></div><div>xx4 /var/log/messages:</div><div><br></div><div>xx4 cibadmin: [31254]: info: Invoked: cibadmin -C -o constraints -X <rsc_location rsc="VZDataClone3" id="drbd-fence-by-handler-VZDataClone3"> <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-rule-VZDataClone3"> <expression attribute="#uname" operation="ne" value="xx4" id="drbd-fence-by-handler-expr-VZDataClone3"/> </rule> </rsc_location></div>
<div><br></div><div>Should I assume that this constraint prevents the resource from being promoted on any node other than xx4? The following lines show xx4 taking over the resource (the behaviour I expected, OK).</div><div><br></div><div>Five minutes later, xx10 (which is the preferred location for that resource) comes back to life, and xx4 seems to stop and unmount successfully. I expected it to become Secondary again, but it didn't.</div>
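<div>This is how I have been inspecting the constraint that the fence-peer handler creates (a sketch; the constraint id is taken from the cibadmin log line above, and the cibadmin flags are as in Pacemaker 1.0):</div>

```shell
# Query the constraints section of the CIB and look for the fence rule
# (constraint id copied from the cibadmin invocation in the log above):
cibadmin -Q -o constraints | grep drbd-fence-by-handler-VZDataClone3

# crm-unfence-peer.sh is supposed to remove it after resync completes;
# removing it by hand would look like this (I have not had to do it yet):
cibadmin -D -X '<rsc_location id="drbd-fence-by-handler-VZDataClone3"/>'
```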
<div><br></div><div>xx4 /var/log/messages:</div><div><br></div><div><div>Jan 16 11:17:45 xx4 kernel: block drbd3: conn( WFConnection -> Disconnecting )</div><div>Jan 16 11:17:45 xx4 kernel: block drbd3: Discarding network configuration.</div>
<div>Jan 16 11:17:45 xx4 kernel: block drbd3: Connection closed</div><div>Jan 16 11:17:45 xx4 kernel: block drbd3: conn( Disconnecting -> StandAlone )</div><div>Jan 16 11:17:45 xx4 kernel: block drbd3: receiver terminated</div>
<div>Jan 16 11:17:45 xx4 kernel: block drbd3: Terminating receiver thread</div><div>Jan 16 11:17:45 xx4 kernel: block drbd3: disk( UpToDate -> Diskless ) pdsk( Outdated -> DUnknown )</div><div>Jan 16 11:17:45 xx4 kernel: block drbd3: drbd_bm_resize called with capacity == 0</div>
<div>Jan 16 11:17:45 xx4 kernel: block drbd3: worker terminated</div><div>Jan 16 11:17:45 xx4 kernel: block drbd3: Terminating worker thread</div><div>Jan 16 11:17:45 xx4 lrmd: [18448]: info: RA output: (VZData3:1:stop:stdout)</div>
<div>Jan 16 11:17:45 xx4 crm_attribute: [6762]: info: Invoked: crm_attribute -N xx4 -n master-VZData3:1 -l reboot -D</div><div>Jan 16 11:17:45 xx4 attrd: [18450]: info: attrd_trigger_update: Sending flush op to all hosts for: master-VZData3:1 (<null>)</div>
<div>Jan 16 11:17:45 xx4 lrmd: [18448]: info: RA output: (VZData3:1:stop:stdout)</div><div>Jan 16 11:17:45 xx4 attrd: [18450]: info: attrd_perform_update: Sent delete 454: node=e96b792b-46d9-479c-8295-d109b35bb184, attr=master-VZData3:1, id=<n/a>, set=(null), section=status</div>
<div>Jan 16 11:17:45 xx4 crmd: [18451]: info: process_lrm_event: LRM operation VZData3:1_stop_0 (call=80, rc=0, cib-update=91, confirmed=true) ok</div><div>Jan 16 11:17:45 xx4 attrd: [18450]: info: attrd_ha_callback: flush message from xx4</div>
<div>Jan 16 11:17:45 xx4 attrd: [18450]: info: attrd_perform_update: Sent delete 456: node=e96b792b-46d9-479c-8295-d109b35bb184, attr=master-VZData3:1, id=<n/a>, set=(null), section=status</div><div>Jan 16 11:17:56 xx4 lrmd: [18448]: info: RA output: (VZData6:1:monitor:stderr) DRBD module version: 8.3.4 userland version: 8.3.8 preferably kernel and userland versions should match.</div>
<div>Jan 16 11:17:59 xx4 attrd: [18450]: info: attrd_ha_callback: flush message from xx10</div></div><div><br></div><div>At this point, you would assume the resource is safely running on xx10 (even without a Secondary). Am I wrong?</div>
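<div>To double-check the state on each node at this point, I run the standard DRBD and Pacemaker status commands (resource name taken from my config):</div>

```shell
# DRBD view: overall state, plus roles and connection state per resource
cat /proc/drbd
drbdadm state vzpart3    # prints local/peer roles, e.g. "Primary/Secondary"
drbdadm cstate vzpart3   # prints the connection state, e.g. "Connected"

# Cluster view: one-shot snapshot of all resources,
# including the master/slave set for VZDataClone3
crm_mon -1
```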
<div><br></div><div>xx10 never became Primary for the resource again, and it is unable to mount it.</div><div><br></div><div>xx10 /var/log/messages:</div><div><br></div><div><div>Jan 16 11:17:45 localhost attrd: [14477]: info: attrd_ha_callback: flush message from xx4</div>
<div>Jan 16 11:17:48 localhost crmd: [14478]: info: do_lrm_rsc_op: Performing key=139:2349:0:c25aba18-b3d1-4560-a131-aa35fdf24f6d op=VZData3:1_start_0 )</div><div>Jan 16 11:17:48 localhost lrmd: [14473]: info: rsc:VZData3:1:29: start</div>
<div>Jan 16 11:17:48 localhost kernel: drbd: initialized. Version: 8.3.4 (api:88/proto:86-91)</div><div>Jan 16 11:17:48 localhost kernel: drbd: GIT-hash: 70a645ae080411c87b4482a135847d69dc90a6a2 build by <a href="mailto:xemul@ovzcore.sw.ru">xemul@ovzcore.sw.ru</a>, 2009-10-12 19:29:01</div>
<div>Jan 16 11:17:48 localhost kernel: drbd: registered as block device major 147</div><div>Jan 16 11:17:48 localhost kernel: drbd: minor_table @ 0xffff81041249ecc0</div><div>Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stdout)</div>
<div>Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stderr) DRBD module version: 8.3.4 userland version: 8.3.8 preferably kernel and userland versions should match.</div><div>Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stderr) open(/dev/mapper/VG_xx10_vz-vzpart3) failed: No such file or directory</div>
<div>Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stderr) Command 'drbdmeta 3 v08 /dev/mapper/VG_xx10_vz-vzpart3 internal check-resize</div><div>Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stderr) ' terminated with exit code 20 drbdadm check-resize vzpart3: exited with code 20</div>
<div>Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Called drbdadm -c /etc/drbd.conf check-resize vzpart3</div><div>Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Exit code 20</div><div>Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Command output:</div>
<div>Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stdout)</div><div>Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stderr) DRBD module version: 8.3.4 userland version: 8.3.8 preferably kernel and userland versions should match.</div>
<div>Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stderr) Can not open device '/dev/mapper/VG_xx10_vz-vzpart3': No such file or directory</div><div>Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Called drbdadm -c /etc/drbd.conf --peer xx3 up vzpart3</div>
<div>Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stderr) Command 'drbdsetup 3 disk /dev/mapper/VG_xx10_vz-vzpart3 /dev/mapper/VG_xx10_vz-vzpart3 internal --set-defaults --create-device --fencing=resource-only --on-io-error=detach' terminated with exit code 20 drbdadm attach vzpart3: exited with code 20</div>
<div>Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Exit code 1</div><div>Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Command output:</div><div>Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stdout)</div>
<div>Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stderr) DRBD module version: 8.3.4 userland version: 8.3.8 preferably kernel and userland versions should match. Can not open device '/dev/mapper/VG_xx10_vz-vzpart3': No such file or directory Command 'drbdsetup 3 disk /dev/mapper/VG_xx10_vz-vzpart3 /dev/mapper/VG_xx10_vz-vzpart3 internal --set-defaults --create-device --fencing=resource-only --on-io-error=detach' terminated with exit code 20 drbdadm attach vzpart3: exited with code 20</div>
<div>Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Called drbdadm -c /etc/drbd.conf --peer xx3 up vzpart3</div><div>Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Exit code 1</div></div><div> </div><div>After several retries, xx10 shows the following. Should this have happened earlier?</div>
<div><br></div><div>xx10 /var/log/messages:</div><div><br></div><div><div>Jan 16 11:17:58 localhost kernel: block drbd3: Starting worker thread (from cqueue/11 [387])</div><div>Jan 16 11:17:58 localhost kernel: block drbd3: disk( Diskless -> Attaching )</div>
<div>Jan 16 11:17:58 localhost kernel: block drbd3: Found 4 transactions (136 active extents) in activity log.</div><div>Jan 16 11:17:58 localhost kernel: block drbd3: Method to ensure write ordering: barrier</div><div>Jan 16 11:17:58 localhost kernel: block drbd3: max_segment_size ( = BIO size ) = 32768</div>
<div>Jan 16 11:17:58 localhost kernel: block drbd3: drbd_bm_resize called with capacity == 209708728</div><div>Jan 16 11:17:58 localhost kernel: block drbd3: resync bitmap: bits=26213591 words=409588</div><div>Jan 16 11:17:58 localhost kernel: block drbd3: size = 100 GB (104854364 KB)</div>
<div>Jan 16 11:17:58 localhost kernel: block drbd3: recounting of set bits took additional 4 jiffies</div><div>Jan 16 11:17:58 localhost kernel: block drbd3: 0 KB (0 bits) marked out-of-sync by on disk bit-map.</div><div>
Jan 16 11:17:58 localhost kernel: block drbd3: Marked additional 508 MB as out-of-sync based on AL.</div><div>Jan 16 11:17:58 localhost kernel: block drbd3: disk( Attaching -> Consistent )</div><div>Jan 16 11:17:58 localhost kernel: block drbd3: Barriers not supported on meta data device - disabling</div>
<div>Jan 16 11:17:58 localhost kernel: block drbd3: conn( StandAlone -> Unconnected )</div><div>Jan 16 11:17:58 localhost kernel: block drbd3: Starting receiver thread (from drbd3_worker [16650])</div><div>Jan 16 11:17:58 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stdout)</div>
<div>Jan 16 11:17:58 localhost kernel: block drbd3: receiver (re)started</div><div>Jan 16 11:17:58 localhost kernel: block drbd3: conn( Unconnected -> WFConnection )</div></div><div><br></div><div><br></div><div>I am testing and learning to build things with all this great software, and I would be very pleased to understand this behaviour.</div>
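<div>One thing I still plan to check, given the "Can not open device '/dev/mapper/VG_xx10_vz-vzpart3': No such file or directory" errors above, is whether the backing logical volume was actually active when the start was attempted (standard LVM commands; the VG/LV names are taken from the log):</div>

```shell
# Is the backing logical volume visible and ACTIVE?
lvscan | grep vzpart3

# If it shows up as inactive, activating the volume group should
# recreate the /dev/mapper node that drbdsetup could not open:
vgchange -ay VG_xx10_vz
ls -l /dev/mapper/VG_xx10_vz-vzpart3
```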
<div>Thank you all.</div><div><br></div><div><br></div>