Hi guys,

I would like to know how fencing works, so I can tell whether I am misunderstanding it or whether it is not working as expected. Here is my config.

The HNs are xx4 and xx10, and I am using heartbeat and pacemaker. The resource is VZDataClone3 (drbd3).

My drbd.conf has the following lines:

### Only when DRBD is under cluster ###
fencing resource-only;
### --- ###

and

### Only when DRBD is under cluster ###
handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}
### --- ###

At some point xx10 goes down (a halt, reboot, massive crash, etc.) and is marked as dead.

xx4 /var/log/messages:

xx4 cibadmin: [31254]: info: Invoked: cibadmin -C -o constraints -X <rsc_location rsc="VZDataClone3" id="drbd-fence-by-handler-VZDataClone3"> <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-rule-VZDataClone3"> <expression attribute="#uname" operation="ne" value="xx4" id="drbd-fence-by-handler-expr-VZDataClone3"/> </rule> </rsc_location>

Should I assume that this constraint prevents the resource from running on xx4? I ask because the lines that follow it show xx4 taking over the resource (the behaviour I expected, OK).

5 minutes later xx10 (which is the preferred location for that resource) comes back to life, and xx4 seems to stop and unmount successfully. I expected it to become secondary again, but it didn't.

xx4 /var/log/messages:

Jan 16 11:17:45 xx4 kernel: block drbd3: conn( WFConnection -> Disconnecting )
Jan 16 11:17:45 xx4 kernel: block drbd3: Discarding network configuration.
Jan 16 11:17:45 xx4 kernel: block drbd3: Connection closed
Jan 16 11:17:45 xx4 kernel: block drbd3: conn( Disconnecting -> StandAlone )
Jan 16 11:17:45 xx4 kernel: block drbd3: receiver terminated
Jan 16 11:17:45 xx4 kernel: block drbd3: Terminating receiver thread
Jan 16 11:17:45 xx4 kernel: block drbd3: disk( UpToDate -> Diskless ) pdsk( Outdated -> DUnknown )
Jan 16 11:17:45 xx4 kernel: block drbd3: drbd_bm_resize called with capacity == 0
Jan 16 11:17:45 xx4 kernel: block drbd3: worker terminated
Jan 16 11:17:45 xx4 kernel: block drbd3: Terminating worker thread
Jan 16 11:17:45 xx4 lrmd: [18448]: info: RA output: (VZData3:1:stop:stdout)
Jan 16 11:17:45 xx4 crm_attribute: [6762]: info: Invoked: crm_attribute -N xx4 -n master-VZData3:1 -l reboot -D
Jan 16 11:17:45 xx4 attrd: [18450]: info: attrd_trigger_update: Sending flush op to all hosts for: master-VZData3:1 (<null>)
Jan 16 11:17:45 xx4 lrmd: [18448]: info: RA output: (VZData3:1:stop:stdout)
Jan 16 11:17:45 xx4 attrd: [18450]: info: attrd_perform_update: Sent delete 454: node=e96b792b-46d9-479c-8295-d109b35bb184, attr=master-VZData3:1, id=<n/a>, set=(null), section=status
Jan 16 11:17:45 xx4 crmd: [18451]: info: process_lrm_event: LRM operation VZData3:1_stop_0 (call=80, rc=0, cib-update=91, confirmed=true) ok
Jan 16 11:17:45 xx4 attrd: [18450]: info: attrd_ha_callback: flush message from xx4
Jan 16 11:17:45 xx4 attrd: [18450]: info: attrd_perform_update: Sent delete 456: node=e96b792b-46d9-479c-8295-d109b35bb184, attr=master-VZData3:1, id=<n/a>, set=(null), section=status
Jan 16 11:17:56 xx4 lrmd: [18448]: info: RA output: (VZData6:1:monitor:stderr) DRBD module version: 8.3.4 userland version: 8.3.8 preferably kernel and userland versions should match.
Jan 16 11:17:59 xx4 attrd: [18450]: info: attrd_ha_callback: flush message from xx10

At this point I would assume that the resource is safely running on xx10 (even without a secondary). Am I wrong?

xx10 never got the resource as primary again and is unable to mount it.

xx10 /var/log/messages:

Jan 16 11:17:45 localhost attrd: [14477]: info: attrd_ha_callback: flush message from xx4
Jan 16 11:17:48 localhost crmd: [14478]: info: do_lrm_rsc_op: Performing key=139:2349:0:c25aba18-b3d1-4560-a131-aa35fdf24f6d op=VZData3:1_start_0 )
Jan 16 11:17:48 localhost lrmd: [14473]: info: rsc:VZData3:1:29: start
Jan 16 11:17:48 localhost kernel: drbd: initialized. Version: 8.3.4 (api:88/proto:86-91)
Jan 16 11:17:48 localhost kernel: drbd: GIT-hash: 70a645ae080411c87b4482a135847d69dc90a6a2 build by xemul at ovzcore.sw.ru, 2009-10-12 19:29:01
Jan 16 11:17:48 localhost kernel: drbd: registered as block device major 147
Jan 16 11:17:48 localhost kernel: drbd: minor_table @ 0xffff81041249ecc0
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stdout)
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stderr) DRBD module version: 8.3.4 userland version: 8.3.8 preferably kernel and userland versions should match.
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stderr) open(/dev/mapper/VG_xx10_vz-vzpart3) failed: No such file or directory
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stderr) Command 'drbdmeta 3 v08 /dev/mapper/VG_xx10_vz-vzpart3 internal check-resize
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stderr) ' terminated with exit code 20 drbdadm check-resize vzpart3: exited with code 20
Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Called drbdadm -c /etc/drbd.conf check-resize vzpart3
Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Exit code 20
Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Command output:
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stdout)
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stderr) DRBD module version: 8.3.4 userland version: 8.3.8 preferably kernel and userland versions should match.
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stderr) Can not open device '/dev/mapper/VG_xx10_vz-vzpart3': No such file or directory
Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Called drbdadm -c /etc/drbd.conf --peer xx3 up vzpart3
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stderr) Command 'drbdsetup 3 disk /dev/mapper/VG_xx10_vz-vzpart3 /dev/mapper/VG_xx10_vz-vzpart3 internal --set-defaults --create-device --fencing=resource-only --on-io-error=detach' terminated with exit code 20 drbdadm attach vzpart3: exited with code 20
Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Exit code 1
Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Command output:
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stdout)
Jan 16 11:17:48 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stderr) DRBD module version: 8.3.4 userland version: 8.3.8 preferably kernel and userland versions should match. Can not open device '/dev/mapper/VG_xx10_vz-vzpart3': No such file or directory Command 'drbdsetup 3 disk /dev/mapper/VG_xx10_vz-vzpart3 /dev/mapper/VG_xx10_vz-vzpart3 internal --set-defaults --create-device --fencing=resource-only --on-io-error=detach' terminated with exit code 20 drbdadm attach vzpart3: exited with code 20
Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Called drbdadm -c /etc/drbd.conf --peer xx3 up vzpart3
Jan 16 11:17:48 localhost drbd[15292]: ERROR: vzpart3: Exit code 1

After some tries xx10 shows the following. Should this have happened earlier?

xx10 /var/log/messages:

Jan 16 11:17:58 localhost kernel: block drbd3: Starting worker thread (from cqueue/11 [387])
Jan 16 11:17:58 localhost kernel: block drbd3: disk( Diskless -> Attaching )
Jan 16 11:17:58 localhost kernel: block drbd3: Found 4 transactions (136 active extents) in activity log.
Jan 16 11:17:58 localhost kernel: block drbd3: Method to ensure write ordering: barrier
Jan 16 11:17:58 localhost kernel: block drbd3: max_segment_size ( = BIO size ) = 32768
Jan 16 11:17:58 localhost kernel: block drbd3: drbd_bm_resize called with capacity == 209708728
Jan 16 11:17:58 localhost kernel: block drbd3: resync bitmap: bits=26213591 words=409588
Jan 16 11:17:58 localhost kernel: block drbd3: size = 100 GB (104854364 KB)
Jan 16 11:17:58 localhost kernel: block drbd3: recounting of set bits took additional 4 jiffies
Jan 16 11:17:58 localhost kernel: block drbd3: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jan 16 11:17:58 localhost kernel: block drbd3: Marked additional 508 MB as out-of-sync based on AL.
Jan 16 11:17:58 localhost kernel: block drbd3: disk( Attaching -> Consistent )
Jan 16 11:17:58 localhost kernel: block drbd3: Barriers not supported on meta data device - disabling
Jan 16 11:17:58 localhost kernel: block drbd3: conn( StandAlone -> Unconnected )
Jan 16 11:17:58 localhost kernel: block drbd3: Starting receiver thread (from drbd3_worker [16650])
Jan 16 11:17:58 localhost lrmd: [14473]: info: RA output: (VZData3:1:start:stdout)
Jan 16 11:17:58 localhost kernel: block drbd3: receiver (re)started
Jan 16 11:17:58 localhost kernel: block drbd3: conn( Unconnected -> WFConnection )

I am testing and learning to build things with all this great software, and I would be very pleased to understand these terms.

Thank you guys
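
For anyone puzzling over the same fence constraint, here is a minimal Python sketch of how a single-expression rule like the one logged on xx4 can be read. It is not Pacemaker's rule engine: it assumes only `#uname` expressions with `eq`/`ne` operations, and the helper names `rule_matches` and `barred_from_master` are hypothetical. Under that reading, the rule gives the Master role a score of -INFINITY on every node whose name is not xx4, so it bars the peer (xx10) from promotion rather than xx4 itself.

```python
import xml.etree.ElementTree as ET

# The constraint exactly as it appears in the cibadmin log line on xx4.
CONSTRAINT = """
<rsc_location rsc="VZDataClone3" id="drbd-fence-by-handler-VZDataClone3">
  <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-rule-VZDataClone3">
    <expression attribute="#uname" operation="ne" value="xx4"
                id="drbd-fence-by-handler-expr-VZDataClone3"/>
  </rule>
</rsc_location>
"""


def rule_matches(expr, node_name):
    """Evaluate one <expression> against a node name (hypothetical helper).

    Assumption: only the '#uname' attribute with 'eq' or 'ne' is handled.
    """
    if expr.get("attribute") != "#uname":
        raise ValueError("only #uname expressions are handled in this sketch")
    operation = expr.get("operation")
    value = expr.get("value")
    if operation == "ne":
        return node_name != value
    if operation == "eq":
        return node_name == value
    raise ValueError("unsupported operation: %r" % operation)


def barred_from_master(constraint_xml, node_names):
    """Return the nodes on which the rule applies a -INFINITY Master score."""
    rule = ET.fromstring(constraint_xml).find("rule")
    if rule.get("role") != "Master" or rule.get("score") != "-INFINITY":
        return []
    expressions = rule.findall("expression")
    # All expressions must match for the rule's score to apply to a node.
    return [n for n in node_names if all(rule_matches(e, n) for e in expressions)]


if __name__ == "__main__":
    print(barred_from_master(CONSTRAINT, ["xx4", "xx10"]))  # ['xx10']
```

In the configuration above, the `after-resync-target` handler (`crm-unfence-peer.sh`) is the piece meant to remove this constraint again once the peer has resynchronised.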