On Oct 8, 2012, at 9:19 AM, Velayutham, Prakash wrote:

> On Oct 8, 2012, at 4:55 AM, Lars Ellenberg wrote:
>
>> On Sat, Oct 06, 2012 at 01:08:43PM +0000, Velayutham, Prakash wrote:
>>> Hi,
>>>
>>> I recently got a DRBD (8.4.2-2) cluster up (still testing). It seems to work nicely with Pacemaker CRM in several scenarios I have tested. Here is my config.
>>>
>>> global {
>>>     usage-count yes;
>>> }
>>>
>>> common {
>>>     handlers {
>>>         outdate-peer /usr/lib/drbd/crm-fence-peer.sh;
>>>         fence-peer /usr/lib/drbd/crm-fence-peer.sh;
>>>         after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
>>>         local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>>>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>>>     }
>>>
>>>     startup {
>>>         degr-wfc-timeout 0;
>>>     }
>>>
>>>     net {
>>>         shared-secret 1QP69G4kWDslx2TMiaEStI6bwaGH5y8d;
>>>         after-sb-0pri discard-zero-changes;
>>>         after-sb-1pri discard-secondary;
>>>         after-sb-2pri disconnect;
>>>     }
>>>
>>>     disk {
>>>         on-io-error call-local-io-error;
>>>         fencing resource-and-stonith;
>>>     }
>>>
>>> }
>>>
>>> The io-error handler only gets called when the primary node has a disk
>>> issue. I have not seen the secondary node call the "local-io-error"
>>> handler when it had disk access issues. Is this by design?
>>
>> No.
>>
>> "Works for me", though.
>>
>> Can you please double check?
>> And if in fact you can reproduce, tell us how, including logs?
>>
>>
>> Thanks,
>>
>> --
>> : Lars Ellenberg
>
> Hi Lars,
>
> If I disable all the FC ports in the fiber switch just for the primary node, the node fences, reboots and comes up, as I would expect. With the exact same config, if I disable the FC ports just for the secondary node, the node just sits there and it even shows up as Secondary in /proc/drbd. That sounds odd and sounds like the config should be "diskless", but it is "call-local-io-error".
>
> Here is the full config.
>
> /etc/drbd.conf
>
> ## generated by drbd-gui
>
> include "drbd.d/global_common.conf";
> include "drbd.d/*.res";
>
> /etc/drbd.d/global_common.conf:
>
> ## generated by drbd-gui
>
> global {
>     usage-count yes;
> }
>
> common {
>     handlers {
>         fence-peer /usr/lib/drbd/crm-fence-peer.sh;
>         after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
>         local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>     }
>
>     startup {
>         degr-wfc-timeout 0;
>     }
>
>     net {
>         shared-secret 1QP69G4kWDslx2TMiaEStI6bwaGH5y8d;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>     }
>
>     disk {
>         on-io-error call-local-io-error;
>         fencing resource-and-stonith;
>     }
>
> }
>
> /etc/drbd.d/mysql1.res:
>
> resource mysql1 {
>     net {
>         cram-hmac-alg sha1;
>     }
>
>     on bmimysqlt3.x.x.x {
>         volume 0 {
>             device /dev/drbd0;
>             disk /dev/mapper/mysql_data1;
>             flexible-meta-disk internal;
>         }
>         address x.x.x.x:7788;
>     }
>     on bmimysqlt4.x.x.x {
>         volume 0 {
>             device /dev/drbd0;
>             disk /dev/mapper/mysql_data1;
>             flexible-meta-disk internal;
>         }
>         address x.x.x.x:7788;
>     }
> }
>
> Which logs are you wanting me to share?
>
> Thanks,
> Prakash

Just wanted to add this. I repeated my test and got exactly the same results.
Here is /proc/drbd of the primary (bmimysqlt3) and secondary (bmimysqlt4) before the secondary's disk is cut off (disabling the fiber switch port that the secondary is connected to):

[root@bmimysqlt3 ~]# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@bmimysqlt3.chmcres.cchmc.org, 2012-10-02 00:02:32
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:184 nr:0 dw:160 dr:14317 al:6 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@bmimysqlt4 ~]# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@bmimysqlt3.chmcres.cchmc.org, 2012-10-02 00:02:32
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:184 dw:184 dr:0 al:0 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

Here is /proc/drbd of primary and secondary about 5 minutes after the disk is cut off.

[root@bmimysqlt3 ~]# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@bmimysqlt3.chmcres.cchmc.org, 2012-10-02 00:02:32
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:184 nr:0 dw:160 dr:14317 al:6 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@bmimysqlt4 ~]# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@bmimysqlt3.chmcres.cchmc.org, 2012-10-02 00:02:32
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:184 dw:184 dr:0 al:0 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

As you can see, there is absolutely nothing there to suggest that the secondary even noticed the io-error. I can't understand what is going on.

Thanks,
Prakash
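
For anyone repeating this test, comparing the two /proc/drbd snapshots by eye is error-prone, so here is a minimal sketch that parses the secondary's output from above and lists the fields that differ. It is not part of the original thread: the function names `parse_proc_drbd` and `changed_fields` are made up for illustration, and the parser assumes the 8.4-style layout shown in this message. For the data posted here it reports no changed fields at all; in particular `nr` and `dw` both stay at 184, which would be consistent with no replicated writes having reached the secondary's backing device during those five minutes. Whether that accounts for the handler not being called is not established in this message.

```python
import re

# Secondary (bmimysqlt4) snapshots, copied from the message above.
SECONDARY_BEFORE = """\
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@bmimysqlt3.chmcres.cchmc.org, 2012-10-02 00:02:32
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:184 dw:184 dr:0 al:0 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
"""
SECONDARY_AFTER = SECONDARY_BEFORE  # the output posted 5 minutes later is identical


def parse_proc_drbd(text):
    """Return {minor: {field: value}} for /proc/drbd-style text (8.4 layout assumed)."""
    devices = {}
    current = None
    for line in text.splitlines():
        start = re.match(r"\s*(\d+):\s+(.*)", line)
        if start:
            # A line such as " 0: cs:Connected ..." opens a new device entry.
            current = devices.setdefault(int(start.group(1)), {})
            rest = start.group(2)
        elif current is not None:
            # Continuation line carrying the counters (ns:, nr:, dw:, ...).
            rest = line
        else:
            # Header lines (version, GIT-hash) before the first device.
            continue
        for key, value in re.findall(r"\b([a-z]{2,3}):(\S+)", rest):
            current[key] = value
    return devices


def changed_fields(before, after):
    """Return {field: (old, new)} for every field whose value differs."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


if __name__ == "__main__":
    before = parse_proc_drbd(SECONDARY_BEFORE)[0]
    after = parse_proc_drbd(SECONDARY_AFTER)[0]
    # ds: is reported as local/peer, so the first half is this node's disk state.
    print(after["ds"].split("/")[0])      # UpToDate
    print(changed_fields(before, after))  # {}
```

Running the same comparison while something is writing to the DRBD device on the primary would show whether `nr` and `dw` on the secondary move at all after the FC ports are disabled.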