On Oct 8, 2012, at 9:19 AM, Velayutham, Prakash wrote:
> On Oct 8, 2012, at 4:55 AM, Lars Ellenberg wrote:
>
>> On Sat, Oct 06, 2012 at 01:08:43PM +0000, Velayutham, Prakash wrote:
>>> Hi,
>>>
>>> I recently got a DRBD (8.4.2-2) cluster up (still testing). It seems to work nicely with Pacemaker CRM in several scenarios I have tested. Here is my config.
>>>
>>> global {
>>>     usage-count yes;
>>> }
>>>
>>> common {
>>>     handlers {
>>>         outdate-peer /usr/lib/drbd/crm-fence-peer.sh;
>>>         fence-peer /usr/lib/drbd/crm-fence-peer.sh;
>>>         after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
>>>         local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>>>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>>>     }
>>>
>>>     startup {
>>>         degr-wfc-timeout 0;
>>>     }
>>>
>>>     net {
>>>         shared-secret 1QP69G4kWDslx2TMiaEStI6bwaGH5y8d;
>>>         after-sb-0pri discard-zero-changes;
>>>         after-sb-1pri discard-secondary;
>>>         after-sb-2pri disconnect;
>>>     }
>>>
>>>     disk {
>>>         on-io-error call-local-io-error;
>>>         fencing resource-and-stonith;
>>>     }
>>>
>>> }
>>>
>>> The io-error handler only gets called when the primary node has a disk
>>> issue. I have not seen the secondary node call the "local-io-error"
>>> handler when it had disk access issues. Is this by design?
>>
>> No.
>>
>> "Works for me", though.
>>
>> Can you please double check?
>> And if in fact you can reproduce, tell us how, including logs?
>>
>>
>> Thanks,
>>
>> --
>> : Lars Ellenberg
>
> Hi Lars,
>
> If I disable all the FC ports in the fiber switch just for the primary node, the node fences, reboots and comes back up, as I would expect. With the exact same config, if I disable the FC ports just for the secondary node, the node just sits there and still shows up as Secondary in /proc/drbd. That seems odd; it behaves as if the setting were "diskless", but it is "call-local-io-error".
>
> Here is the full config.
>
> /etc/drbd.conf
>
> ## generated by drbd-gui
>
> include "drbd.d/global_common.conf";
> include "drbd.d/*.res";
>
> /etc/drbd.d/global_common.conf:
>
> ## generated by drbd-gui
>
> global {
>     usage-count yes;
> }
>
> common {
>     handlers {
>         fence-peer /usr/lib/drbd/crm-fence-peer.sh;
>         after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
>         local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>     }
>
>     startup {
>         degr-wfc-timeout 0;
>     }
>
>     net {
>         shared-secret 1QP69G4kWDslx2TMiaEStI6bwaGH5y8d;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>     }
>
>     disk {
>         on-io-error call-local-io-error;
>         fencing resource-and-stonith;
>     }
>
> }
>
> /etc/drbd.d/mysql1.res:
>
> resource mysql1 {
>     net {
>         cram-hmac-alg sha1;
>     }
>
>     on bmimysqlt3.x.x.x {
>         volume 0 {
>             device /dev/drbd0;
>             disk /dev/mapper/mysql_data1;
>             flexible-meta-disk internal;
>         }
>         address x.x.x.x:7788;
>     }
>     on bmimysqlt4.x.x.x {
>         volume 0 {
>             device /dev/drbd0;
>             disk /dev/mapper/mysql_data1;
>             flexible-meta-disk internal;
>         }
>         address x.x.x.x:7788;
>     }
> }
>
> Which logs would you like me to share?
>
> Thanks,
> Prakash
Just wanted to add this: I repeated my test and got exactly the same results. Here is /proc/drbd on the primary (bmimysqlt3) and the secondary (bmimysqlt4) before the secondary's disk is cut off (by disabling the fiber switch port that the secondary is connected to):

[root@bmimysqlt3 ~]# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@bmimysqlt3.chmcres.cchmc.org, 2012-10-02 00:02:32
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:184 nr:0 dw:160 dr:14317 al:6 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@bmimysqlt4 ~]# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@bmimysqlt3.chmcres.cchmc.org, 2012-10-02 00:02:32
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:184 dw:184 dr:0 al:0 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

Here is /proc/drbd on the primary and the secondary about 5 minutes after the disk was cut off:

[root@bmimysqlt3 ~]# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@bmimysqlt3.chmcres.cchmc.org, 2012-10-02 00:02:32
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:184 nr:0 dw:160 dr:14317 al:6 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@bmimysqlt4 ~]# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@bmimysqlt3.chmcres.cchmc.org, 2012-10-02 00:02:32
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:184 dw:184 dr:0 al:0 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

As you can see, there is absolutely nothing there to suggest that the secondary even noticed the io-error.
I can't understand what is going on.
Thanks,
Prakash
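
The two snapshots from bmimysqlt4 can also be compared mechanically rather than by eye. What follows is only a minimal Python sketch and not part of the original thread: the helper names parse_proc_drbd and changed_counters are hypothetical, and the parsing assumes the 8.4-style /proc/drbd layout quoted in this message. It pulls out the cs/ro/ds fields and the numeric counters for each minor, then lists the counters that differ between two readings.

```python
import re

# Secondary (bmimysqlt4) snapshots copied from this message;
# the GIT-hash line is omitted for brevity.
SECONDARY_BEFORE = """\
version: 8.4.2 (api:1/proto:86-101)
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:184 dw:184 dr:0 al:0 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
"""
SECONDARY_AFTER = SECONDARY_BEFORE  # the second reading is identical in the post

# Matches the per-device status line, e.g. " 0: cs:... ro:... ds:..."
STATUS_RE = re.compile(r"\s*(\d+):\s+cs:(\S+)\s+ro:(\S+)\s+ds:(\S+)")
# Matches numeric counters such as "nr:184"; non-numeric ones like "wo:f" are skipped.
COUNTER_RE = re.compile(r"\b([a-z]+):(\d+)\b")


def parse_proc_drbd(text):
    """Hypothetical helper: map each minor number to its states and counters."""
    devices = {}
    current = None
    for line in text.splitlines():
        status = STATUS_RE.match(line)
        if status:
            current = {
                "cs": status.group(2),
                "ro": status.group(3),
                "ds": status.group(4),
                "counters": {},
            }
            devices[int(status.group(1))] = current
        elif current is not None:
            # Lines after a status line carry that device's counters.
            for key, value in COUNTER_RE.findall(line):
                current["counters"][key] = int(value)
    return devices


def changed_counters(before, after, minor=0):
    """Hypothetical helper: counters whose value differs between two snapshots."""
    old = before[minor]["counters"]
    new = after[minor]["counters"]
    return {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}


if __name__ == "__main__":
    before = parse_proc_drbd(SECONDARY_BEFORE)
    after = parse_proc_drbd(SECONDARY_AFTER)
    local_disk_state = after[0]["ds"].split("/")[0]
    print("local disk state:", local_disk_state)
    print("changed counters:", changed_counters(before, after))
```

For the readings quoted above, the local disk state is still UpToDate and no counter has changed: nr and dw both remain at 184 on the secondary across the two readings.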