Hi all,
Does anyone know a way to freeze the RAID controller or otherwise
block I/O?
I'd like to reproduce our problem to check whether ko-count fixes it.
Thanks for your help
Matthieu
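
As a rough sanity check on the net settings quoted below (ko-count 2, timeout 50): the drbd.conf man page gives timeout in tenths of a second, and the ko-count text quoted further down says the peer is expelled after failing to complete a write for ko-count times the timeout. The sketch below only does that arithmetic. The helper name is hypothetical, and this is not DRBD's internal logic.

```python
from typing import Optional


def expel_after_seconds(ko_count: int, timeout_tenths: int) -> Optional[float]:
    """Approximate time before the primary gives up on a stalled peer.

    Assumes `timeout` is expressed in tenths of a second (as in drbd.conf)
    and that ko-count 0 means the feature is disabled.
    """
    if ko_count == 0:
        return None  # feature disabled, the primary keeps waiting
    return ko_count * timeout_tenths / 10.0


# Values from the proposed config: ko-count 2, timeout 50
print(expel_after_seconds(2, 50))  # 10.0
```

So with these values the primary should drop a frozen secondary after roughly 10 seconds, if the assumptions above hold.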
On 10/03/14 09:44, Matthieu Lejeune wrote:
> Hi,
>
> Thanks for your reply.
>
> If I modify global_common.conf like this:
>
> global {
>     usage-count yes;
>     # minor-count dialog-refresh disable-ip-verification
> }
> common {
>     protocol C;
>     handlers {
>         # The following 3 handlers were disabled due to #576511.
>         # Please check the DRBD manual and enable them, if they make sense in your setup.
>         # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>         # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>         # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>         # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>         # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>         # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
>         # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
>         # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
>     }
>     startup {
>         # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
>     }
>     disk {
>         # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
>         # no-disk-drain no-md-flushes max-bio-bvecs
>     }
>     net {
>         ko-count 2;
>         timeout 50;
>         # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
>         # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
>         # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
>     }
>
>     syncer {
>         # rate after al-extents use-rle cpu-mask verify-alg csums-alg
>     }
> }
>
> If I apply this config on the secondary node, will the slave be
> disconnected cleanly when we have hardware problems like the one in my
> previous post?
>
> Thanks
>
> Matthieu Lejeune
>
>
>
>
> On 5/03/14 11:32, Philip Gaw wrote:
>>> Hi Matthieu,
>>>
>>> On 05/03/2014 07:29, Matthieu Lejeune wrote:
>>>> Hi all,
>>>>
>>>> I had a problem last night with a DRBD primary/slave pair.
>>>>
>>>> The slave experienced a hardware issue (the LSI controller froze).
>>>> It seems the master held I/O, waiting for the slave to respond, until
>>>> the timeout.
>>>>
>>>> This caused all targets exported through InfiniBand to be
>>>> disconnected from the master.
>>>>
>>>> So, in practice, the master stopped responding because of a failure on
>>>> the slave.
>>>>
>>>> I had to hard reboot (power cycle) the slave because udev wasn't
>>>> responding and did not allow a normal reboot.
>>>> After the slave rebooted, DRBD reconnected. Its status was pri/sec,
>>>> uptodate/uptodate.
>>>> But the LSI controller immediately timed out again, causing the same
>>>> issue a second time.
>>>>
>>>> How can we prevent an issue on the slave from impacting the master?
>>>>
>>> have a look at ko-count
>>>
>>> ko-count number
>>>
>>> In case the secondary node fails to complete a single write
>>> request for count times the timeout, it is expelled from
>>> the cluster. (I.e. the primary node goes into StandAlone mode.)
>>> The default value is 0, which disables this feature.
>>>
>>>
>>> http://www.drbd.org/users-guide/re-drbdconf.html
>>>
>>>>
>>>> Thank you.
>>>> Matthieu Lejeune
>>>>
>>>>
>>>> drbd8-utils : 2:8.3.13-2 amd64 RAID 1 over tcp/ip for Linux utilities
>>>> Debian :
>>>> root@ifprdstor8a:~/trunk# cat /proc/version
>>>> Linux version 3.2.0-4-amd64 (debian-kernel at lists.debian.org) (gcc
>>>> version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.51-1
>>>> root@ifprdstor8a:~/trunk#
>>>>
>>>> We are using scst/srpt from trunk, as of 7 January 2014.
>>>>
>>>> Here is the config:
>>>> *DRBD global configuration:*
>>>>
>>>> root@ifprdstor8a:/etc/drbd.d# cat global_common.conf
>>>> global {
>>>>     usage-count yes;
>>>>     # minor-count dialog-refresh disable-ip-verification
>>>> }
>>>>
>>>> common {
>>>>     protocol C;
>>>>
>>>>     handlers {
>>>>         # The following 3 handlers were disabled due to #576511.
>>>>         # Please check the DRBD manual and enable them, if they make sense in your setup.
>>>>         # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>>>>         # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>>>>         # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>>>>
>>>>         # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>>>>         # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>>>>         # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
>>>>         # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
>>>>         # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
>>>>     }
>>>>
>>>>     startup {
>>>>         # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
>>>>     }
>>>>
>>>>     disk {
>>>>         # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
>>>>         # no-disk-drain no-md-flushes max-bio-bvecs
>>>>     }
>>>>
>>>>     net {
>>>>         # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
>>>>         # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
>>>>         # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
>>>>     }
>>>>
>>>>     syncer {
>>>>         # rate after al-extents use-rle cpu-mask verify-alg csums-alg
>>>>     }
>>>> }
>>>>
>>>> *Resource configuration:*
>>>>
>>>> root@ifprdstor8a:/etc/drbd.d# cat DSA801.res
>>>> resource DSA801 {
>>>>     protocol C;
>>>>
>>>>     startup {
>>>>         wfc-timeout 0;
>>>>     }
>>>>
>>>>     disk {
>>>>         on-io-error detach;
>>>>     }
>>>>
>>>>     syncer {
>>>>         rate 400M;
>>>>         verify-alg md5;
>>>>     }
>>>>
>>>>     on ifprdstor8a {
>>>>         device /dev/drbd1;
>>>>         disk /dev/sda;
>>>>         address 10.13.1.5:7788;
>>>>         meta-disk internal;
>>>>     }
>>>>
>>>>     on ifprdstor8b {
>>>>         device /dev/drbd1;
>>>>         disk /dev/sda;
>>>>         address 10.13.1.6:7788;
>>>>         meta-disk internal;
>>>>     }
>>>> }
>>>>
>>
>>
>>
>> _______________________________________________
>> drbd-user mailing list
>> drbd-user at lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user