> Hi Matthieu,
>
> On 05/03/2014 07:29, Matthieu Lejeune wrote:
>> Hi all,
>>
>> I had a problem last night with a DRBD Primary/Slave.
>>
>> The slave experienced a hardware issue (the LSI controller froze).
>> It seems the master held I/O waiting for the slave to respond until
>> timeout.
>>
>> This caused all targets exported through InfiniBand to be disconnected
>> from the master.
>>
>> So, practically, the master stopped responding due to a failure on the
>> slave.
>>
>> I had to hard reboot (power cycle) the slave because udev wasn't
>> responding and did not allow a normal reboot.
>> After the slave rebooted, DRBD did reconnect. It was in status pri/sec
>> uptodate/uptodate.
>> But the LSI controller immediately timed out, causing the same issue a
>> second time.
>>
>> How can we prevent an issue on the slave from impacting the master?
>>
> have a look at ko-count
>
>     ko-count number
>
>     In case the secondary node fails to complete a single write
>     request for count times the timeout, it is expelled from the
>     cluster. (I.e. the primary node goes into StandAlone mode.) The
>     default value is 0, which disables this feature.
>
> http://www.drbd.org/users-guide/re-drbdconf.html
>
>>
>> Thank you.
>> Matthieu Lejeune
>>
>> drbd8-utils : 2:8.3.13-2 amd64 RAID 1 over tcp/ip for Linux utilities
>> Debian :
>> root@ifprdstor8a:~/trunk# cat /proc/version
>> Linux version 3.2.0-4-amd64 (debian-kernel@lists.debian.org) (gcc
>> version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.51-1
>> root@ifprdstor8a:~/trunk#
>>
>> We are using scst/srpt with the trunk version of 7 January 2014.
>>
>> Here is the config:
>> drbd global :
>>
>> root@ifprdstor8a:/etc/drbd.d# cat global_common.conf
>> global {
>>     usage-count yes;
>>     # minor-count dialog-refresh disable-ip-verification
>> }
>>
>> common {
>>     protocol C;
>>
>>     handlers {
>>         # The following 3 handlers were disabled due to #576511.
>>         # Please check the DRBD manual and enable them, if they make sense in your setup.
>>         # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>>         # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>>         # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>>
>>         # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>>         # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>>         # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
>>         # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
>>         # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
>>     }
>>
>>     startup {
>>         # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
>>     }
>>
>>     disk {
>>         # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
>>         # no-disk-drain no-md-flushes max-bio-bvecs
>>     }
>>
>>     net {
>>         # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
>>         # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
>>         # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
>>     }
>>
>>     syncer {
>>         # rate after al-extents use-rle cpu-mask verify-alg csums-alg
>>     }
>> }
>>
>> Resource configuration :
>>
>> root@ifprdstor8a:/etc/drbd.d# cat DSA801.res
>> resource DSA801 {
>>     protocol C;
>>
>>     startup {
>>         wfc-timeout 0;
>>     }
>>
>>     disk {
>>         on-io-error detach;
>>     }
>>
>>     syncer {
>>         rate 400M;
>>         verify-alg md5;
>>     }
>>
>>     on ifprdstor8a {
>>         device /dev/drbd1;
>>         disk /dev/sda;
>>         address 10.13.1.5:7788;
>>         meta-disk internal;
>>     }
>>
>>     on ifprdstor8b {
>>         device /dev/drbd1;
>>         disk /dev/sda;
>>         address 10.13.1.6:7788;
>>         meta-disk internal;
>>     }
>> }
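
For reference, the ko-count suggestion quoted above could be applied to the DSA801 resource roughly as sketched below. This is only an illustration under assumptions: the value 4 for ko-count is hypothetical and was not proposed in the thread, and timeout is written out explicitly at what the drbd.conf man page for DRBD 8.3 documents as its default (60, in units of 0.1 seconds). The rest of the resource would stay as posted.

```
resource DSA801 {
    net {
        # Unit is tenths of a second: 60 = 6 seconds (documented default).
        timeout 60;
        # Hypothetical value for illustration; 0 (the default) disables the feature.
        ko-count 4;
    }
    # startup, disk, syncer and "on <host>" sections unchanged
}
```

With these example values, a secondary that fails to complete a single write request for about 4 x 6 s = 24 s would be expelled and the primary would continue in StandAlone mode instead of blocking I/O. The right numbers depend on how long the exported targets can tolerate stalled I/O, so they should be tested before use in production.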