Hi,

Thanks for your reply. If I modify the configuration in global_common.conf like this:

global {
    usage-count yes;
    # minor-count dialog-refresh disable-ip-verification
}

common {
    protocol C;

    handlers {
        # The following 3 handlers were disabled due to #576511.
        # Please check the DRBD manual and enable them, if they make sense in your setup.
        # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
        # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
    }

    startup {
        # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
    }

    disk {
        # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
        # no-disk-drain no-md-flushes max-bio-bvecs
    }

    net {
        ko-count 2;
        timeout 50;
        # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
        # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
        # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
    }

    syncer {
        # rate after al-extents use-rle cpu-mask verify-alg csums-alg
    }
}

If I apply this configuration on the secondary node, will the slave be properly disconnected when we have hardware problems like the ones described in my previous post?

Thanks,
Matthieu Lejeune

On 5/03/14 11:32, Philip Gaw wrote:
>> Hi Matthieu,
>>
>> On 05/03/2014 07:29, Matthieu Lejeune wrote:
>>> Hi all,
>>>
>>> I had a problem last night with a DRBD Primary/Slave setup.
>>>
>>> The slave experienced a hardware issue (the LSI controller froze).
>>> It seems the master held I/O, waiting for the slave to respond
>>> until the timeout expired.
>>>
>>> This caused all targets exported through InfiniBand to be
>>> disconnected from the master.
>>>
>>> So, in practice, the master stopped responding because of a failure
>>> on the slave.
>>>
>>> I had to hard reboot (power cycle) the slave because udev wasn't
>>> responding and did not allow a normal reboot.
>>> After the slave rebooted, DRBD reconnected. It was in state pri/sec,
>>> UpToDate/UpToDate.
>>> But the LSI controller immediately timed out, causing the same issue
>>> a second time.
>>>
>>> How can we prevent an issue on the slave from impacting the master?
>>>
>> Have a look at ko-count:
>>
>>     ko-count number
>>
>>     In case the secondary node fails to complete a single write
>>     request for count times the timeout, it is expelled from the
>>     cluster. (I.e. the primary node goes into StandAlone mode.) The
>>     default value is 0, which disables this feature.
>>
>> http://www.drbd.org/users-guide/re-drbdconf.html
>>
>>>
>>> Thank you.
>>> Matthieu Lejeune
>>>
>>> drbd8-utils : 2:8.3.13-2
>>> amd64 RAID 1 over tcp/ip for Linux utilities
>>> Debian:
>>> root at ifprdstor8a:~/trunk# cat /proc/version
>>> Linux version 3.2.0-4-amd64 (debian-kernel at lists.debian.org) (gcc
>>> version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.51-1
>>> root at ifprdstor8a:~/trunk#
>>>
>>> We are using scst/srpt from the trunk version of 7 January 2014.
>>>
>>> Here is the config:
>>>
>>> drbd global:
>>>
>>> root at ifprdstor8a:/etc/drbd.d# cat global_common.conf
>>> global {
>>>     usage-count yes;
>>>     # minor-count dialog-refresh disable-ip-verification
>>> }
>>>
>>> common {
>>>     protocol C;
>>>
>>>     handlers {
>>>         # The following 3 handlers were disabled due to #576511.
>>>         # Please check the DRBD manual and enable them, if they make sense in your setup.
>>>         # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>>>         # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>>>         # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>>>
>>>         # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>>>         # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>>>         # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
>>>         # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
>>>         # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
>>>     }
>>>
>>>     startup {
>>>         # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
>>>     }
>>>
>>>     disk {
>>>         # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
>>>         # no-disk-drain no-md-flushes max-bio-bvecs
>>>     }
>>>
>>>     net {
>>>         # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
>>>         # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
>>>         # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
>>>     }
>>>
>>>     syncer {
>>>         # rate after al-extents use-rle cpu-mask verify-alg csums-alg
>>>     }
>>> }
>>>
>>> Resource configuration:
>>>
>>> root at ifprdstor8a:/etc/drbd.d# cat DSA801.res
>>> resource DSA801 {
>>>     protocol C;
>>>
>>>     startup {
>>>         wfc-timeout 0;
>>>     }
>>>
>>>     disk {
>>>         on-io-error detach;
>>>     }
>>>
>>>     syncer {
>>>         rate 400M;
>>>         verify-alg md5;
>>>     }
>>>
>>>     on ifprdstor8a {
>>>         device /dev/drbd1;
>>>         disk /dev/sda;
>>>         address 10.13.1.5:7788;
>>>         meta-disk internal;
>>>     }
>>>
>>>     on ifprdstor8b {
>>>         device /dev/drbd1;
>>>         disk /dev/sda;
>>>         address 10.13.1.6:7788;
>>>         meta-disk internal;
>>>     }
>>> }
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
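A minimal sketch of the ko-count setting discussed in this thread, assuming DRBD 8.3 syntax (drbd8-utils 8.3.13, as reported above) and the values from the question (ko-count 2, timeout 50). Every option needs a terminating semicolon. The expulsion window in the comments is an approximation derived from the drbd.conf description quoted above, not a measured value.

```conf
# Sketch only: merge into the existing "common" section of global_common.conf.
common {
    net {
        # timeout is given in tenths of a second, so 50 means 5 seconds.
        # It has to stay below connect-int and ping-int (10 seconds by default).
        timeout   50;

        # If the peer fails to complete a single write request within
        # ko-count * timeout (2 * 5 s, roughly 10 seconds), it is expelled
        # and the primary goes into StandAlone mode. 0 (the default) disables this.
        ko-count  2;
    }
}
```

Since the quoted description has the primary expelling the unresponsive peer, the option presumably has to be in effect on the primary, not only on the secondary; the usual practice is to keep global_common.conf identical on both nodes. After editing, `drbdadm dump all` parses and prints the configuration (failing on syntax errors such as a missing semicolon), and `drbdadm adjust all` applies the changed options to the running resources; run both on each node.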