[DRBD-user] DRBD Master crash on slave HW problem

Philip Gaw pgaw at darktech.org.uk
Wed Mar 5 11:32:39 CET 2014



> Hi Matthieu,
>
> On 05/03/2014 07:29, Matthieu Lejeune wrote:
>> Hi all,
>>
>> I had a problem last night with a DRBD Primary/Secondary pair.
>>
>>
>> The slave experienced a hardware issue (its LSI controller froze).
>> It seems the master held I/O, waiting for the slave to respond, until
>> the timeout expired.
>>
>>
>> This caused all targets exported through InfiniBand to be disconnected
>> from the master.
>>
>>
>> So, in practice, the master stopped responding because of a failure on
>> the slave.
>>
>> I had to hard reboot (power cycle) the slave because udev wasn't
>> responding and did not allow a normal reboot.
>> After the slave rebooted, DRBD reconnected. Its status was
>> Primary/Secondary, UpToDate/UpToDate.
>> But the LSI controller immediately timed out, causing the same issue a
>> second time.
>>
>>
>> How can we prevent an issue on the slave from impacting the master?
>>
> Have a look at ko-count:
>
> ko-count number
>
>     In case the secondary node fails to complete a single write
>     request for count times the timeout, it is expelled from the
>     cluster. (I.e. the primary node goes into StandAlone mode.) The
>     default value is 0, which disables this feature.
>
>
> http://www.drbd.org/users-guide/re-drbdconf.html
>
>>
>> Thank you.
>> Matthieu Lejeune
>>
>>
>> drbd8-utils : 2:8.3.13-2 amd64 RAID 1 over tcp/ip for Linux utilities
>> Debian:
>> root at ifprdstor8a:~/trunk# cat /proc/version
>> Linux version 3.2.0-4-amd64 (debian-kernel at lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.51-1
>> root at ifprdstor8a:~/trunk#
>>
>> We are using SCST/SRPT, trunk version of 7 January 2014.
>>
>> Here is the configuration.
>>
>> DRBD global configuration:
>> root at ifprdstor8a:/etc/drbd.d# cat global_common.conf
>> global {
>>     usage-count yes;
>>     # minor-count dialog-refresh disable-ip-verification
>> }
>>
>> common {
>>     protocol C;
>>
>>     handlers {
>>         # The following 3 handlers were disabled due to #576511.
>>         # Please check the DRBD manual and enable them, if they make sense in your setup.
>>         # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>>         # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>>         # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>>
>>         # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>>         # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>>         # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
>>         # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
>>         # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
>>     }
>>
>>     startup {
>>         # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
>>     }
>>
>>     disk {
>>         # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
>>         # no-disk-drain no-md-flushes max-bio-bvecs
>>     }
>>
>>     net {
>>         # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
>>         # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
>>         # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
>>     }
>>
>>     syncer {
>>         # rate after al-extents use-rle cpu-mask verify-alg csums-alg
>>     }
>> }
>>
>> Resource configuration:
>>
>> root at ifprdstor8a:/etc/drbd.d# cat DSA801.res
>> resource DSA801 {
>>   protocol C;
>>
>>   startup {
>>     wfc-timeout 0;
>>   }
>>
>>   disk {
>>     on-io-error detach;
>>   }
>>
>>   syncer {
>>     rate 400M;
>>     verify-alg md5;
>>   }
>>
>>   on ifprdstor8a {
>>     device    /dev/drbd1;
>>     disk      /dev/sda;
>>     address   10.13.1.5:7788;
>>     meta-disk internal;
>>   }
>>
>>   on ifprdstor8b {
>>     device    /dev/drbd1;
>>     disk      /dev/sda;
>>     address   10.13.1.6:7788;
>>     meta-disk internal;
>>   }
>> }
>>
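A minimal sketch of how the ko-count suggestion could be applied to the DSA801 resource quoted above. It assumes the DRBD 8.3 configuration syntax used by drbd8-utils 8.3.13, where ko-count belongs in the net section (as the commented option list in global_common.conf shows). The value 4 is an arbitrary illustrative choice, not a tuned recommendation. In 8.3, timeout is given in tenths of a second and defaults to 60 (6 seconds), so with these values a single write stuck on the secondary should get the peer expelled after roughly 4 x 6 s = 24 s, with the primary going StandAlone.

```
resource DSA801 {
  # existing protocol, startup, disk, syncer and "on" sections stay unchanged

  net {
    timeout  60;   # tenths of a second: 6 s, the 8.3 default, stated explicitly
    ko-count 4;    # illustrative value: expel the peer after 4 x timeout (about 24 s)
  }
}
```

The file must match on both nodes. After editing it, running "drbdadm adjust DSA801" should apply the change to the running resource. Whether ko-count fires for a controller that freezes outright, rather than one that is merely slow, is worth testing before relying on it.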


