Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
You need fencing to avoid split-brain. I'm not too familiar with
Proxmox; does it use cman + rgmanager or pacemaker behind the scenes?
In either case, configure fencing in the cluster stack (called
'stonith' in pacemaker), then configure DRBD to block and call a fence
when the peer is lost. This is done by setting 'fencing
resource-and-stonith;' in the disk section and pointing the
'fence-peer' handler at rhcs_fence or crm-fence-peer.sh; which one you
use depends on which cluster stack you are running. This way, when DRBD
would otherwise split-brain, it instead blocks until the peer is
fenced, ensuring that when it resumes writing, the peer is guaranteed
not to be doing the same.
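Roughly like this, per resource (a sketch assuming the pacemaker stack;
the crm-*.sh helpers ship with DRBD, but check the paths on your
distro):

resource target.0 {
    disk {
        # on loss of the peer, block I/O and call the fence handler
        fencing resource-and-stonith;
    }
    handlers {
        # swap in rhcs_fence here if Proxmox runs cman + rgmanager
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # clears the fencing constraint once resync completes
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
}

Note that you already use after-resync-target for your RAID scripts;
DRBD takes one command per handler, so you would wrap both calls in a
single script.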
digimer

On 28/02/14 07:24 AM, Gerald Brandt wrote:
> Hi,
>
> I'm doing tests on a new DRBD setup, so I'm hammering the DRBD system
> with reads and writes (three VMs writing with dd and three VMs reading
> with dd). The test maxes out my 2x1GigE bonded links (both data and
> sync) and maxes out my hard drives (five 7200 RPM SATA drives, RAID6).
> I share the DRBD disks to Proxmox (KVM based) via NFS v3.
>
> 1. I tested the system all night, and both DRBD servers handled
>    everything fine.
> 2. I rebooted the primary.
> 3. Failover of the IP and NFS worked, and the secondary became primary.
> 4. The rebooted server came back up and entered split-brain.
>
> I use uCarp for the failover instead of heartbeat/pacemaker.
>
> I've used iSCSI over DRBD/heartbeat before, but not NFS. Any ideas why
> I hit split-brain?
>
> Gerald
>
>
> drbd.conf
> # cat /etc/drbd.conf
> # You can find an example in /usr/share/doc/drbd.../drbd.conf.example
>
> include "drbd.d/global_common.conf";
> # include "drbd.d/*.res";
>
> resource target.0 {
>     protocol C;
>
>     handlers {
>         pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
>         pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
>         local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
>         outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
>         before-resync-target /usr/local/bin/resync-start-RAID6.sh;
>         after-resync-target /usr/local/bin/resync-end-RAID6.sh;
>     }
>
>     startup {
>         degr-wfc-timeout 120;
>     }
>
>     disk {
>         on-io-error detach;
>     }
>
>     net {
>         cram-hmac-alg sha1;
>         shared-secret "password";
>         after-sb-0pri disconnect;
>         after-sb-1pri disconnect;
>         after-sb-2pri disconnect;
>         rr-conflict disconnect;
>         sndbuf-size 0;
>     }
>
>     syncer {
>         c-plan-ahead 0;
>         rate 30M;
>         verify-alg sha1;
>         # al-extents 257;
>         al-extents 3389;
>     }
>
>     on iscsi-filer-1 {
>         device /dev/drbd0;
>         disk /dev/md0;
>         address 192.168.10.1:7789;
>         flexible-meta-disk /dev/md3;
>     }
>
>     on iscsi-filer-2 {
>         device /dev/drbd0;
>         disk /dev/md0;
>         address 192.168.10.2:7789;
>         flexible-meta-disk /dev/md3;
>     }
> }
>
> resource target.2 {
>     protocol C;
>
>     handlers {
>         pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
>         pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
>         local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
>         outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
>         before-resync-target /usr/local/bin/resync-start-RAID5.sh;
>         after-resync-target /usr/local/bin/resync-end-RAID5.sh;
>     }
>
>     startup {
>         degr-wfc-timeout 120;
>     }
>
>     disk {
>         on-io-error detach;
>     }
>
>     net {
>         cram-hmac-alg sha1;
>         shared-secret "password";
>         after-sb-0pri disconnect;
>         after-sb-1pri disconnect;
>         after-sb-2pri disconnect;
>         rr-conflict disconnect;
>         sndbuf-size 0;
>     }
>
>     syncer {
>         c-plan-ahead 0;
>         rate 30M;
>         verify-alg sha1;
>         # al-extents 257;
>         al-extents 3389;
>     }
>
>     on iscsi-filer-1 {
>         device /dev/drbd2;
>         disk /dev/md2;
>         address 192.168.10.1:7790;
>         flexible-meta-disk /dev/md4;
>     }
>
>     on iscsi-filer-2 {
>         device /dev/drbd2;
>         disk /dev/md2;
>         address 192.168.10.2:7790;
>         flexible-meta-disk /dev/md4;
>     }
> }
>
>
> ucarp-up
> #!/bin/sh
>
> /sbin/drbdadm primary all
> /sbin/ifup $1:ucarp
> /sbin/drbdadm primary all
> /sbin/drbdadm primary all
> /sbin/drbdadm primary all
> mount -o defaults,noatime,nodiratime /dev/drbd0 /nfs-exported/raid6
> mount -o defaults,noatime,nodiratime /dev/drbd2 /nfs-exported/raid5
> /etc/init.d/nfs-kernel-server restart
> sleep 2
> echo 256 > /proc/fs/nfsd/threads
>
>
> ucarp-down
> #!/bin/sh
>
> /etc/init.d/nfs-kernel-server stop
> umount /nfs-exported/raid6
> umount /nfs-exported/raid5
> /sbin/drbdadm secondary all
> /sbin/ifdown $1:ucarp
>
>
> --
> Gerald Brandt
> Majentis Technologies
> gbr at majentis.com
> 204-229-6595
> www.majentis.com
>
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?