[DRBD-user] Stuck in Standalone

Thu Apr 16 18:45:27 CEST 2015

Easiest thing to do is to configure proper stonith in pacemaker
(configure + test), then change drbd to use 'fencing resource-and-stonith;'
along with the handlers 'fence-peer "/usr/lib/drbd/crm-fence-peer.sh";' and
'after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";'.

That way, you avoid split-brains entirely. You also need stonith in
pacemaker anyway, so win-win.
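
As a rough sketch, the DRBD side could look something like this in the
common section of your config. I am assuming the DRBD 8.4 layout you
posted (fencing policy under 'disk', scripts under 'handlers') and the
stock script paths, which may differ on your distribution. Treat it as a
starting point, not a tested config:

```
common {
        disk {
                # Suspend I/O and call the fence-peer handler when the
                # replication link to the peer is lost.
                fencing resource-and-stonith;
        }

        handlers {
                # Adds a Pacemaker constraint so the peer cannot be promoted.
                fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                # Removes that constraint once the peer has finished resyncing.
                after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
}
```

These handlers only help if stonith itself is already working in
pacemaker, so test that first.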

On 15/04/15 08:49 PM, Keith Ouellette wrote:
> We have two drbd resources for two different applications on a pair of
> servers managed by Pacemaker. Everything works fine when the primary node
> is put into standby or power cycled: the drbd Primary moves to the new
> active node and the applications continue to run as expected. I have an
> issue when I pull the Ethernet cable out of the primary node and leave it
> unplugged for about half an hour. When I unplug it, the Primary gets moved
> as expected and the applications continue to work. However, when I plug
> the Ethernet back into the system, both nodes go into a StandAlone state.
>
> *Node 1:*
>
> drbd driver loaded OK; device status:
> version: 8.4.3 (api:1/proto:86-101)
> srcversion: F97798065516C94BE0F27DC
> m:res  cs          ro               ds                 p       mounted  fstype
> 0:r0   StandAlone  Primary/Unknown  UpToDate/DUnknown  r-----  ext4
> 1:r1   StandAlone  Primary/Unknown  UpToDate/DUnknown  r-----  ext4
>
> *Node 2:*
>
> drbd driver loaded OK; device status:
> version: 8.4.3 (api:1/proto:86-101)
> srcversion: F97798065516C94BE0F27DC 
> m:res  cs          ro                 ds                 p       mounted  fstype
> 0:r0   StandAlone  Secondary/Unknown  UpToDate/DUnknown  r-----
> 1:r1   StandAlone  Secondary/Unknown  UpToDate/DUnknown  r-----
>
> As you can see, one node knows it is Primary, and that is where the
> applications continue to run. The second node knows it should be
> Secondary. All I do to resolve this is connect the resources on each node,
> with the Secondary using the --discard-my-data option.
>
> Is there a way to have the connects done automatically? This looks to be
> a type of "split brain", and I do have handling for that configured in
> global_common.conf:
>
> global {
>         usage-count no;
>         # minor-count dialog-refresh disable-ip-verification
> }
> 
> common {
>         handlers {
>                 # These are EXAMPLE handlers only.
>                 # They may have severe implications,
>                 # like hard resetting the node under certain circumstances.
>                 # Be careful when chosing your poison.
> 
>                 # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>                 # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>                 # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>                 # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>                 split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>                 # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
>                 # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
>                 # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
>         }
> 
>         startup {
>                 # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
>         }
> 
>         options {
>                 # cpu-mask on-no-data-accessible
>         }
> 
>         disk {
>                 # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
>                 # disk-drain md-flushes resync-rate resync-after al-extents
>                 # c-plan-ahead c-delay-target c-fill-target c-max-rate
>                 # c-min-rate disk-timeout
>         }
> 
>         net {
>                 after-sb-0pri discard-zero-changes;
>                 after-sb-1pri discard-secondary;
>                 # after-sb-2pri consensus;
>                 after-sb-2pri disconnect;
>                 # protocol timeout max-epoch-size max-buffers unplug-watermark
>                 # connect-int ping-int sndbuf-size rcvbuf-size ko-count
>                 # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
>                 # after-sb-1pri after-sb-2pri always-asbp rr-conflict
>                 # ping-timeout data-integrity-alg tcp-cork on-congestion
>                 # congestion-fill congestion-extents csums-alg verify-alg
>                 # use-rle
>         }
> }
>
> The following are also the resource files:
>
> r0.res:
>
> resource r0 {
>         on Node1 {
>                 volume 0 {
>                         device          /dev/drbd0;
>                         disk            /dev/Node1-vg/AOS;
>                         flexible-meta-disk      internal;
>                 }
>                 address         10.0.6.221:7788;
>         }
>         on Node2 {
>                 volume 0 {
>                         device          /dev/drbd0;
>                         disk            /dev/Node2-vg/AOS;
>                         flexible-meta-disk      internal;
>                 }
>                 address         10.0.6.222:7788;
>         }
> }
>
> r1.res:
>
> resource r1 {
>         on Node1 {
>                 volume 0 {
>                         device          /dev/drbd1;
>                         disk            /dev/Node1-vg/Controller;
>                         flexible-meta-disk      internal;
>                 }
>                 address         10.0.6.221:7789;
>         }
>         on Node2 {
>                 volume 0 {
>                         device          /dev/drbd1;
>                         disk            /dev/Node2-vg/Controller;
>                         flexible-meta-disk      internal;
>                 }
>                 address         10.0.6.222:7789;
>         }
> }
>
> I am not sure if this is possible, but I figured I would ask.
>
> Thanks,
> Keith

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?