[DRBD-user] Digest mismatch resulting in "split brain" after (!) automatic reconnect

Raoul Bhatia [IPAX] r.bhatia at ipax.at
Mon Feb 21 13:21:11 CET 2011

hi,


On 02/21/2011 10:36 AM, Lars Ellenberg wrote:
> Fix your fence-peer helper,
> that may be the cause of trouble there.

which actually is 'your' fence-peer helper, right? :)

Feb 16 03:13:45 c02n01 kernel: [3675911.371516] block drbd0: updated UUIDs A9AE9E56A0D5D66F:0000000000000000:3E9700A8847A37AD:3E9600A8847A37AD
Feb 16 03:13:45 c02n01 kernel: [3675911.371635] block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Feb 16 03:13:45 c02n01 kernel: [3675911.505550] block drbd0: bitmap WRITE of 3050 pages took 34 jiffies
Feb 16 03:13:45 c02n01 kernel: [3675911.505615] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Feb 16 03:13:45 c02n01 cibadmin: [14957]: info: Invoked: cibadmin -Q -t 1
Feb 16 03:13:45 c02n01 crm-fence-peer.sh[14918]: WARNING peer is Secondary, did not place the constraint!
Feb 16 03:13:45 c02n01 kernel: [3675912.019501] block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 1 (0x100)
Feb 16 03:13:45 c02n01 kernel: [3675912.019622] block drbd0: fence-peer helper broken, returned 1
Feb 16 03:13:45 c02n01 kernel: [3675912.019687] block drbd0: pdsk( UpToDate -> DUnknown )
Feb 16 03:13:45 c02n01 kernel: [3675912.019768] block drbd0: new current UUID 6798C570121477F1:A9AE9E56A0D5D66F:3E9700A8847A37AD:3E9600A8847A37AD
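for reference, here's how i read the "exit code 1 (0x100)" / "helper broken, returned 1" pair above. this is a minimal sketch, assuming DRBD 8.3 fence-peer semantics as i remember them from the user guide (not verified against this exact build): the value in parentheses is the raw wait() status, so the real exit code sits in bits 8..15, and only exit codes 3 to 7 carry a defined meaning. anything else is treated as a broken helper, after which the kernel sets pdsk to DUnknown, which is exactly what the log shows. the function names fence_exit_code and fence_exit_meaning are hypothetical, made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: extract the exit code from the raw wait() status
# that the kernel prints in parentheses, e.g. "exit code 1 (0x100)".
fence_exit_code() {
    # The exit code lives in bits 8..15 of the raw status word.
    echo $(( ($1 >> 8) & 0xff ))
}

# Hypothetical helper: describe a fence-peer exit code.
# ASSUMPTION: meanings of 3..7 as documented for DRBD 8.3 (from memory);
# the fallback text is copied from the kernel log line above.
fence_exit_meaning() {
    case "$1" in
        3) echo "peer disk was already Inconsistent" ;;
        4) echo "peer disk was set to Outdated (fenced)" ;;
        5) echo "peer could not be reached" ;;
        6) echo "peer refused to be outdated (it is Primary)" ;;
        7) echo "peer was fenced off the cluster (STONITH)" ;;
        *) echo "fence-peer helper broken, returned $1" ;;
    esac
}

# Decode the status from the log excerpt above.
code=$(fence_exit_code 0x100)
fence_exit_meaning "$code"
```

run as-is, it prints the same "fence-peer helper broken, returned 1" text as the kernel did, i.e. the helper's exit code 1 falls outside the range the kernel understands.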

thus, basically coming back to [1] where florian asks:
> Look at your paste. You have no node where DRBD is Secondary. What do
> you expect the agent to do? 

(i know, i was talking about the agent in that email. but the agent and
crm-fence-peer.sh are closely tied, aren't they?)

looking at crm-fence-peer.sh's source, i see:
>         Secondary|Primary)
>                 # WTF? We are supposed to fence the peer,
>                 # but the replication link is just fine?
>                 echo WARNING "peer is $DRBD_peer, did not place the constraint!"
>                 rc=0
>                 return
>                 ;;
>         esac

so, this code path should actually become obsolete once the following
bug is fixed, right?

on the other hand, what's wrong with trying to disconnect and reconnect
the resources and seeing what happens (e.g. via a tiny constraint that
is only valid for PT1M)?

> Feb 16 06:25:04 c02n01 kernel: [3687390.947555] block drbd1: pdsk( UpToDate -> DUnknown )
> 
> This should not have happened, either:
> We must not change the pdsk state to DUnknown while keeping conn state at Connected.
> That's nonsense.
> 
> Feb 16 06:25:04 c02n01 kernel: [3687390.947633] block drbd1: new current UUID 89084B22FE454C03:3C1DADF6B38C1AD7:E7E50184F3F3AC0B:E7E40184F3F3AC0B 

please let me know if you need any further input from my side.

thanks,
raoul

[1] http://www.gossamer-threads.com/lists/drbd/users/20605#20605
-- 
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc.          email.          r.bhatia at ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG          web.          http://www.ipax.at
Barawitzkagasse 10/2/2/11           email.            office at ipax.at
1190 Wien                           tel.               +43 1 3670030
FN 277995t HG Wien                  fax.            +43 1 3670030 15
____________________________________________________________________