[Drbd-dev] How Locking in GFS works...

Philipp Reisner philipp.reisner at linbit.com
Tue Oct 5 21:37:27 CEST 2004


Please also look at the nice PDF!

> > > "Oh my, this is dirty locally too and unacked. We better arbitrate now;
> > > ie one side wins and the other one is silently discarded."

9 Support shared disk semantics  ( for GFS, OCFS etc... )

    All the thoughts in this area imply that the cluster deals
    with split brain situations as discussed in item 6.

  In order to offer a shared disk mode for GFS, we allow both
  nodes to become primary. (This needs to be enabled with the
  config statement net { allow-two-primaries; } )

 Read after write dependencies

  The shared state is available to clusters using protocol C
  and B. It is not usable with protocol A.

  To support the shared state with protocol B, upon a read
  request the node has to check whether a new version of the block
  is in the process of being written. (That is, search for it on
  active_ee and done_ee, since it is on active_ee before the
  RecvAck is sent.)
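
  The check described above can be sketched as follows. This is a
  minimal illustration, not DRBD code: the struct epoch_entry layout,
  the singly linked lists and the helper names list_has_sector() and
  read_must_wait() are assumptions made for the example. Only the list
  names active_ee and done_ee come from the text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified entry for a write received from the peer. */
struct epoch_entry {
    unsigned long sector;          /* block the peer is writing */
    struct epoch_entry *next;      /* next entry on the same list */
};

/* Return true if any entry on the list targets the given sector. */
static bool list_has_sector(const struct epoch_entry *head,
                            unsigned long sector)
{
    for (const struct epoch_entry *e = head; e != NULL; e = e->next)
        if (e->sector == sector)
            return true;
    return false;
}

/*
 * A local read must not be served straight from disk while a newer
 * version of the block is listed on active_ee or done_ee (the two
 * lists the text says to search).
 */
static bool read_must_wait(const struct epoch_entry *active_ee,
                           const struct epoch_entry *done_ee,
                           unsigned long sector)
{
    return list_has_sector(active_ee, sector) ||
           list_has_sector(done_ee, sector);
}
```

  What the reader does when read_must_wait() returns true (wait for the
  write, or serve the data from the pending entry) is left open here.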

 Global write order

  The major pitfall is the handling of concurrent writes to the
  same block. (Concurrent writes to the same blocks should not 
  happen, but we have to assume that it is possible that the
  synchronisation methods of our upper layer [i.e. openGFS] 
  may fail.)

  Without further handling, concurrent writes to the same block
  would get written on each node locally first, then sent
  to the peer, where they would overwrite the peer's local version.
  In other words, each node would write its local version first
  and then the peer's version of the data, so the two nodes
  would end up with different data in that block.

  Both nodes need to agree on _one_ order in which such
  conflicting writes should be carried out.
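
  The unhandled case can be simulated with a toy model. This is only a
  sketch under strong assumptions: a "disk" is a single block holding
  one char, and struct node, write_block() and
  concurrent_write_unordered() are hypothetical names invented for the
  example.

```c
/* Hypothetical model: each node's disk is one block holding one char. */
struct node {
    char block;
};

/* Apply a write to a node's local copy of the block. */
static void write_block(struct node *n, char data)
{
    n->block = data;
}

/*
 * Two concurrent writes to the same block with no arbitration:
 * each node writes its own data locally first, then applies the
 * data packet that arrives from the peer.
 */
static void concurrent_write_unordered(struct node *a, struct node *b,
                                       char data_a, char data_b)
{
    write_block(a, data_a);   /* A: local write */
    write_block(b, data_b);   /* B: local write */
    write_block(b, data_a);   /* B: A's packet overwrites B's version */
    write_block(a, data_b);   /* A: B's packet overwrites A's version */
}
```

  After the call, node A holds B's data and node B holds A's data, so
  the replicas have diverged even though every write was applied.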

  Proposed Solution

  We arbitrarily select one node (e.g. the node that did the first
  accept() in the drbd_connect() function) and mark it with the
  "discard-concurrent-write-flag".

  The following algorithm is performed upon reception of a
  data packet:

  1. Do we have a concurrent request? (i.e. do I have a request
     to the same block in my transfer log?) If not -> write now.
  2. Have I already got an ACK packet for the concurrent
     request? (Does the request already have the RQ_DRBD_SENT bit set?)
     If yes -> write the data from the data packet afterwards.
  3. Do I have the "discard-concurrent-write-flag"?
     If yes -> discard the data packet and send a discard notification.
     If no -> write the data from the data packet afterwards.
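
  The three steps can be written down as a small decision function.
  This is a sketch, not the DRBD implementation: struct conflict_state,
  enum conflict_action and on_data_packet() are hypothetical names, and
  the three booleans stand in for the transfer log lookup, the
  RQ_DRBD_SENT bit and the discard-concurrent-write-flag.

```c
#include <stdbool.h>

enum conflict_action {
    WRITE_NOW,          /* step 1: no concurrent request */
    WRITE_AFTERWARDS,   /* step 2, or step 3 without the flag */
    DISCARD_AND_NOTIFY  /* step 3 with the flag */
};

/* Hypothetical view of local state for the block named in a data packet. */
struct conflict_state {
    bool have_concurrent_request;   /* request to same block in transfer log */
    bool concurrent_request_acked;  /* RQ_DRBD_SENT bit set on that request */
    bool discard_concurrent_write;  /* this node holds the flag */
};

/* Decide what to do with an incoming data packet, following steps 1 to 3. */
static enum conflict_action on_data_packet(const struct conflict_state *s)
{
    if (!s->have_concurrent_request)
        return WRITE_NOW;               /* step 1 */
    if (s->concurrent_request_acked)
        return WRITE_AFTERWARDS;        /* step 2 */
    if (s->discard_concurrent_write)
        return DISCARD_AND_NOTIFY;      /* step 3, flag holder */
    return WRITE_AFTERWARDS;            /* step 3, other node */
}
```

  Since only one node carries the flag, in the unacked case one side
  discards the peer's packet while the other side writes it after its
  own request, so both end up with the flag holder's data.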

  BTW, each time we have a concurrent write access, we print
  a warning to the syslog, since this indicates that the layer
  above us is broken!

  [ see also GFS-mode-arbitration.pdf for illustration. ]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GFS-mode-options.pdf
Type: application/pdf
Size: 9808 bytes
Desc: not available
Url : http://lists.linbit.com/pipermail/drbd-dev/attachments/20041005/4cece6bc/GFS-mode-options.pdf
