[Drbd-dev] How Locking in GFS works...
Philipp Reisner
philipp.reisner at linbit.com
Tue Oct 5 21:37:27 CEST 2004
Hi!
Please also look at the nice PDF!
> > > "Oh my, this is dirty locally too and unacked. We better arbitrate now;
> > > ie one side wins and the other one is silently discarded."
9 Support shared disk semantics (for GFS, OCFS, etc.)
All the thoughts in this area assume that the cluster deals
with split-brain situations as discussed in item 6.
In order to offer a shared disk mode for GFS, we allow both
nodes to become primary. (This needs to be enabled with the
config statement net { allow-two-primaries; } )
Read after write dependencies
The shared state is available to clusters using protocol C
and B. It is not usable with protocol A.
To support the shared state with protocol B, a node that receives
a read request has to check whether a new version of the block
is in the process of being written, i.e. it has to search for the
block on active_ee and done_ee. (This works because the block is
on active_ee before the RecvAck is sent.)
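
A minimal sketch of the read-side check described above, written in
Python for illustration and not taken from the DRBD sources. Only the
list names active_ee and done_ee come from the text; the EpochEntry and
Node classes and the has_pending_write() helper are hypothetical, and
what the node does once it finds a pending write is left open here.

```python
from dataclasses import dataclass, field


@dataclass
class EpochEntry:
    """Hypothetical stand-in for one write received from the peer."""
    sector: int
    data: bytes


@dataclass
class Node:
    # Writes received from the peer (named after the lists in the text).
    active_ee: list = field(default_factory=list)
    done_ee: list = field(default_factory=list)

    def has_pending_write(self, sector):
        """Return True if a write to `sector` is on active_ee or done_ee."""
        for entry in self.active_ee + self.done_ee:
            if entry.sector == sector:
                return True
        return False


node = Node()
node.active_ee.append(EpochEntry(sector=8, data=b"new"))
print(node.has_pending_write(8))   # a read of sector 8 must take the pending write into account
print(node.has_pending_write(16))  # no pending write, the read can go to disk directly
```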
Global write order
The major pitfall is the handling of concurrent writes to the
same block. (Concurrent writes to the same blocks should not
happen, but we have to assume that it is possible that the
synchronisation methods of our upper layer [i.e. openGFS]
may fail.)
Without further handling, concurrent writes to the same block
would be written locally on each node first, then sent to the
peer, where they would overwrite the peer's local version.
In other words, each node would write its local version first
and then the peer's version of the data, so the two nodes would
end up with different contents for that block.
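
The problem can be shown with a small Python simulation. This is only
an illustration of the ordering issue; the function name and the
one-block "disks" are hypothetical and have nothing to do with the
real code.

```python
def unordered_concurrent_write(data_a, data_b):
    """Simulate two nodes writing the same block without an agreed order.

    Each node writes its own version first and then applies the version
    it received from the peer. Returns the final block contents on
    node A and node B.
    """
    disk_a = data_a  # node A writes its local version
    disk_b = data_b  # node B writes its local version
    disk_a = data_b  # A applies the data packet received from B
    disk_b = data_a  # B applies the data packet received from A
    return disk_a, disk_b


final_a, final_b = unordered_concurrent_write("from-A", "from-B")
print(final_a, final_b)    # the two versions have simply swapped places
print(final_a == final_b)  # False: the mirrors have diverged
```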
Both nodes need to agree to _one_ order, in which such
conflicting writes should be carried out.
Proposed Solution
We arbitrarily select one node (e.g. the node that did the first
accept() in the drbd_connect() function) and mark it with the
discard-concurrent-write-flag.
The following algorithm is performed upon reception of a
data packet:
1. Do we have a concurrent request? (I.e. do I have a request
to the same block in my transfer log?) If not -> write now.
2. Have I already got an ACK packet for the concurrent
request? (Does the request already have the RQ_DRBD_SENT bit set?)
If yes -> write the data from the data packet afterwards.
3. Do I have the "discard-concurrent-write-flag"?
If yes -> discard the data packet and send a discard notify.
If no -> write the data from the data packet afterwards.
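
The three steps above can be sketched as a small decision function.
This is Python for illustration only, not the DRBD implementation.
The Action enum, the on_data_packet() name and the representation of
the transfer log as a dict (block number -> "ACK already received",
standing in for the RQ_DRBD_SENT bit) are assumptions made for this
sketch.

```python
from enum import Enum


class Action(Enum):
    WRITE_NOW = "write now"
    WRITE_AFTERWARDS = "write the packet data after the local request"
    DISCARD = "discard the packet and send a discard notify"


def on_data_packet(block, transfer_log, have_discard_flag):
    """Decide what to do with a data packet received for `block`.

    transfer_log maps a block number to True if the local request for
    that block has already been ACKed by the peer, False otherwise.
    have_discard_flag is True on the one node that was marked with the
    discard-concurrent-write-flag.
    """
    # Step 1: no concurrent local request to the same block.
    if block not in transfer_log:
        return Action.WRITE_NOW
    # Step 2: the concurrent local request was already ACKed.
    if transfer_log[block]:
        return Action.WRITE_AFTERWARDS
    # Step 3: unacked conflict, the flag decides which side wins.
    if have_discard_flag:
        return Action.DISCARD
    return Action.WRITE_AFTERWARDS


# Both nodes wrote block 7 concurrently and neither request is ACKed yet.
print(on_data_packet(7, {7: False}, have_discard_flag=True))
print(on_data_packet(7, {7: False}, have_discard_flag=False))
```

In the unacked case the node holding the flag keeps its own data and
the other node overwrites its data with the flag holder's packet, so
both sides end up with the flag holder's version of the block.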
BTW, each time we have a concurrent write access, we print
a warning to the syslog, since this indicates that the layer
above us is broken!
[ see also GFS-mode-arbitration.pdf for illustration. ]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GFS-mode-options.pdf
Type: application/pdf
Size: 9808 bytes
Desc: not available
Url : http://lists.linbit.com/pipermail/drbd-dev/attachments/20041005/4cece6bc/GFS-mode-options.pdf