[DRBD-user] Re: Real fix for drbd-0.7.12

Philipp Reisner philipp.reisner at linbit.com
Wed Aug 31 19:08:57 CEST 2005


Ok, here is the real fix!!! The patch is against drbd-0.7.12 plain.

This was really hard. 

In the kernel's API there are two variants of all bitops: the atomic
ones, set_bit(), clear_bit(), test_bit() etc., and the non-atomic
ones, __set_bit(), __clear_bit() ...

The race condition:
 CPU1 was in an IO completion handler and used the non-atomic
 __set_bit(SYNC_STARTED,..) there. Non-atomic means: first, it fetched
 the word from memory...
 ... CPU2 was exiting the _drbd_process_ee() function and did
 clear_bit(PROCESS_EE_RUNNING), which is atomic = fetch, modify and
 write in one indivisible step...
 ... back on CPU1 we now do the modify and write on the stale copy.

So CPU1 sets the PROCESS_EE_RUNNING bit again, because it fetched
the word before CPU2 did its atomic update.

So I conclude that the rule is:

  If you use the atomic bitops on a word, you may never ever use the
  non-atomic bitops on the same word anywhere in your code.
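
The interleaving described above can be replayed deterministically in
userspace. The following is only a sketch, not the DRBD code and not the
attached patch: it uses C11 <stdatomic.h> in place of the kernel bitops,
the bit positions are made up for illustration, and the two function
names are hypothetical. The first function splits the non-atomic set
into its fetch and write-back halves with the atomic clear in between;
the second uses an atomic set instead.

```c
#include <stdatomic.h>

/* Hypothetical bit positions; the real values in drbd-0.7.12 may differ. */
enum { PROCESS_EE_RUNNING = 0, SYNC_STARTED = 1 };

/* Replays the race: CPU1 sets a bit non-atomically while CPU2 clears
 * another bit of the same word atomically. Returns the final word. */
unsigned long replay_with_nonatomic_set(void)
{
    atomic_ulong flags = 1UL << PROCESS_EE_RUNNING;

    /* CPU1, like __set_bit(), step 1: fetch the word. */
    unsigned long cpu1_copy = atomic_load(&flags);

    /* CPU2, like clear_bit(): one indivisible fetch-modify-write. */
    atomic_fetch_and(&flags, ~(1UL << PROCESS_EE_RUNNING));

    /* CPU1, steps 2 and 3: modify the stale copy and write it back.
     * This overwrites CPU2's update. */
    cpu1_copy |= 1UL << SYNC_STARTED;
    atomic_store(&flags, cpu1_copy);

    return atomic_load(&flags);
}

/* Same two updates, but the set is atomic as well, so neither update
 * can be lost regardless of the order in which they run. */
unsigned long replay_with_atomic_set(void)
{
    atomic_ulong flags = 1UL << PROCESS_EE_RUNNING;

    atomic_fetch_and(&flags, ~(1UL << PROCESS_EE_RUNNING)); /* CPU2 */
    atomic_fetch_or(&flags, 1UL << SYNC_STARTED);           /* CPU1 */

    return atomic_load(&flags);
}
```

In the first variant PROCESS_EE_RUNNING is still set at the end even
though CPU2 cleared it; in the second only SYNC_STARTED remains set.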


But it feels good to understand what was going on!


-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :
-------------- next part --------------
A non-text attachment was scrubbed...
Name: drbd-0.7.12-syncer-fix.diff
Type: text/x-diff
Size: 658 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20050831/d93065ef/attachment.diff>