Note: "permalinks" may not be as permanent as we would like;
direct links from old sources may well be a few messages off.
> Ok, here is the real fix!!! The patch is against drbd-0.7.12 plain.
>
> This was really hard.
>
> In the kernel's API there are two variants of all bitops. The atomic
> ones set_bit(), clear_bit(), test_bit() etc... and the non atomic
> ones __set_bit(), __clear_bit() ...
>
> The race condition:
> CPU1 was in an IO completion handler and used the __set_bit(SYNC_STARTED,..)
> there. Non atomic means: First, it fetched the word from memory....
> ... CPU2 was exiting the _drbd_process_ee() function and did the clear bit
> clear_bit(PROCESS_EE_RUNNING) atomic = fetch, modify and write...
> ... back on CPU1 we now do the modify and write...
>
> So CPU1 sets the PROCESS_EE_RUNNING bit again, because it fetched
> the word before CPU2 did its atomic update.
>
> So I conclude, that the rule is:
>
> If you use the atomic bitops on a word, you may never ever use the
> non atomic bitops on the same word anywhere in your code.
>
>
> But it feels good, to understand what was going on!

Good work. I'll test it out soon.

Jeff
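
The interleaving described in the quoted message can be replayed step by step in userspace. What follows is only a minimal sketch, not the drbd code: it assumes C11 <stdatomic.h> as a stand-in for the kernel bitops, the bit numbers are made up (the real values in drbd-0.7.12 are not given in the post), and the function names replay_mixed_bitops() and replay_atomic_bitops() are hypothetical. Both "CPUs" run in one thread so the ordering is fixed and the result is reproducible.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit numbers, chosen for illustration only. */
enum { SYNC_STARTED = 0, PROCESS_EE_RUNNING = 1 };

/* Replays the racy ordering: a non-atomic set on "CPU1" is split into
 * fetch / modify / write, and the atomic clear on "CPU2" lands between
 * the fetch and the write. Returns the final flags word. */
unsigned long replay_mixed_bitops(void)
{
    atomic_ulong flags;
    atomic_init(&flags, 1UL << PROCESS_EE_RUNNING);

    /* CPU1, __set_bit(SYNC_STARTED, ...): first, fetch the word. */
    unsigned long cpu1_copy = atomic_load(&flags);
    assert(cpu1_copy & (1UL << PROCESS_EE_RUNNING)); /* copy still has the bit */

    /* CPU2, clear_bit(PROCESS_EE_RUNNING, ...): one atomic fetch-modify-write. */
    atomic_fetch_and(&flags, ~(1UL << PROCESS_EE_RUNNING));

    /* Back on CPU1: modify the stale copy and write it out. */
    cpu1_copy |= 1UL << SYNC_STARTED;
    atomic_store(&flags, cpu1_copy);

    return atomic_load(&flags);
}

/* Same two operations, but CPU1 also uses an atomic read-modify-write,
 * so neither update can overwrite the other. */
unsigned long replay_atomic_bitops(void)
{
    atomic_ulong flags;
    atomic_init(&flags, 1UL << PROCESS_EE_RUNNING);

    atomic_fetch_or(&flags, 1UL << SYNC_STARTED);            /* CPU1 */
    atomic_fetch_and(&flags, ~(1UL << PROCESS_EE_RUNNING));  /* CPU2 */

    return atomic_load(&flags);
}
```

Called from a small main(), replay_mixed_bitops() returns a word in which PROCESS_EE_RUNNING is set again even though CPU2 cleared it, while replay_atomic_bitops() returns a word with only SYNC_STARTED set. That matches the rule stated in the message: once the atomic bitops are used on a word, the non-atomic ones should not touch that word.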