Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Tue, Feb 28, 2017 at 06:05:10PM +0530, Sunil Kumar wrote: > Hi Lars, > > Thanks for the suggested changes, it solves the problem. > > >So the context that drbd_md_endio() runs in was not scheduled > >for all the time it takes for drbd_adm_attach() to go all the way from > >"drbd_md_read()" to ... "device->ldev = nbc;", several lock/unlock, > >memory barriers and so on... and only then the drbd_md_endio() context > >managed to evaluate the next line of code? > > I don't see an issue with drbd_md_endio() scheduling, its schedules > properly. The only problem was 'drbd_md_put_buffer' which basically > wakes up drbd_adm_attach process and makes ldev as valid pointer in > case of drbd_md_read(). See http://git.linbit.com/drbd-8.4.git/commitdiff/5f71d962446267f21c47cec59216fcadee2b4e8d drbd: fix potential get_ldev/put_ldev refcount imbalance during attach Race: drbd_adm_attach() | async drbd_md_endio() | device->ldev is still NULL. | | drbd_md_read( | .endio = drbd_md_endio; | submit; | .... | wait for done == 1; | done = 1; ); | wake_up(); .. lot of other stuff, | .. includeing taking and | ...giving up locks, | .. doing further IO, | .. stuff that takes "some time" | | while in this context, | this is the next statement. 0K . | which means this context was scheduled .. only then, finally, | away for "some time". device->ldev = nbc; | | if (device->ldev) | put_ldev() Unlikely, but possible. I was able to provoke it "reliably" by adding an mdelay(500); after the wake_up(). Fixed by moving the if (!NULL) put_ldev() before done = 1; Impact of the bug was that the resulting refcount imbalance could lead to premature destruction of the object, potentially causing a NULL pointer dereference during a subsequent detach. That's what I meant with "not scheduled for all the time ..." Thanks, Lars