[Drbd-dev] Bug(s) with Linux v5.4.46
Lars Ellenberg
lars.ellenberg at linbit.com
Mon Jul 27 09:16:58 CEST 2020
On Sun, Jul 26, 2020 at 08:55:10PM -0700, Sarah Newman wrote:
> kref_put(&device->kref, drbd_destroy_device);
At this point we are "sure" to still hold at least one
additional reference on device.
> del_gendisk(device->vdisk);
> synchronize_rcu();
which we put here:
> kref_put(&device->kref, drbd_destroy_device);
But what you present here shows that in your case that is not true.
There is nothing new that is DRBD-specific in the mentioned kernel version.
> In drbd_destroy_device, there is the line:
>
> memset(device, 0xfd, sizeof(*device));
>
> So I think that drbd_destroy_device must have run before del_gendisk,
> and therefore the reference count for device->kref is unbalanced.
Looks like it.
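
To illustrate the reasoning above, here is a minimal userspace sketch of it. It is not DRBD or kernel code: toy_ref, toy_device, toy_put, toy_destroy_device, is_poisoned and toy_teardown are hypothetical names invented for this example, and the release function only poisons the object (it does not free it) so that the state can be inspected afterwards. The only details taken from the thread are the put / del_gendisk / put ordering and the 0xfd poison pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct kref; NOT the kernel API. */
struct toy_ref { int count; };

/* Hypothetical stand-in for struct drbd_device. */
struct toy_device {
    struct toy_ref ref;
    int minor;
};

static int release_calls; /* how often the release function has run */

/* Release function: poison the object with 0xfd, as in the quoted memset().
 * Real code would free the memory afterwards; this sketch keeps it around
 * so the poison can be observed. */
static void toy_destroy_device(struct toy_device *dev)
{
    release_calls++;
    memset(dev, 0xfd, sizeof(*dev));
}

/* Drop one reference; run the release function when the count hits zero.
 * Returns 1 if this call released the object, 0 otherwise. */
static int toy_put(struct toy_device *dev)
{
    assert(dev->ref.count > 0); /* putting an unreferenced object is a bug */
    if (--dev->ref.count == 0) {
        toy_destroy_device(dev);
        return 1;
    }
    return 0;
}

/* Returns 1 if every byte of the object carries the 0xfd poison. */
static int is_poisoned(const struct toy_device *dev)
{
    const unsigned char *p = (const unsigned char *)dev;
    size_t i;

    for (i = 0; i < sizeof(*dev); i++)
        if (p[i] != 0xfd)
            return 0;
    return 1;
}

/* Teardown in the same order as the quoted code: put, "del_gendisk", put.
 * Returns 1 if the object was already destroyed at the point where
 * del_gendisk() would dereference it, 0 if the references were balanced. */
static int toy_teardown(struct toy_device *dev)
{
    int destroyed_early;

    toy_put(dev);                       /* assumes one more reference remains */
    destroyed_early = is_poisoned(dev); /* del_gendisk(device->vdisk) would run here */
    if (!destroyed_early)
        toy_put(dev);                   /* the final put */
    return destroyed_early;
}
```

With a device that starts out with two references, toy_teardown() returns 0 and the release function runs only on the second put. With a device that starts out with a single reference (one get missing, or one put too many earlier), the first put already poisons the object, and the "del_gendisk" step would operate on 0xfd bytes, which matches what the report shows.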
> I do not know if this is related to the error message:
>
> ASSERTION FAILED: connection->current_epoch->list not empty
>
> or not.
>
> There were no error messages reported on the peer.
>
> FYI, when we've run in debug mode we've seen some ODEBUG errors about
> freeing active objects around the time that DRBD resources were released.
> One was a work_struct and the other was a timer_list. I do not know if
> either of those are related.
Can you show them? They may help in understanding what is going on here.
> The system in question is still up and running in an error state; is
> there any more information you want from it?
No.
But: is this "easily" reproducible? If so: how?
Lars