[Drbd-dev] Bug(s) with Linux v5.4.46

Lars Ellenberg lars.ellenberg at linbit.com
Mon Jul 27 09:16:58 CEST 2020


On Sun, Jul 26, 2020 at 08:55:10PM -0700, Sarah Newman wrote:

> 	kref_put(&device->kref, drbd_destroy_device);

At this point we are "sure" to still hold at least one
additional reference on device.

> 	del_gendisk(device->vdisk);
> 	synchronize_rcu();

which we put here:

> 	kref_put(&device->kref, drbd_destroy_device);


But what you present here shows that in your case that is not true.

There is nothing new that is DRBD-specific in the mentioned kernel version.

> In drbd_destroy_device, there is the line:
> 
> memset(device, 0xfd, sizeof(*device));
> 
> So I think that drbd_destroy_device must have run before del_gendisk,
> and therefore the reference count for device->kref is unbalanced.

Looks like it.
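
To illustrate the ordering problem, here is a minimal userspace sketch.
It is not the DRBD or kernel code: struct model_kref, struct model_device,
model_kref_put(), model_destroy_device() and teardown_sees_poison() are
hypothetical stand-ins, and the memory is deliberately not freed so that
the 0xfd poison can be inspected safely. It only shows that if the
"additional" reference is missing, the first put already runs the destroy
callback, and whatever follows would look at poisoned memory.

```c
#include <assert.h>
#include <string.h>

#define POISON 0xfd

/* Userspace stand-in for struct kref; not the kernel implementation. */
struct model_kref {
	int refcount;
};

/* Hypothetical, heavily reduced stand-in for struct drbd_device. */
struct model_device {
	struct model_kref kref;		/* must stay the first member */
	unsigned char vdisk[8];		/* stands in for device->vdisk */
};

/* Models the memset() in the destroy callback. The real code also frees
 * the memory; the model keeps it so the poison can be read back. */
static void model_destroy_device(struct model_kref *kref)
{
	struct model_device *device = (struct model_device *)kref;

	memset(device, POISON, sizeof(*device));
}

/* Drop one reference; call release() when the count reaches zero. */
static int model_kref_put(struct model_kref *kref,
			  void (*release)(struct model_kref *))
{
	assert(kref->refcount > 0);
	if (--kref->refcount == 0) {
		release(kref);
		return 1;
	}
	return 0;
}

static int is_poisoned(const struct model_device *device)
{
	return device->vdisk[0] == POISON;
}

/* Replays the teardown order from the quoted code.
 * Returns 1 if the device is already poisoned at the point where
 * del_gendisk(device->vdisk) would run, 0 otherwise. */
int teardown_sees_poison(int initial_refs)
{
	struct model_device device;

	memset(&device, 0, sizeof(device));
	device.kref.refcount = initial_refs;

	model_kref_put(&device.kref, model_destroy_device);	/* first put */

	if (is_poisoned(&device))
		return 1;	/* del_gendisk would touch destroyed memory */

	/* del_gendisk() and synchronize_rcu() would happen here. */

	model_kref_put(&device.kref, model_destroy_device);	/* final put */
	return 0;
}
```

With two references held on entry (the expected case),
teardown_sees_poison(2) returns 0. With only one (the unbalanced case),
teardown_sees_poison(1) returns 1, which matches the 0xfd pattern
showing up before del_gendisk.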

> I do not know if this is related to the error message:
> 
>  ASSERTION FAILED: connection->current_epoch->list not empty
> 
> or not.
> 
> There were no error messages reported on the peer.
> 
> FYI, when we've run in debug mode we've seen some ODEBUG errors about
> freeing active objects around the time that DRBD resources were released.
> One was a work_struct and the other was a timer_list. I do not know if
> either of those are related.

Can you show them? They may help in understanding what is going on here.


> The system in question is still up and running in an error state; is
> there any more information you want from it?

No.

But: is this "easily" reproducible? If so: how?

    Lars


