[Drbd-dev] Please test with CONFIG_PROVE_LOCKING=y

Lars Ellenberg lars.ellenberg at linbit.com
Thu Apr 25 12:56:41 CEST 2019


On Thu, Apr 25, 2019 at 06:30:05PM +0900, Tetsuo Handa wrote:
> I found that simply doing
> 
> # mount /dev/drbd0 /mnt/
> 
> on the primary side causes a lockdep splat on the peer side.
> 

> [   23.039882] ========================================================
> [   23.039906] WARNING: possible irq lock inversion dependency detected
> [   23.039931] 5.0.0 #891 Tainted: G           O
> [   23.039950] --------------------------------------------------------
> [   23.039975] drbd_r_r0/8237 just changed the state of lock:
> [   23.039997] 000000007cc227b6 (&(&connection->epoch_lock)->rlock){+.+.}, at: receive_Data+0x36b/0x1ca0 [drbd]
> [   23.040049] but this lock was taken by another, SOFTIRQ-safe lock in the past:
> [   23.040115]  (&(&resource->req_lock)->rlock){..-.}
> [   23.040117]
> 
> and interrupts could create inverse lock ordering between them.
> 
> [   23.040176]
> other info that might help us debug this:
> [   23.040200]  Possible interrupt unsafe locking scenario:
> 
> [   23.040225]        CPU0                    CPU1
> [   23.040243]        ----                    ----
> [   23.040260]   lock(&(&connection->epoch_lock)->rlock);
> [   23.040281]                                local_irq_disable();
> [   23.040303]                                lock(&(&resource->req_lock)->rlock);
> [   23.040330]                                lock(&(&connection->epoch_lock)->rlock);
> [   23.040359]   <Interrupt>
> [   23.040370]     lock(&(&resource->req_lock)->rlock);
> [   23.040389]
>  *** DEADLOCK ***


Yes.
We already know.
"impossible odds"...

But it needs fixing.
Certainly NOT by making every user of epoch_lock irqsave.
The problem was introduced by me with
f4acb16f drbd: fix lifetime of "need to apply activity log" metadata flag

I "just" need to come up with a way to check what I am checking there
without taking the epoch lock.
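
To illustrate the general idea only (this is not the fix that went into DRBD,
and every name below is hypothetical), here is a minimal userspace sketch in
C11. It assumes the thing being checked can be reduced to a single boolean.
Writers still update the real state under the lock, but they also publish a
lock-free mirror of it, so a reader that already holds some other lock can
check the condition without taking the inner lock at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for state guarded by epoch_lock; not DRBD's struct. */
struct epoch_state {
    atomic_flag lock;        /* plays the role of epoch_lock (simple spinlock) */
    int active_writes;       /* only touched while holding lock */
    atomic_bool has_active;  /* lock-free mirror of (active_writes > 0) */
};

static void epoch_init(struct epoch_state *e)
{
    atomic_flag_clear(&e->lock);
    e->active_writes = 0;
    atomic_init(&e->has_active, false);
}

static void epoch_lock(struct epoch_state *e)
{
    /* Spin until the flag was previously clear, i.e. we own the lock. */
    while (atomic_flag_test_and_set_explicit(&e->lock, memory_order_acquire))
        ;
}

static void epoch_unlock(struct epoch_state *e)
{
    atomic_flag_clear_explicit(&e->lock, memory_order_release);
}

/* Writer side: change the counter under the lock, then publish the mirror. */
static void epoch_write_start(struct epoch_state *e)
{
    epoch_lock(e);
    e->active_writes++;
    atomic_store_explicit(&e->has_active, true, memory_order_release);
    epoch_unlock(e);
}

static void epoch_write_done(struct epoch_state *e)
{
    epoch_lock(e);
    assert(e->active_writes > 0);
    e->active_writes--;
    atomic_store_explicit(&e->has_active, e->active_writes > 0,
                          memory_order_release);
    epoch_unlock(e);
}

/* Reader side: never takes e->lock, so it adds no lock-ordering dependency. */
static bool epoch_has_active_writes(struct epoch_state *e)
{
    return atomic_load_explicit(&e->has_active, memory_order_acquire);
}
```

The trade-off is that the reader may see a value that is stale by the time it
acts on it, so this only works where the check tolerates that, or where the
caller's own lock already excludes the relevant writers.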


> Although making below change seems to solve the lockdep splat,
> I can't check the correctness because I don't know how drbd works.
> Please test with CONFIG_PROVE_LOCKING=y and fix.

See above.
Thanks.

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT

