[Drbd-dev] [DRBD-user] A question about drbd hang in conn_disconnect
Lars Ellenberg
lars.ellenberg at linbit.com
Thu Apr 16 15:46:44 CEST 2015
On Mon, Apr 13, 2015 at 05:45:25PM -0600, Fang Sun wrote:
> Hi drbd developers,
>
> After some research and testing, I think I have found the cause of this
> problem and a possible fix in drbd.
> Would you please check whether my theory is correct?
>
>
> I will use 8.4.6 as the code base for this explanation.
> When conn_disconnect hangs, it hangs at drbd_receiver.c:5178:
> static int drbd_disconnected(struct drbd_peer_device *peer_device)
> {
> 	........
> 	wait_event(device->misc_wait, !test_bit(BITMAP_IO, &device->flags));
> }
>
> The reason is that the device has the BITMAP_IO flag set.
>
>
> The reason the BITMAP_IO flag is set and never cleared is this:
> the disk state changes when the network is disconnected, and
> after_state_ch() is called.
>
> At drbd_state.c line 1949, drbd_queue_bitmap_io is called in after_state_ch().
>
> I think the real cause is in drbd_queue_bitmap_io, drbd_main.c line 3641:
> void drbd_queue_bitmap_io(struct drbd_device *device,
> 		int (*io_fn)(struct drbd_device *),
> 		void (*done)(struct drbd_device *, int),
> 		char *why, enum bm_flag flags)
> {
> 	.........
> 	set_bit(BITMAP_IO, &device->flags);
> 	if (atomic_read(&device->ap_bio_cnt) == 0) {
> 		if (!test_and_set_bit(BITMAP_IO_QUEUED, &device->flags))
> 			drbd_queue_work(&first_peer_device(device)->connection->sender_work,
> 					&device->bm_io_work.w);
> 	}
> 	........
> }
>
> The only code that clears BITMAP_IO is in
> device->bm_io_work.w (w_bitmap_io). But when
> atomic_read(&device->ap_bio_cnt) != 0, the BITMAP_IO flag is set while
> bm_io_work.w is not queued, so it is never called.
> drbd_disconnected() then blocks.
>
> Should we move set_bit(BITMAP_IO, &device->flags) so that it sits directly
> before drbd_queue_work()?
No. That would be the wrong fix,
and cause potential inconsistencies later.
It may need to be fixed, but in a different way.
Let me (reproduce locally ... and) think about that for a bit.
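
For reference while reproducing, the sequence described in the report can be
modelled in user space. This is only a rough sketch of the quoted logic, not
DRBD code: struct model_device, the model_* helpers and the work_queued field
are hypothetical names, and other places in the driver that might queue the
work later are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the quoted logic; not DRBD code. */
#define BITMAP_IO        (1u << 0)
#define BITMAP_IO_QUEUED (1u << 1)

struct model_device {
	unsigned int flags;
	int ap_bio_cnt;    /* application bios still in flight */
	bool work_queued;  /* stands in for drbd_queue_work() */
};

/* Mirrors the quoted drbd_queue_bitmap_io(): the flag is always set,
 * but the work item is queued only if no application I/O is pending. */
static void model_queue_bitmap_io(struct model_device *d)
{
	d->flags |= BITMAP_IO;
	if (d->ap_bio_cnt == 0) {
		if (!(d->flags & BITMAP_IO_QUEUED)) {
			d->flags |= BITMAP_IO_QUEUED;
			d->work_queued = true;
		}
	}
}

/* Stands in for w_bitmap_io(), which the report names as the only
 * place that clears BITMAP_IO. Does nothing if no work was queued. */
static void model_run_sender_work(struct model_device *d)
{
	if (d->work_queued) {
		d->work_queued = false;
		d->flags &= ~(BITMAP_IO | BITMAP_IO_QUEUED);
	}
}

/* The condition drbd_disconnected() waits on in the quoted code. */
static bool model_disconnect_would_block(const struct model_device *d)
{
	return (d->flags & BITMAP_IO) != 0;
}
```

With ap_bio_cnt == 0 the flag is set, the work runs and the flag is cleared
again. With ap_bio_cnt != 0 the flag is set but nothing in this model ever
clears it, which matches the reported hang in drbd_disconnected().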
Thanks,
--
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed