[DRBD-user] Removing DRBD Kernel Module Blocks
Andrew Martin
amartin at xes-inc.com
Thu Jan 26 23:18:24 CET 2012
Correction: it looks like there is now an updated backport (2:8.3.7-1ubuntu2.2) of drbd8-utils in the official Ubuntu repositories:
https://launchpad.net/ubuntu/+source/drbd8
Andrew
----- Original Message -----
From: "Andrew Martin" <amartin at xes-inc.com>
To: "Felix Frank" <ff at mpexnet.de>
Cc: drbd-user at lists.linbit.com
Sent: Thursday, January 26, 2012 3:57:23 PM
Subject: Re: [DRBD-user] Removing DRBD Kernel Module Blocks
Hi Felix,
I am using DRBD with Pacemaker+Heartbeat for an HA cluster. There are no mounted filesystems at this time. Below is a copy of the kernel log after I attempted to stop the drbd service:
Jan 26 15:44:14 node1 kernel: [177694.517283] block drbd0: Requested state change failed by peer : Refusing to be Primary while peer is not outdated (-7)
Jan 26 15:44:14 node1 kernel: [177694.873466] block drbd0: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown )
Jan 26 15:44:14 node1 kernel: [177694.873540] block drbd0: short read expecting header on sock: r=-512
Jan 26 15:44:14 node1 kernel: [177695.029784] block drbd0: meta connection shut down by peer.
Jan 26 15:44:14 node1 kernel: [177695.195502] block drbd0: asender terminated
Jan 26 15:44:14 node1 kernel: [177695.195526] block drbd0: Terminating asender thread
Jan 26 15:44:14 node1 kernel: [177695.207524] block drbd0: Connection closed
Jan 26 15:44:14 node1 kernel: [177695.207594] block drbd0: conn( Disconnecting -> StandAlone )
Jan 26 15:44:14 node1 kernel: [177695.209382] block drbd0: receiver terminated
Jan 26 15:44:14 node1 kernel: [177695.209396] block drbd0: Terminating receiver thread
Jan 26 15:44:14 node1 kernel: [177695.209668] block drbd0: disk( Outdated -> Diskless )
Jan 26 15:44:14 node1 kernel: [177695.210268] block drbd0: drbd_bm_resize called with capacity == 0
Jan 26 15:44:14 node1 kernel: [177695.211205] block drbd0: worker terminated
Jan 26 15:44:14 node1 kernel: [177695.211208] block drbd0: Terminating worker thread
Jan 26 15:44:16 node1 kernel: [177696.418762] block drbd1: Requested state change failed by peer: Refusing to be Primary while peer is not outdated (-7)
Jan 26 15:44:16 node1 kernel: [177696.722826] block drbd1: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown )
Jan 26 15:44:16 node1 kernel: [177696.722871] block drbd1: short read expecting header on sock: r=-512
Jan 26 15:44:16 node1 kernel: [177696.880462] block drbd1: meta connection shut down by peer.
Jan 26 15:44:16 node1 kernel: [177697.037039] block drbd1: asender terminated
Jan 26 15:44:16 node1 kernel: [177697.037067] block drbd1: Terminating asender thread
Jan 26 15:44:16 node1 kernel: [177697.037568] block drbd1: Connection closed
Jan 26 15:44:16 node1 kernel: [177697.037616] block drbd1: conn( Disconnecting -> StandAlone )
Jan 26 15:44:16 node1 kernel: [177697.037714] block drbd1: receiver terminated
Jan 26 15:44:16 node1 kernel: [177697.037716] block drbd1: Terminating receiver thread
Jan 26 15:44:16 node1 kernel: [177697.037990] block drbd1: disk( Outdated -> Diskless )
Jan 26 15:44:16 node1 kernel: [177697.038598] block drbd1: drbd_bm_resize called with capacity == 0
Jan 26 15:44:16 node1 kernel: [177697.040081] block drbd1: worker terminated
Jan 26 15:44:16 node1 kernel: [177697.040083] block drbd1: Terminating worker thread
Jan 26 15:46:40 node1 kernel: [177841.014578] INFO: task rmmod:21024 blocked for more than 120 seconds.
Jan 26 15:46:40 node1 kernel: [177841.206355] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 26 15:46:41 node1 kernel: [177841.634666] rmmod D 00000000ffffffff 0 21024 20992 0x00000004
Jan 26 15:46:41 node1 kernel: [177841.634695] ffff88020c43bc78 0000000000000086 0000000000015e00 0000000000015e00
Jan 26 15:46:41 node1 kernel: [177841.634709] ffff880214c8dfd0 ffff88020c43bfd8 0000000000015e00 ffff880214c8dc00
Jan 26 15:46:41 node1 kernel: [177841.634713] 0000000000015e00 ffff88020c43bfd8 0000000000015e00 ffff880214c8dfd0
Jan 26 15:46:41 node1 kernel: [177841.634720] Call Trace:
Jan 26 15:46:41 node1 kernel: [177841.634777] [<ffffffff8155e67d>] schedule_timeout+0x22d/0x300
Jan 26 15:46:41 node1 kernel: [177841.634822] [<ffffffff8102e779>] ? native_smp_send_reschedule+0x49/0x60
Jan 26 15:46:41 node1 kernel: [177841.634832] [<ffffffff8104ce56>] ? resched_task+0x76/0x90
Jan 26 15:46:41 node1 kernel: [177841.634839] [<ffffffff8105dd2b>] ? try_to_wake_up+0x2fb/0x480
Jan 26 15:46:41 node1 kernel: [177841.634843] [<ffffffff8155d7f6>] wait_for_common+0xd6/0x180
Jan 26 15:46:41 node1 kernel: [177841.634847] [<ffffffff8105deb0>] ? default_wake_function+0x0/0x20
Jan 26 15:46:41 node1 kernel: [177841.634850] [<ffffffff8155d95d>] wait_for_completion+0x1d/0x20
Jan 26 15:46:41 node1 kernel: [177841.634859] [<ffffffff81081a35>] flush_cpu_workqueue+0x65/0xa0
Jan 26 15:46:41 node1 kernel: [177841.634862] [<ffffffff81081bb0>] ? wq_barrier_func+0x0/0x20
Jan 26 15:46:41 node1 kernel: [177841.634866] [<ffffffff81081d2c>] flush_workqueue+0x4c/0x80
Jan 26 15:46:41 node1 kernel: [177841.634878] [<ffffffff8109f520>] ? __try_stop_module+0x0/0x50
Jan 26 15:46:41 node1 kernel: [177841.634890] [<ffffffff810b7c04>] __stop_machine+0xf4/0x120
Jan 26 15:46:41 node1 kernel: [177841.634894] [<ffffffff8109f520>] ? __try_stop_module+0x0/0x50
Jan 26 15:46:41 node1 kernel: [177841.634897] [<ffffffff810b7e5e>] stop_machine+0x3e/0x60
Jan 26 15:46:41 node1 kernel: [177841.634900] [<ffffffff8109e834>] ? find_module+0x34/0x70
Jan 26 15:46:41 node1 kernel: [177841.634922] [<ffffffff8109fe4e>] sys_delete_module+0x17e/0x270
Jan 26 15:46:41 node1 kernel: [177841.634936] [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
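A saved log like this can help show whether the DRBD devices were torn down before rmmod ran. One way to check is to reduce the state-change lines to each device's final state. The sketch below is illustrative only. It assumes the 8.3-style `field( old -> new )` format seen in the log above, and the function names are hypothetical, not part of any DRBD tool. For the log above, both drbd0 and drbd1 end with conn StandAlone and disk Diskless. That suggests the devices themselves were down before rmmod hung in flush_workqueue.

```python
import re
from collections import defaultdict

# Device prefix as it appears in the kernel log, e.g. "block drbd0: ...".
STATE_RE = re.compile(r"block (drbd\d+): (.*)")
# One state transition, e.g. "disk( Outdated -> Diskless )" (8.3-style format, assumed).
CHANGE_RE = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")


def final_states(log_lines):
    """Return {device: {field: last_value}} built from DRBD state-change log lines."""
    states = defaultdict(dict)
    for line in log_lines:
        m = STATE_RE.search(line)
        if not m:
            continue
        device, rest = m.groups()
        # A single line may carry several transitions (peer, conn, disk, pdsk).
        for field, _old, new in CHANGE_RE.findall(rest):
            states[device][field] = new
    return dict(states)


def all_torn_down(states):
    """True if every device seen ended StandAlone and Diskless (hypothetical helper)."""
    return all(
        s.get("conn") == "StandAlone" and s.get("disk") == "Diskless"
        for s in states.values()
    )
```

Feed it the lines from `dmesg` or the syslog file. Lines without a `field( old -> new )` transition, such as the "Requested state change failed" message, are skipped.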
I am using the DRBD 8.3.7 backport from this repository:
https://launchpad.net/~ubuntu-ha/+archive/ppa
Thanks,
Andrew
----- Original Message -----
From: "Felix Frank" <ff at mpexnet.de>
To: "Andrew Martin" <amartin at xes-inc.com>
Cc: drbd-user at lists.linbit.com
Sent: Tuesday, January 10, 2012 2:11:13 AM
Subject: Re: [DRBD-user] Removing DRBD Kernel Module Blocks
Hi,
On 01/09/2012 08:03 PM, Andrew Martin wrote:
> Shutting these VMs down gracefully takes over 30 minutes as the "rmmod
> drbd" command in /etc/rc0.d/K08drbd blocks. Moreover, it seems that
> access to any information related to the kernel modules blocks as well
> (e.g. "lsmod", accesses to /proc/modules, etc). Do you have any ideas on
> what is causing the removal of this module to block and how to resolve it?
What's using those DRBD devices, and is it cleanly shut down before the drbd
service is stopped?
Are there filesystems on your block devices? Do they get unmounted in time?
Are there any incriminating kernel log entries after this slow shutdown?
Cheers,
Felix
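A pre-check before `rmmod drbd` could look at /proc/drbd to confirm that no minor is still configured. The sketch below assumes the DRBD 8.3-style layout, with one line per minor such as ` 0: cs:Connected ro:Primary/Secondary ...`, and assumes a minor that is down shows `cs:Unconfigured` or is omitted. The function name is hypothetical. It takes the file's text as input so it can be tried on a saved copy.

```python
import re

# One status line per minor in the 8.3-style /proc/drbd layout (assumed),
# e.g. " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----".
MINOR_RE = re.compile(r"^\s*(\d+): cs:(\S+)", re.MULTILINE)


def configured_minors(proc_drbd_text):
    """Return {minor: connection_state} for minors that are not Unconfigured."""
    return {
        int(minor): cs
        for minor, cs in MINOR_RE.findall(proc_drbd_text)
        if cs != "Unconfigured"
    }
```

On a node, `configured_minors(open("/proc/drbd").read())` returning an empty dict would suggest every device is down. A non-empty result names the minors that still need `drbdadm down` before the module is removed.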
_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user