[DRBD-user] Removing DRBD Kernel Module Blocks

Andrew Martin amartin at xes-inc.com
Thu Jan 26 22:57:23 CET 2012



Hi Felix, 


I am using DRBD with Pacemaker and Heartbeat for an HA cluster. No filesystems are mounted at this time. Below is the kernel log from when I attempted to stop the drbd service:

Jan 26 15:44:14 node1 kernel: [177694.517283] block drbd0: Requested state change failed by peer: Refusing to be Primary while peer is not outdated (-7) 
Jan 26 15:44:14 node1 kernel: [177694.873466] block drbd0: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown ) 
Jan 26 15:44:14 node1 kernel: [177694.873540] block drbd0: short read expecting header on sock: r=-512 
Jan 26 15:44:14 node1 kernel: [177695.029784] block drbd0: meta connection shut down by peer. 
Jan 26 15:44:14 node1 kernel: [177695.195502] block drbd0: asender terminated 
Jan 26 15:44:14 node1 kernel: [177695.195526] block drbd0: Terminating asender thread 
Jan 26 15:44:14 node1 kernel: [177695.207524] block drbd0: Connection closed 
Jan 26 15:44:14 node1 kernel: [177695.207594] block drbd0: conn( Disconnecting -> StandAlone ) 
Jan 26 15:44:14 node1 kernel: [177695.209382] block drbd0: receiver terminated 
Jan 26 15:44:14 node1 kernel: [177695.209396] block drbd0: Terminating receiver thread 
Jan 26 15:44:14 node1 kernel: [177695.209668] block drbd0: disk( Outdated -> Diskless ) 
Jan 26 15:44:14 node1 kernel: [177695.210268] block drbd0: drbd_bm_resize called with capacity == 0 
Jan 26 15:44:14 node1 kernel: [177695.211205] block drbd0: worker terminated 
Jan 26 15:44:14 node1 kernel: [177695.211208] block drbd0: Terminating worker thread 
Jan 26 15:44:16 node1 kernel: [177696.418762] block drbd1: Requested state change failed by peer: Refusing to be Primary while peer is not outdated (-7) 
Jan 26 15:44:16 node1 kernel: [177696.722826] block drbd1: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown ) 
Jan 26 15:44:16 node1 kernel: [177696.722871] block drbd1: short read expecting header on sock: r=-512 
Jan 26 15:44:16 node1 kernel: [177696.880462] block drbd1: meta connection shut down by peer. 
Jan 26 15:44:16 node1 kernel: [177697.037039] block drbd1: asender terminated 
Jan 26 15:44:16 node1 kernel: [177697.037067] block drbd1: Terminating asender thread 
Jan 26 15:44:16 node1 kernel: [177697.037568] block drbd1: Connection closed 
Jan 26 15:44:16 node1 kernel: [177697.037616] block drbd1: conn( Disconnecting -> StandAlone ) 
Jan 26 15:44:16 node1 kernel: [177697.037714] block drbd1: receiver terminated 
Jan 26 15:44:16 node1 kernel: [177697.037716] block drbd1: Terminating receiver thread 
Jan 26 15:44:16 node1 kernel: [177697.037990] block drbd1: disk( Outdated -> Diskless ) 
Jan 26 15:44:16 node1 kernel: [177697.038598] block drbd1: drbd_bm_resize called with capacity == 0 
Jan 26 15:44:16 node1 kernel: [177697.040081] block drbd1: worker terminated 
Jan 26 15:44:16 node1 kernel: [177697.040083] block drbd1: Terminating worker thread 
Jan 26 15:46:40 node1 kernel: [177841.014578] INFO: task rmmod:21024 blocked for more than 120 seconds. 
Jan 26 15:46:40 node1 kernel: [177841.206355] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Jan 26 15:46:41 node1 kernel: [177841.634666] rmmod D 00000000ffffffff 0 21024 20992 0x00000004 
Jan 26 15:46:41 node1 kernel: [177841.634695] ffff88020c43bc78 0000000000000086 0000000000015e00 0000000000015e00 
Jan 26 15:46:41 node1 kernel: [177841.634709] ffff880214c8dfd0 ffff88020c43bfd8 0000000000015e00 ffff880214c8dc00 
Jan 26 15:46:41 node1 kernel: [177841.634713] 0000000000015e00 ffff88020c43bfd8 0000000000015e00 ffff880214c8dfd0 
Jan 26 15:46:41 node1 kernel: [177841.634720] Call Trace: 
Jan 26 15:46:41 node1 kernel: [177841.634777] [<ffffffff8155e67d>] schedule_timeout+0x22d/0x300 
Jan 26 15:46:41 node1 kernel: [177841.634822] [<ffffffff8102e779>] ? native_smp_send_reschedule+0x49/0x60 
Jan 26 15:46:41 node1 kernel: [177841.634832] [<ffffffff8104ce56>] ? resched_task+0x76/0x90 
Jan 26 15:46:41 node1 kernel: [177841.634839] [<ffffffff8105dd2b>] ? try_to_wake_up+0x2fb/0x480 
Jan 26 15:46:41 node1 kernel: [177841.634843] [<ffffffff8155d7f6>] wait_for_common+0xd6/0x180 
Jan 26 15:46:41 node1 kernel: [177841.634847] [<ffffffff8105deb0>] ? default_wake_function+0x0/0x20 
Jan 26 15:46:41 node1 kernel: [177841.634850] [<ffffffff8155d95d>] wait_for_completion+0x1d/0x20 
Jan 26 15:46:41 node1 kernel: [177841.634859] [<ffffffff81081a35>] flush_cpu_workqueue+0x65/0xa0 
Jan 26 15:46:41 node1 kernel: [177841.634862] [<ffffffff81081bb0>] ? wq_barrier_func+0x0/0x20 
Jan 26 15:46:41 node1 kernel: [177841.634866] [<ffffffff81081d2c>] flush_workqueue+0x4c/0x80 
Jan 26 15:46:41 node1 kernel: [177841.634878] [<ffffffff8109f520>] ? __try_stop_module+0x0/0x50 
Jan 26 15:46:41 node1 kernel: [177841.634890] [<ffffffff810b7c04>] __stop_machine+0xf4/0x120 
Jan 26 15:46:41 node1 kernel: [177841.634894] [<ffffffff8109f520>] ? __try_stop_module+0x0/0x50 
Jan 26 15:46:41 node1 kernel: [177841.634897] [<ffffffff810b7e5e>] stop_machine+0x3e/0x60 
Jan 26 15:46:41 node1 kernel: [177841.634900] [<ffffffff8109e834>] ? find_module+0x34/0x70 
Jan 26 15:46:41 node1 kernel: [177841.634922] [<ffffffff8109fe4e>] sys_delete_module+0x17e/0x270 
Jan 26 15:46:41 node1 kernel: [177841.634936] [<ffffffff81013172>] system_call_fastpath+0x16/0x1b 
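
For anyone reading a log like the one above, a small script can pull out the DRBD state transitions and the hung-task report so the sequence is easier to follow. This is only a sketch: the function names (parse_transitions, find_hung_tasks) are made up for illustration, and the regular expressions assume the exact line layout shown in this message, which may differ on other kernels or DRBD versions.

```python
import re

# Assumed layout: "... kernel: [177694.873466] block drbd0: <message>"
LINE_RE = re.compile(
    r"kernel: \[\s*\d+\.\d+\] block (?P<dev>drbd\d+): (?P<msg>.*)"
)
# One transition inside a message, e.g. "conn( Connected -> Disconnecting )"
TRANSITION_RE = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")
# Hung-task report, e.g. "INFO: task rmmod:21024 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<task>[^:]+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def parse_transitions(lines):
    """Return (device, field, old_state, new_state) tuples in log order."""
    transitions = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a "block drbdN:" line
        for field, old, new in TRANSITION_RE.findall(match.group("msg")):
            transitions.append((match.group("dev"), field, old, new))
    return transitions


def find_hung_tasks(lines):
    """Return (task_name, pid, seconds) tuples for each hung-task report."""
    hung = []
    for line in lines:
        match = HUNG_RE.search(line)
        if match is not None:
            hung.append(
                (match.group("task"), int(match.group("pid")), int(match.group("secs")))
            )
    return hung
```

Feeding it the lines of a saved kernel log (for example, the result of open(path).read().splitlines()) gives one tuple per state change and one per blocked task.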


I am using the DRBD 8.3.7 backport from this repository: 
https://launchpad.net/~ubuntu-ha/+archive/ppa 


Thanks, 


Andrew 

----- Original Message -----

From: "Felix Frank" <ff at mpexnet.de> 
To: "Andrew Martin" <amartin at xes-inc.com> 
Cc: drbd-user at lists.linbit.com 
Sent: Tuesday, January 10, 2012 2:11:13 AM 
Subject: Re: [DRBD-user] Removing DRBD Kernel Module Blocks 

Hi, 

On 01/09/2012 08:03 PM, Andrew Martin wrote: 
> Shutting these VMs down gracefully takes over 30 minutes as the "rmmod 
> drbd" command in /etc/rc0.d/K08drbd blocks. Moreover, it seems that 
> access to any information related to the kernel modules blocks as well 
> (e.g. "lsmod", accesses to /proc/modules, etc). Do you have any ideas on 
> what is causing the removal of this module to block and how to resolve it? 

What's using those DRBDs, and is it cleanly shut down before the drbd
service is stopped?

Are there filesystems in your block devices? Do they get unmounted in time? 

Are there any incriminating kernel log entries after this slow shutdown? 

Cheers, 
Felix 
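
The first two questions above can be checked by hand, but they could also be scripted before the drbd service is stopped. The following is a rough sketch, not a tested procedure: the helper names (mounted_drbd_devices, primary_minors) are hypothetical, the /proc/mounts parsing assumes the usual whitespace-separated fields, and the /proc/drbd parsing assumes the "ro:Local/Peer" status layout of DRBD 8.3 (older releases reportedly use "st:" instead, which the pattern also accepts).

```python
import re

# Assumed /proc/drbd status line, e.g.
#   " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----"
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:\S+\s+(?:ro|st):(?P<local>\w+)/(?P<peer>\w+)",
    re.MULTILINE,
)


def mounted_drbd_devices(mounts_text):
    """Return (device, mountpoint) pairs for mounted /dev/drbd* devices."""
    mounted = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("/dev/drbd"):
            mounted.append((fields[0], fields[1]))
    return mounted


def primary_minors(proc_drbd_text):
    """Return the minor numbers whose local role is Primary."""
    return [
        int(match.group("minor"))
        for match in STATUS_RE.finditer(proc_drbd_text)
        if match.group("local") == "Primary"
    ]
```

On a live node the inputs would come from open("/proc/mounts").read() and open("/proc/drbd").read(). A non-empty result from either function suggests something is still holding a DRBD device at the point where the service is about to be stopped.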
