[DRBD-user] kernel module crash?

Stefano Cailotto cailotto at sci.univr.it
Tue Sep 16 10:22:26 CEST 2008



Lars Ellenberg wrote:
> On Mon, Sep 15, 2008 at 05:55:19PM +0200, Stefano Cailotto wrote:
>> Hi all,
>> when synchronizing a primary/primary drbd setup (8.2.7rc1), in which one
>> node has to be demoted to secondary for the sync to work, I sometimes
>> get a kernel module crash. Here is the output of dmesg:
> 
> to be precise: that is NOT a kernel module crash.
> it is "just" some sort of (distributed?) deadlock in drbd_worker.
> my guess would be that you could get it to recover by
> forcefully disconnecting it (unplug the cable and wait for 10 seconds)
> or use iptables or "cutter" to reset that connection.

> 
> what would be interesting though is,
> what exactly is "sometimes"?
> so, what is it you are doing there?
> anything "special"?

It happens "sometimes" (that is, not every time I do it) when I switch a
primary to secondary and run a sync, so nothing really special, only normal
drbd operations. Googling a bit, I found that it seems to be a problem
related to block/cfq_iosched.c in the 2.6.26 kernel series.
I'll investigate and report back if I find anything.
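
For anyone who wants to check whether CFQ is the active I/O scheduler on the
backing device, here is a minimal sketch. It assumes the usual sysfs layout,
where /sys/block/<device>/queue/scheduler lists the available schedulers
with the active one in square brackets. The function names and the default
device name "sda" are only illustrative.

```python
from typing import Optional


def active_scheduler(text: str) -> Optional[str]:
    """Return the scheduler shown in [brackets], or None if there is none."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_active_scheduler(device: str = "sda") -> Optional[str]:
    """Read the active scheduler of a block device from sysfs.

    The device name is an assumption; use the disk that backs the drbd device.
    """
    with open(f"/sys/block/{device}/queue/scheduler") as handle:
        return active_scheduler(handle.read())
```

A line such as "noop anticipatory deadline [cfq]" means CFQ is in use.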

Bye,
Stefano


> 
>> [  917.704708] drbd0: role( Secondary -> Primary )
>> [ 1047.973700] INFO: task cqueue:2379 blocked for more than 120 seconds.
>> [ 1048.039965] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1048.102466] cqueue        D 56b8dc67     0  2379      2
>> [ 1048.164511]        f5080e60 00000046 00000001 56b8dc67 000000d4 f5080fec c201e020 00000001
>> [ 1048.168357]        f7883440 00000040 00000001 00000000 00000001 c201e020 00000009 00000001
>> [ 1048.234998]        7fffffff 7fffffff f0921e9c 00000002 c02c8d88 00000046 00000000 00000206
>> [ 1048.302177] Call Trace:
>> [ 1048.443239]  [<c02c8d88>] schedule_timeout+0x13/0x86
>> [ 1048.510006]  [<c012d421>] irq_exit+0x50/0x67
>> [ 1048.572493]  [<c01152c8>] smp_apic_timer_interrupt+0x6b/0x77
>> [ 1048.638911]  [<c0109370>] apic_timer_interrupt+0x28/0x30
>> [ 1048.705188]  [<c02c84ad>] wait_for_common+0xaf/0x10f
>> [ 1048.767240]  [<c012225f>] default_wake_function+0x0/0x8
>> [ 1048.829130]  [<f90941c8>] drbd_req_state+0x2d5/0x2f1 [drbd]
>> [ 1048.894741]  [<c01386ac>] autoremove_wake_function+0x0/0x2d
>> [ 1048.960639]  [<f9094205>] _drbd_request_state+0x21/0xa0 [drbd]
>> [ 1049.018127]  [<c0131e49>] specific_send_sig_info+0x7/0x9
>> [ 1049.088010]  [<c0132531>] force_sig_info+0x8b/0x93
>> [ 1049.149265]  [<f9098b44>] drbd_set_role+0x6b/0x490 [drbd]
>> [ 1049.210607]  [<c0123f93>] hrtick_set+0x7a/0xd8
>> [ 1049.279376]  [<f9099381>] drbd_nl_primary+0x3b/0x44 [drbd]
>> [ 1049.337148]  [<f909aa21>] drbd_connector_callback+0xbc/0x144 [drbd]
>> [ 1049.407308]  [<f8e6c0ad>] cn_queue_wrapper+0x0/0x24 [cn]
>> [ 1049.465680]  [<f8e6c0ba>] cn_queue_wrapper+0xd/0x24 [cn]
>> [ 1049.535753]  [<c0135d32>] run_workqueue+0x74/0xf2
>> [ 1049.593686]  [<c013640d>] worker_thread+0x0/0xbd
>> [ 1049.659631]  [<c01364c0>] worker_thread+0xb3/0xbd
>> [ 1049.725190]  [<c01386ac>] autoremove_wake_function+0x0/0x2d
>> [ 1049.790488]  [<c01385eb>] kthread+0x38/0x5d
>> [ 1049.851611]  [<c01385b3>] kthread+0x0/0x5d
>> [ 1049.912392]  [<c01094ff>] kernel_thread_helper+0x7/0x10
>> [ 1049.969041]  =======================
>> [ 1050.033384] INFO: task drbd0_worker:2390 blocked for more than 120 seconds.
>> [ 1050.098415] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1050.159859] drbd0_worker  D 45fabce3     0  2390      2
>> [ 1050.221208]        f7550560 00000046 f7550588 45fabce3 000000d4 f75506ec c201e020 00000001
>> [ 1050.228506]        00000002 c0122256 00000001 00000000 f747bfbc f74261c4 00000001 00000000
>> [ 1050.290777]        7fffffff 7fffffff f5007f0c 00000002 c02c8d88 00000000 00000003 f74261cc
>> [ 1050.352410] Call Trace:
>> [ 1050.489098]  [<c0122256>] try_to_wake_up+0xe8/0xf1
>> [ 1050.546405]  [<c02c8d88>] schedule_timeout+0x13/0x86
>> [ 1050.607334]  [<c0121260>] __wake_up+0x29/0x39
>> [ 1050.670957]  [<c02c84ad>] wait_for_common+0xaf/0x10f
>> [ 1050.733919]  [<c012225f>] default_wake_function+0x0/0x8
>> [ 1050.793182]  [<c013593f>] call_usermodehelper_exec+0x60/0xa0
>> [ 1050.852190]  [<f9098886>] drbd_khelper+0x80/0xca [drbd]
>> [ 1050.915276]  [<f907f7d1>] drbd_resync_finished+0x4b2/0x4fe [drbd]
>> [ 1050.974268]  [<f908d68f>] w_update_odbm+0x187/0x1c7 [drbd]
>> [ 1051.005103]  [<f907fdac>] drbd_worker+0x219/0x35c [drbd]
>> [ 1051.064581]  [<f90957b5>] drbd_thread_setup+0xf4/0x166 [drbd]
>> [ 1051.126374]  [<f90956c1>] drbd_thread_setup+0x0/0x166 [drbd]
>> [ 1051.179398]  [<c01094ff>] kernel_thread_helper+0x7/0x10
>> [ 1051.231824]  =======================
>> [ 1091.996642] INFO: task o2hb-04079A4A9B:3339 blocked for more than 120 seconds.
>> [ 1092.049983] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1092.107826] o2hb-04079A4A D f04dc3f8     0  3339      2
>> [ 1092.161765]        f0895200 00000046 c0130887 f04dc3f8 f04dc3c0 f089538c c201e020 00000001
>> [ 1092.161765]        00000000 00027846 00000000 f7945b50 00000000 00000000 00000000 ffffffff
>> [ 1092.225434]        7fffffff 7fffffff f3999eec 00000002 c02c8d88 f79ea0b0 f7945c9c f7945b50
>> [ 1092.284844] Call Trace:
>> [ 1092.407099]  [<c0130887>] lock_timer_base+0x19/0x35
>> [ 1092.470599]  [<c02c8d88>] schedule_timeout+0x13/0x86
>> [ 1092.525835]  [<c01dcf37>] __generic_unplug_device+0x1a/0x1c
>> [ 1092.589073]  [<c01dd311>] generic_unplug_device+0x1e/0x33
>> [ 1092.589077]  [<f9095a0a>] drbd_unplug_fn+0x170/0x1c2 [drbd]
>> [ 1092.589094]  [<c02c84ad>] wait_for_common+0xaf/0x10f
>> [ 1092.589097]  [<c012225f>] default_wake_function+0x0/0x8
>> [ 1092.589103]  [<f90ca09b>] o2hb_do_disk_heartbeat+0x8c0/0x9e3 [ocfs2_nodemanager]
>> [ 1092.589120]  [<f90ca4f1>] o2hb_thread+0x8a/0x3da [ocfs2_nodemanager]
>> [ 1092.589125]  [<c02c8bb5>] schedule+0x63b/0x66d
>> [ 1092.589125]  [<f90ca467>] o2hb_thread+0x0/0x3da [ocfs2_nodemanager]
>> [ 1092.589125]  [<c01385eb>] kthread+0x38/0x5d
>> [ 1092.589125]  [<c01385b3>] kthread+0x0/0x5d
>> [ 1092.589125]  [<c01094ff>] kernel_thread_helper+0x7/0x10
>> [ 1092.589125]  =======================
>> [ 1177.897228] INFO: task cqueue:2379 blocked for more than 120 seconds.
>> [ 1177.961703] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1178.026661] cqueue        D 56b8dc67     0  2379      2
>> [ 1178.084513]        f5080e60 00000046 00000001 56b8dc67 000000d4 f5080fec c201e020 00000001
>> [ 1178.084513]        f7883440 00000040 00000001 00000000 00000001 c201e020 00000009 00000001
>> [ 1178.159079]        7fffffff 7fffffff f0921e9c 00000002 c02c8d88 00000046 00000000 00000206
>> [ 1178.221488] Call Trace:
>> [ 1178.355940]  [<c02c8d88>] schedule_timeout+0x13/0x86
>> [ 1178.416623]  [<c012d421>] irq_exit+0x50/0x67
>> [ 1178.477058]  [<c01152c8>] smp_apic_timer_interrupt+0x6b/0x77
>> [ 1178.541613]  [<c0109370>] apic_timer_interrupt+0x28/0x30
>> [ 1178.601535]  [<c02c84ad>] wait_for_common+0xaf/0x10f
>> [ 1178.657380]  [<c012225f>] default_wake_function+0x0/0x8
>> [ 1178.713295]  [<f90941c8>] drbd_req_state+0x2d5/0x2f1 [drbd]
>> [ 1178.773435]  [<c01386ac>] autoremove_wake_function+0x0/0x2d
>> [ 1178.837602]  [<f9094205>] _drbd_request_state+0x21/0xa0 [drbd]
>> [ 1178.894013]  [<c0131e49>] specific_send_sig_info+0x7/0x9
>> [ 1178.954391]  [<c0132531>] force_sig_info+0x8b/0x93
>> [ 1179.018440]  [<f9098b44>] drbd_set_role+0x6b/0x490 [drbd]
>> [ 1179.082012]  [<c0123f93>] hrtick_set+0x7a/0xd8
>> [ 1179.144031]  [<f9099381>] drbd_nl_primary+0x3b/0x44 [drbd]
>> [ 1179.199704]  [<f909aa21>] drbd_connector_callback+0xbc/0x144 [drbd]
>> [ 1179.263517]  [<f8e6c0ad>] cn_queue_wrapper+0x0/0x24 [cn]
>> [ 1179.328158]  [<f8e6c0ba>] cn_queue_wrapper+0xd/0x24 [cn]
>> [ 1179.384190]  [<c0135d32>] run_workqueue+0x74/0xf2
>> [ 1179.448085]  [<c013640d>] worker_thread+0x0/0xbd
>> [ 1179.503328]  [<c01364c0>] worker_thread+0xb3/0xbd
>> [ 1179.557650]  [<c01386ac>] autoremove_wake_function+0x0/0x2d
>> [ 1179.619719]  [<c01385eb>] kthread+0x38/0x5d
>> [ 1179.668961]  [<c01385b3>] kthread+0x0/0x5d
>> [ 1179.721684]  [<c01094ff>] kernel_thread_helper+0x7/0x10
>> [ 1179.774313]  =======================
>> [ 1179.830151] INFO: task drbd0_worker:2390 blocked for more than 120 seconds.
>> [ 1179.886847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1179.940232] drbd0_worker  D 45fabce3     0  2390      2
>> [ 1179.993743]        f7550560 00000046 f7550588 45fabce3 000000d4 f75506ec c201e020 00000001
>> [ 1179.996864]        00000002 c0122256 00000001 00000000 f747bfbc f74261c4 00000001 00000000
>> [ 1180.047966]        7fffffff 7fffffff f5007f0c 00000002 c02c8d88 00000000 00000003 f74261cc
>> [ 1180.110950] Call Trace:
>> [ 1180.236173]  [<c0122256>] try_to_wake_up+0xe8/0xf1
>> [ 1180.290776]  [<c02c8d88>] schedule_timeout+0x13/0x86
>> [ 1180.345188]  [<c0121260>] __wake_up+0x29/0x39
>> [ 1180.402924]  [<c02c84ad>] wait_for_common+0xaf/0x10f
>> [ 1180.461106]  [<c012225f>] default_wake_function+0x0/0x8
>> [ 1180.515185]  [<c013593f>] call_usermodehelper_exec+0x60/0xa0
>> [ 1180.573920]  [<f9098886>] drbd_khelper+0x80/0xca [drbd]
>> [ 1180.636815]  [<f907f7d1>] drbd_resync_finished+0x4b2/0x4fe [drbd]
>> [ 1180.691807]  [<f908d68f>] w_update_odbm+0x187/0x1c7 [drbd]
>> [ 1180.746598]  [<f907fdac>] drbd_worker+0x219/0x35c [drbd]
>> [ 1180.804784]  [<f90957b5>] drbd_thread_setup+0xf4/0x166 [drbd]
>> [ 1180.858547]  [<f90956c1>] drbd_thread_setup+0x0/0x166 [drbd]
>> [ 1180.915507]  [<c01094ff>] kernel_thread_helper+0x7/0x10
>> [ 1180.972635]  =======================
> 
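
Since the same tasks show up repeatedly in the log above, a small script can
summarize which ones the hung-task detector reported and how often. This is
only a sketch: it assumes the "INFO: task NAME:PID blocked for more than N
seconds." line format seen in this dmesg output, and the function name is
made up for illustration.

```python
import re
from collections import Counter

# Matches the hung-task header lines as they appear in the dmesg output above.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def blocked_tasks(log_text: str) -> Counter:
    """Count hung-task reports per (task name, pid) in a dmesg dump."""
    counts: Counter = Counter()
    for match in HUNG_TASK_RE.finditer(log_text):
        counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts
```

Feeding it the saved output of dmesg gives one entry per blocked task, which
makes it easier to see whether the same threads stay stuck over time.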


-- 
---------------------------------------------------------------------------
   Stefano Cailotto - Research Consultant
---------------------------------------------------------------------------
   EDALab - Networked Embedded Systems
   c/o  Dipartimento di Informatica - Universita` di Verona
   Strada le Grazie, 15 - 37134 Verona (Italy)
---------------------------------------------------------------------------
   email:  stefano.cailotto at edalab.it stefano.cailotto at univr.it
   web:  http://www.edalab.it
   tel/fax: +39-045-802-7062
---------------------------------------------------------------------------


