Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
Lars Ellenberg wrote:
> On Mon, Sep 15, 2008 at 05:55:19PM +0200, Stefano Cailotto wrote:
>> Hi all,
>> when performing synchronization on a primary/primary drbd (8.2.7rc1), (in
>> which one of them has to be demoted to secondary to have it working),
>> sometimes I obtain a kernel module crash: here follows the output of dmesg
>
> to be precise: that is NOT a kernel module crash.
> it is "just" some sort of (distributed?) deadlock in drbd_worker.
> my guess would be that you could get it to recover by
> forcefully disconnecting it (plug the cable and wait for 10 seconds)
> or use iptables or "cutter" to reset that connection
>
> what would be interesting though is,
> what exactly is "sometimes"?
> so, what is it you are doing there?
> anything "special"?

It happens "sometimes" (meaning not every time I do it) when I switch a
primary to secondary and perform a sync, so nothing "really" special, only
normal drbd operations. Anyway, googling a bit, I found that it seems to be
a problem related to block/cfq_iosched.c in the 2.6.26 kernel series.
I'll investigate and report back if I find anything.

Bye,
Stefano

>
>> [ 917.704708] drbd0: role( Secondary -> Primary )
>> [ 1047.973700] INFO: task cqueue:2379 blocked for more than 120 seconds.
>> [ 1048.039965] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1048.102466] cqueue D 56b8dc67 0 2379 2
>> [ 1048.164511] f5080e60 00000046 00000001 56b8dc67 000000d4 f5080fec c201e020 00000001
>> [ 1048.168357] f7883440 00000040 00000001 00000000 00000001 c201e020 00000009 00000001
>> [ 1048.234998] 7fffffff 7fffffff f0921e9c 00000002 c02c8d88 00000046 00000000 00000206
>> [ 1048.302177] Call Trace:
>> [ 1048.443239] [<c02c8d88>] schedule_timeout+0x13/0x86
>> [ 1048.510006] [<c012d421>] irq_exit+0x50/0x67
>> [ 1048.572493] [<c01152c8>] smp_apic_timer_interrupt+0x6b/0x77
>> [ 1048.638911] [<c0109370>] apic_timer_interrupt+0x28/0x30
>> [ 1048.705188] [<c02c84ad>] wait_for_common+0xaf/0x10f
>> [ 1048.767240] [<c012225f>] default_wake_function+0x0/0x8
>> [ 1048.829130] [<f90941c8>] drbd_req_state+0x2d5/0x2f1 [drbd]
>> [ 1048.894741] [<c01386ac>] autoremove_wake_function+0x0/0x2d
>> [ 1048.960639] [<f9094205>] _drbd_request_state+0x21/0xa0 [drbd]
>> [ 1049.018127] [<c0131e49>] specific_send_sig_info+0x7/0x9
>> [ 1049.088010] [<c0132531>] force_sig_info+0x8b/0x93
>> [ 1049.149265] [<f9098b44>] drbd_set_role+0x6b/0x490 [drbd]
>> [ 1049.210607] [<c0123f93>] hrtick_set+0x7a/0xd8
>> [ 1049.279376] [<f9099381>] drbd_nl_primary+0x3b/0x44 [drbd]
>> [ 1049.337148] [<f909aa21>] drbd_connector_callback+0xbc/0x144 [drbd]
>> [ 1049.407308] [<f8e6c0ad>] cn_queue_wrapper+0x0/0x24 [cn]
>> [ 1049.465680] [<f8e6c0ba>] cn_queue_wrapper+0xd/0x24 [cn]
>> [ 1049.535753] [<c0135d32>] run_workqueue+0x74/0xf2
>> [ 1049.593686] [<c013640d>] worker_thread+0x0/0xbd
>> [ 1049.659631] [<c01364c0>] worker_thread+0xb3/0xbd
>> [ 1049.725190] [<c01386ac>] autoremove_wake_function+0x0/0x2d
>> [ 1049.790488] [<c01385eb>] kthread+0x38/0x5d
>> [ 1049.851611] [<c01385b3>] kthread+0x0/0x5d
>> [ 1049.912392] [<c01094ff>] kernel_thread_helper+0x7/0x10
>> [ 1049.969041] =======================
>> [ 1050.033384] INFO: task drbd0_worker:2390 blocked for more than 120 seconds.
>> [ 1050.098415] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1050.159859] drbd0_worker D 45fabce3 0 2390 2
>> [ 1050.221208] f7550560 00000046 f7550588 45fabce3 000000d4 f75506ec c201e020 00000001
>> [ 1050.228506] 00000002 c0122256 00000001 00000000 f747bfbc f74261c4 00000001 00000000
>> [ 1050.290777] 7fffffff 7fffffff f5007f0c 00000002 c02c8d88 00000000 00000003 f74261cc
>> [ 1050.352410] Call Trace:
>> [ 1050.489098] [<c0122256>] try_to_wake_up+0xe8/0xf1
>> [ 1050.546405] [<c02c8d88>] schedule_timeout+0x13/0x86
>> [ 1050.607334] [<c0121260>] __wake_up+0x29/0x39
>> [ 1050.670957] [<c02c84ad>] wait_for_common+0xaf/0x10f
>> [ 1050.733919] [<c012225f>] default_wake_function+0x0/0x8
>> [ 1050.793182] [<c013593f>] call_usermodehelper_exec+0x60/0xa0
>> [ 1050.852190] [<f9098886>] drbd_khelper+0x80/0xca [drbd]
>> [ 1050.915276] [<f907f7d1>] drbd_resync_finished+0x4b2/0x4fe [drbd]
>> [ 1050.974268] [<f908d68f>] w_update_odbm+0x187/0x1c7 [drbd]
>> [ 1051.005103] [<f907fdac>] drbd_worker+0x219/0x35c [drbd]
>> [ 1051.064581] [<f90957b5>] drbd_thread_setup+0xf4/0x166 [drbd]
>> [ 1051.126374] [<f90956c1>] drbd_thread_setup+0x0/0x166 [drbd]
>> [ 1051.179398] [<c01094ff>] kernel_thread_helper+0x7/0x10
>> [ 1051.231824] =======================
>> [ 1091.996642] INFO: task o2hb-04079A4A9B:3339 blocked for more than 120 seconds.
>> [ 1092.049983] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1092.107826] o2hb-04079A4A D f04dc3f8 0 3339 2
>> [ 1092.161765] f0895200 00000046 c0130887 f04dc3f8 f04dc3c0 f089538c c201e020 00000001
>> [ 1092.161765] 00000000 00027846 00000000 f7945b50 00000000 00000000 00000000 ffffffff
>> [ 1092.225434] 7fffffff 7fffffff f3999eec 00000002 c02c8d88 f79ea0b0 f7945c9c f7945b50
>> [ 1092.284844] Call Trace:
>> [ 1092.407099] [<c0130887>] lock_timer_base+0x19/0x35
>> [ 1092.470599] [<c02c8d88>] schedule_timeout+0x13/0x86
>> [ 1092.525835] [<c01dcf37>] __generic_unplug_device+0x1a/0x1c
>> [ 1092.589073] [<c01dd311>] generic_unplug_device+0x1e/0x33
>> [ 1092.589077] [<f9095a0a>] drbd_unplug_fn+0x170/0x1c2 [drbd]
>> [ 1092.589094] [<c02c84ad>] wait_for_common+0xaf/0x10f
>> [ 1092.589097] [<c012225f>] default_wake_function+0x0/0x8
>> [ 1092.589103] [<f90ca09b>] o2hb_do_disk_heartbeat+0x8c0/0x9e3 [ocfs2_nodemanager]
>> [ 1092.589120] [<f90ca4f1>] o2hb_thread+0x8a/0x3da [ocfs2_nodemanager]
>> [ 1092.589125] [<c02c8bb5>] schedule+0x63b/0x66d
>> [ 1092.589125] [<f90ca467>] o2hb_thread+0x0/0x3da [ocfs2_nodemanager]
>> [ 1092.589125] [<c01385eb>] kthread+0x38/0x5d
>> [ 1092.589125] [<c01385b3>] kthread+0x0/0x5d
>> [ 1092.589125] [<c01094ff>] kernel_thread_helper+0x7/0x10
>> [ 1092.589125] =======================
>> [ 1177.897228] INFO: task cqueue:2379 blocked for more than 120 seconds.
>> [ 1177.961703] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1178.026661] cqueue D 56b8dc67 0 2379 2
>> [ 1178.084513] f5080e60 00000046 00000001 56b8dc67 000000d4 f5080fec c201e020 00000001
>> [ 1178.084513] f7883440 00000040 00000001 00000000 00000001 c201e020 00000009 00000001
>> [ 1178.159079] 7fffffff 7fffffff f0921e9c 00000002 c02c8d88 00000046 00000000 00000206
>> [ 1178.221488] Call Trace:
>> [ 1178.355940] [<c02c8d88>] schedule_timeout+0x13/0x86
>> [ 1178.416623] [<c012d421>] irq_exit+0x50/0x67
>> [ 1178.477058] [<c01152c8>] smp_apic_timer_interrupt+0x6b/0x77
>> [ 1178.541613] [<c0109370>] apic_timer_interrupt+0x28/0x30
>> [ 1178.601535] [<c02c84ad>] wait_for_common+0xaf/0x10f
>> [ 1178.657380] [<c012225f>] default_wake_function+0x0/0x8
>> [ 1178.713295] [<f90941c8>] drbd_req_state+0x2d5/0x2f1 [drbd]
>> [ 1178.773435] [<c01386ac>] autoremove_wake_function+0x0/0x2d
>> [ 1178.837602] [<f9094205>] _drbd_request_state+0x21/0xa0 [drbd]
>> [ 1178.894013] [<c0131e49>] specific_send_sig_info+0x7/0x9
>> [ 1178.954391] [<c0132531>] force_sig_info+0x8b/0x93
>> [ 1179.018440] [<f9098b44>] drbd_set_role+0x6b/0x490 [drbd]
>> [ 1179.082012] [<c0123f93>] hrtick_set+0x7a/0xd8
>> [ 1179.144031] [<f9099381>] drbd_nl_primary+0x3b/0x44 [drbd]
>> [ 1179.199704] [<f909aa21>] drbd_connector_callback+0xbc/0x144 [drbd]
>> [ 1179.263517] [<f8e6c0ad>] cn_queue_wrapper+0x0/0x24 [cn]
>> [ 1179.328158] [<f8e6c0ba>] cn_queue_wrapper+0xd/0x24 [cn]
>> [ 1179.384190] [<c0135d32>] run_workqueue+0x74/0xf2
>> [ 1179.448085] [<c013640d>] worker_thread+0x0/0xbd
>> [ 1179.503328] [<c01364c0>] worker_thread+0xb3/0xbd
>> [ 1179.557650] [<c01386ac>] autoremove_wake_function+0x0/0x2d
>> [ 1179.619719] [<c01385eb>] kthread+0x38/0x5d
>> [ 1179.668961] [<c01385b3>] kthread+0x0/0x5d
>> [ 1179.721684] [<c01094ff>] kernel_thread_helper+0x7/0x10
>> [ 1179.774313] =======================
>> [ 1179.830151] INFO: task drbd0_worker:2390 blocked for more than 120 seconds.
>> [ 1179.886847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1179.940232] drbd0_worker D 45fabce3 0 2390 2
>> [ 1179.993743] f7550560 00000046 f7550588 45fabce3 000000d4 f75506ec c201e020 00000001
>> [ 1179.996864] 00000002 c0122256 00000001 00000000 f747bfbc f74261c4 00000001 00000000
>> [ 1180.047966] 7fffffff 7fffffff f5007f0c 00000002 c02c8d88 00000000 00000003 f74261cc
>> [ 1180.110950] Call Trace:
>> [ 1180.236173] [<c0122256>] try_to_wake_up+0xe8/0xf1
>> [ 1180.290776] [<c02c8d88>] schedule_timeout+0x13/0x86
>> [ 1180.345188] [<c0121260>] __wake_up+0x29/0x39
>> [ 1180.402924] [<c02c84ad>] wait_for_common+0xaf/0x10f
>> [ 1180.461106] [<c012225f>] default_wake_function+0x0/0x8
>> [ 1180.515185] [<c013593f>] call_usermodehelper_exec+0x60/0xa0
>> [ 1180.573920] [<f9098886>] drbd_khelper+0x80/0xca [drbd]
>> [ 1180.636815] [<f907f7d1>] drbd_resync_finished+0x4b2/0x4fe [drbd]
>> [ 1180.691807] [<f908d68f>] w_update_odbm+0x187/0x1c7 [drbd]
>> [ 1180.746598] [<f907fdac>] drbd_worker+0x219/0x35c [drbd]
>> [ 1180.804784] [<f90957b5>] drbd_thread_setup+0xf4/0x166 [drbd]
>> [ 1180.858547] [<f90956c1>] drbd_thread_setup+0x0/0x166 [drbd]
>> [ 1180.915507] [<c01094ff>] kernel_thread_helper+0x7/0x10
>> [ 1180.972635] =======================
>

-- 
---------------------------------------------------------------------------
Stefano Cailotto - Research Consultant
---------------------------------------------------------------------------
EDALab - Networked Embedded Systems
c/o Dipartimento di Informatica - Universita` di Verona
Strada le Grazie, 15 - 37134 Verona (Italy)
---------------------------------------------------------------------------
email: stefano.cailotto at edalab.it
       stefano.cailotto at univr.it
web: http://www.edalab.it
tel/fax: +39-045-802-7062
---------------------------------------------------------------------------
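
The hung-task reports quoted in this message repeat for the same tasks roughly every two minutes, which makes the dmesg output hard to scan. The following is only a rough sketch, not part of the original thread, of how such output could be condensed into one entry per blocked task. The function name `summarize_hung_tasks`, the regular expressions, and the `SAMPLE` excerpt are illustrative assumptions; the excerpt reuses a few lines from the trace above.

```python
import re

# Kernel hung-task warning, e.g.
# "[ 1050.033384] INFO: task drbd0_worker:2390 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# One call-trace frame, e.g.
# "[ 1050.852190] [<f9098886>] drbd_khelper+0x80/0xca [drbd]"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def summarize_hung_tasks(log_text):
    """Map (task, pid) to its report timestamps and first call trace.

    Leading mail-quote markers such as ">> " are harmless because the
    patterns are searched for anywhere in the line.
    """
    summary = {}
    current = None  # entry currently collecting frames, if any
    for line in log_text.splitlines():
        hung = HUNG_RE.search(line)
        if hung:
            key = (hung["name"], int(hung["pid"]))
            entry = summary.setdefault(key, {"reports": [], "frames": []})
            entry["reports"].append(float(hung["ts"]))
            # Keep frames only from the first report of each task;
            # later reports of the same task repeat the same trace.
            current = entry if len(entry["reports"]) == 1 else None
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            current["frames"].append((frame["sym"], frame["mod"]))
    return summary


# Hypothetical excerpt built from lines of the trace quoted above.
SAMPLE = """\
>> [ 1050.033384] INFO: task drbd0_worker:2390 blocked for more than 120 seconds.
>> [ 1050.489098] [<c0122256>] try_to_wake_up+0xe8/0xf1
>> [ 1050.852190] [<f9098886>] drbd_khelper+0x80/0xca [drbd]
>> [ 1179.830151] INFO: task drbd0_worker:2390 blocked for more than 120 seconds.
>> [ 1180.573920] [<f9098886>] drbd_khelper+0x80/0xca [drbd]
"""

for (name, pid), info in summarize_hung_tasks(SAMPLE).items():
    drbd_frames = [sym for sym, mod in info["frames"] if mod == "drbd"]
    print(f"{name} (pid {pid}): {len(info['reports'])} reports, "
          f"drbd frames: {', '.join(drbd_frames) or 'none'}")
```

Run against the full dmesg output, a summary like this shows at a glance which tasks are stuck and which drbd functions they are waiting in, for example `drbd_khelper` for the worker thread.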