On Mon, Sep 15, 2008 at 05:55:19PM +0200, Stefano Cailotto wrote:
> Hi all,
> when performing synchronization on a primary/primary drbd (8.2.7rc1), (in
> which one of them has to be demoted to secondary to have it working),
> sometimes I obtain a kernel module crash: here follows the output of dmesg

to be precise: that is NOT a kernel module crash.
it is "just" some sort of (distributed?) deadlock in drbd_worker.

my guess would be that you could get it to recover by
forcefully disconnecting it (unplug the cable and wait for 10 seconds),
or by using iptables or "cutter" to reset that connection.

what would be interesting, though: what exactly is "sometimes"?
so, what is it you are doing there? anything "special"?

> [ 917.704708] drbd0: role( Secondary -> Primary )
> [ 1047.973700] INFO: task cqueue:2379 blocked for more than 120 seconds.
> [ 1048.039965] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1048.102466] cqueue D 56b8dc67 0 2379 2
> [ 1048.164511] f5080e60 00000046 00000001 56b8dc67 000000d4 f5080fec c201e020 00000001
> [ 1048.168357] f7883440 00000040 00000001 00000000 00000001 c201e020 00000009 00000001
> [ 1048.234998] 7fffffff 7fffffff f0921e9c 00000002 c02c8d88 00000046 00000000 00000206
> [ 1048.302177] Call Trace:
> [ 1048.443239] [<c02c8d88>] schedule_timeout+0x13/0x86
> [ 1048.510006] [<c012d421>] irq_exit+0x50/0x67
> [ 1048.572493] [<c01152c8>] smp_apic_timer_interrupt+0x6b/0x77
> [ 1048.638911] [<c0109370>] apic_timer_interrupt+0x28/0x30
> [ 1048.705188] [<c02c84ad>] wait_for_common+0xaf/0x10f
> [ 1048.767240] [<c012225f>] default_wake_function+0x0/0x8
> [ 1048.829130] [<f90941c8>] drbd_req_state+0x2d5/0x2f1 [drbd]
> [ 1048.894741] [<c01386ac>] autoremove_wake_function+0x0/0x2d
> [ 1048.960639] [<f9094205>] _drbd_request_state+0x21/0xa0 [drbd]
> [ 1049.018127] [<c0131e49>] specific_send_sig_info+0x7/0x9
> [ 1049.088010] [<c0132531>] force_sig_info+0x8b/0x93
> [ 1049.149265] [<f9098b44>] drbd_set_role+0x6b/0x490 [drbd]
> [ 1049.210607] [<c0123f93>] hrtick_set+0x7a/0xd8
> [ 1049.279376] [<f9099381>] drbd_nl_primary+0x3b/0x44 [drbd]
> [ 1049.337148] [<f909aa21>] drbd_connector_callback+0xbc/0x144 [drbd]
> [ 1049.407308] [<f8e6c0ad>] cn_queue_wrapper+0x0/0x24 [cn]
> [ 1049.465680] [<f8e6c0ba>] cn_queue_wrapper+0xd/0x24 [cn]
> [ 1049.535753] [<c0135d32>] run_workqueue+0x74/0xf2
> [ 1049.593686] [<c013640d>] worker_thread+0x0/0xbd
> [ 1049.659631] [<c01364c0>] worker_thread+0xb3/0xbd
> [ 1049.725190] [<c01386ac>] autoremove_wake_function+0x0/0x2d
> [ 1049.790488] [<c01385eb>] kthread+0x38/0x5d
> [ 1049.851611] [<c01385b3>] kthread+0x0/0x5d
> [ 1049.912392] [<c01094ff>] kernel_thread_helper+0x7/0x10
> [ 1049.969041] =======================
> [ 1050.033384] INFO: task drbd0_worker:2390 blocked for more than 120 seconds.
> [ 1050.098415] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1050.159859] drbd0_worker D 45fabce3 0 2390 2
> [ 1050.221208] f7550560 00000046 f7550588 45fabce3 000000d4 f75506ec c201e020 00000001
> [ 1050.228506] 00000002 c0122256 00000001 00000000 f747bfbc f74261c4 00000001 00000000
> [ 1050.290777] 7fffffff 7fffffff f5007f0c 00000002 c02c8d88 00000000 00000003 f74261cc
> [ 1050.352410] Call Trace:
> [ 1050.489098] [<c0122256>] try_to_wake_up+0xe8/0xf1
> [ 1050.546405] [<c02c8d88>] schedule_timeout+0x13/0x86
> [ 1050.607334] [<c0121260>] __wake_up+0x29/0x39
> [ 1050.670957] [<c02c84ad>] wait_for_common+0xaf/0x10f
> [ 1050.733919] [<c012225f>] default_wake_function+0x0/0x8
> [ 1050.793182] [<c013593f>] call_usermodehelper_exec+0x60/0xa0
> [ 1050.852190] [<f9098886>] drbd_khelper+0x80/0xca [drbd]
> [ 1050.915276] [<f907f7d1>] drbd_resync_finished+0x4b2/0x4fe [drbd]
> [ 1050.974268] [<f908d68f>] w_update_odbm+0x187/0x1c7 [drbd]
> [ 1051.005103] [<f907fdac>] drbd_worker+0x219/0x35c [drbd]
> [ 1051.064581] [<f90957b5>] drbd_thread_setup+0xf4/0x166 [drbd]
> [ 1051.126374] [<f90956c1>] drbd_thread_setup+0x0/0x166 [drbd]
> [ 1051.179398] [<c01094ff>] kernel_thread_helper+0x7/0x10
> [ 1051.231824] =======================
> [ 1091.996642] INFO: task o2hb-04079A4A9B:3339 blocked for more than 120 seconds.
> [ 1092.049983] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1092.107826] o2hb-04079A4A D f04dc3f8 0 3339 2
> [ 1092.161765] f0895200 00000046 c0130887 f04dc3f8 f04dc3c0 f089538c c201e020 00000001
> [ 1092.161765] 00000000 00027846 00000000 f7945b50 00000000 00000000 00000000 ffffffff
> [ 1092.225434] 7fffffff 7fffffff f3999eec 00000002 c02c8d88 f79ea0b0 f7945c9c f7945b50
> [ 1092.284844] Call Trace:
> [ 1092.407099] [<c0130887>] lock_timer_base+0x19/0x35
> [ 1092.470599] [<c02c8d88>] schedule_timeout+0x13/0x86
> [ 1092.525835] [<c01dcf37>] __generic_unplug_device+0x1a/0x1c
> [ 1092.589073] [<c01dd311>] generic_unplug_device+0x1e/0x33
> [ 1092.589077] [<f9095a0a>] drbd_unplug_fn+0x170/0x1c2 [drbd]
> [ 1092.589094] [<c02c84ad>] wait_for_common+0xaf/0x10f
> [ 1092.589097] [<c012225f>] default_wake_function+0x0/0x8
> [ 1092.589103] [<f90ca09b>] o2hb_do_disk_heartbeat+0x8c0/0x9e3 [ocfs2_nodemanager]
> [ 1092.589120] [<f90ca4f1>] o2hb_thread+0x8a/0x3da [ocfs2_nodemanager]
> [ 1092.589125] [<c02c8bb5>] schedule+0x63b/0x66d
> [ 1092.589125] [<f90ca467>] o2hb_thread+0x0/0x3da [ocfs2_nodemanager]
> [ 1092.589125] [<c01385eb>] kthread+0x38/0x5d
> [ 1092.589125] [<c01385b3>] kthread+0x0/0x5d
> [ 1092.589125] [<c01094ff>] kernel_thread_helper+0x7/0x10
> [ 1092.589125] =======================
> [ 1177.897228] INFO: task cqueue:2379 blocked for more than 120 seconds.
> [ 1177.961703] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1178.026661] cqueue D 56b8dc67 0 2379 2
> [ 1178.084513] f5080e60 00000046 00000001 56b8dc67 000000d4 f5080fec c201e020 00000001
> [ 1178.084513] f7883440 00000040 00000001 00000000 00000001 c201e020 00000009 00000001
> [ 1178.159079] 7fffffff 7fffffff f0921e9c 00000002 c02c8d88 00000046 00000000 00000206
> [ 1178.221488] Call Trace:
> [ 1178.355940] [<c02c8d88>] schedule_timeout+0x13/0x86
> [ 1178.416623] [<c012d421>] irq_exit+0x50/0x67
> [ 1178.477058] [<c01152c8>] smp_apic_timer_interrupt+0x6b/0x77
> [ 1178.541613] [<c0109370>] apic_timer_interrupt+0x28/0x30
> [ 1178.601535] [<c02c84ad>] wait_for_common+0xaf/0x10f
> [ 1178.657380] [<c012225f>] default_wake_function+0x0/0x8
> [ 1178.713295] [<f90941c8>] drbd_req_state+0x2d5/0x2f1 [drbd]
> [ 1178.773435] [<c01386ac>] autoremove_wake_function+0x0/0x2d
> [ 1178.837602] [<f9094205>] _drbd_request_state+0x21/0xa0 [drbd]
> [ 1178.894013] [<c0131e49>] specific_send_sig_info+0x7/0x9
> [ 1178.954391] [<c0132531>] force_sig_info+0x8b/0x93
> [ 1179.018440] [<f9098b44>] drbd_set_role+0x6b/0x490 [drbd]
> [ 1179.082012] [<c0123f93>] hrtick_set+0x7a/0xd8
> [ 1179.144031] [<f9099381>] drbd_nl_primary+0x3b/0x44 [drbd]
> [ 1179.199704] [<f909aa21>] drbd_connector_callback+0xbc/0x144 [drbd]
> [ 1179.263517] [<f8e6c0ad>] cn_queue_wrapper+0x0/0x24 [cn]
> [ 1179.328158] [<f8e6c0ba>] cn_queue_wrapper+0xd/0x24 [cn]
> [ 1179.384190] [<c0135d32>] run_workqueue+0x74/0xf2
> [ 1179.448085] [<c013640d>] worker_thread+0x0/0xbd
> [ 1179.503328] [<c01364c0>] worker_thread+0xb3/0xbd
> [ 1179.557650] [<c01386ac>] autoremove_wake_function+0x0/0x2d
> [ 1179.619719] [<c01385eb>] kthread+0x38/0x5d
> [ 1179.668961] [<c01385b3>] kthread+0x0/0x5d
> [ 1179.721684] [<c01094ff>] kernel_thread_helper+0x7/0x10
> [ 1179.774313] =======================
> [ 1179.830151] INFO: task drbd0_worker:2390 blocked for more than 120 seconds.
> [ 1179.886847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1179.940232] drbd0_worker D 45fabce3 0 2390 2
> [ 1179.993743] f7550560 00000046 f7550588 45fabce3 000000d4 f75506ec c201e020 00000001
> [ 1179.996864] 00000002 c0122256 00000001 00000000 f747bfbc f74261c4 00000001 00000000
> [ 1180.047966] 7fffffff 7fffffff f5007f0c 00000002 c02c8d88 00000000 00000003 f74261cc
> [ 1180.110950] Call Trace:
> [ 1180.236173] [<c0122256>] try_to_wake_up+0xe8/0xf1
> [ 1180.290776] [<c02c8d88>] schedule_timeout+0x13/0x86
> [ 1180.345188] [<c0121260>] __wake_up+0x29/0x39
> [ 1180.402924] [<c02c84ad>] wait_for_common+0xaf/0x10f
> [ 1180.461106] [<c012225f>] default_wake_function+0x0/0x8
> [ 1180.515185] [<c013593f>] call_usermodehelper_exec+0x60/0xa0
> [ 1180.573920] [<f9098886>] drbd_khelper+0x80/0xca [drbd]
> [ 1180.636815] [<f907f7d1>] drbd_resync_finished+0x4b2/0x4fe [drbd]
> [ 1180.691807] [<f908d68f>] w_update_odbm+0x187/0x1c7 [drbd]
> [ 1180.746598] [<f907fdac>] drbd_worker+0x219/0x35c [drbd]
> [ 1180.804784] [<f90957b5>] drbd_thread_setup+0xf4/0x166 [drbd]
> [ 1180.858547] [<f90956c1>] drbd_thread_setup+0x0/0x166 [drbd]
> [ 1180.915507] [<c01094ff>] kernel_thread_helper+0x7/0x10
> [ 1180.972635] =======================

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
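
For readers who want to see at a glance which tasks are stuck and where in drbd they are waiting, the hung-task reports quoted in this mail can be condensed with a small script. What follows is only a rough sketch: it assumes the dmesg text has one entry per line, in the "INFO: task NAME:PID blocked for more than N seconds" and "[<addr>] func+0xoff/0xlen [module]" shapes seen above. The function names `summarize_hung_tasks` and `frames_in_module` are made up for this illustration and are not part of DRBD or any kernel tool.

```python
import re

# Header of a hung-task report, e.g.
#   INFO: task drbd0_worker:2390 blocked for more than 120 seconds.
HUNG_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# One call-trace frame, e.g.
#   [<f9098886>] drbd_khelper+0x80/0xca [drbd]
# The trailing "[module]" is absent for functions built into the kernel.
FRAME_RE = re.compile(
    r"\[<[0-9a-fA-F]+>\]\s+(?P<func>[\w.]+)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def summarize_hung_tasks(dmesg_text):
    """Map (task name, pid) to its list of (function, module) frames.

    If the same task is reported more than once, the latest report wins.
    """
    tasks = {}
    current = None
    for line in dmesg_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = (header.group("name"), int(header.group("pid")))
            tasks[current] = []
            continue
        if "=======" in line:
            # Separator line that closes a call trace in these reports.
            current = None
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            tasks[current].append((frame.group("func"), frame.group("module")))
    return tasks


def frames_in_module(tasks, module="drbd"):
    """Keep only the function names that belong to the given module."""
    return {
        task: [func for func, mod in frames if mod == module]
        for task, frames in tasks.items()
    }


if __name__ == "__main__":
    import sys

    summary = frames_in_module(summarize_hung_tasks(sys.stdin.read()))
    for (name, pid), funcs in summary.items():
        print(f"{name}[{pid}]: {' <- '.join(funcs) or '(no drbd frames)'}")
```

It could be fed with something like `dmesg | python3 hung_tasks.py` (the file name is arbitrary). Leading "> " quote markers do not need to be stripped first, since the patterns are searched for anywhere in each line.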