Friendly ping

On Fri, Mar 1, 2013 at 2:42 PM, Jesus Climent <climent at gmail.com> wrote:
> Any luck with the traces I sent?
>
> On Wed, Feb 27, 2013 at 1:50 PM, Jesus Climent <climent at gmail.com> wrote:
>> I have been digging a bit more in the current state and these are my findings:
>>
>> Once a migration is right about to be finished, the secondary node
>> received the signal of becoming primary and it is when drbdsetup is
>> trying to set the secondary, it fails in the primary
>>
>> This is the drbdsetup process in d-state, under hostname6
>> root 27691 0.0 0.0 3964 536 ? D 00:06 0:00
>> drbdsetup /dev/drbd1 secondary
>>
>> hostname6 1 inst-test3.google.com disk/0 secondary hostname5
>> hostname5 0 inst-test3.google.com disk/0 primary hostname6
>>
>> node: hostname5
>> 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r---b-
>>
>> node: hostname6
>> 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r---b-
>>
>> [xen-test] root at hostname5:~# drbdsetup /dev/drbd0 status
>> <resource minor="0" cs="Connected" ro1="Primary" ro2="Primary"
>> ds1="UpToDate" ds2="UpToDate" />
>>
>> For hostname6's trace:
>>
>> http://db.tt/FI8rmrpw
>>
>> And hostname5's relevant pieces. If more are needed, i can post them.
>>
>> Feb 27 16:08:43 hostname5 kernel: [64271.712343] drbd0_receiver S
>> ffff88003e411dc0 0 19682 2 0x00000000
>> Feb 27 16:08:43 hostname5 kernel: [64271.712351] ffff88002b205970
>> 0000000000000246 ffff88002b2058f0 ffff880016daea00
>> Feb 27 16:08:43 hostname5 kernel: [64271.712360] 000000000000000c
>> 0000000000000000 ffff8800020fd160 0000000000011dc0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712369] ffff88002b205fd8
>> ffff88002b204010 ffff88002b205fd8 0000000000011dc0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712377] Call Trace:
>> Feb 27 16:08:43 hostname5 kernel: [64271.712382] [<ffffffff810d1057>]
>> ? kfree+0x17/0xc0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712388] [<ffffffff8149878a>]
>> schedule+0x3a/0x60
>> Feb 27 16:08:43 hostname5 kernel: [64271.712393] [<ffffffff81498b95>]
>> schedule_timeout+0x185/0x1e0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712400] [<ffffffff8104c3e2>]
>> ? local_bh_enable_ip+0x22/0xa0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712406] [<ffffffff8149a444>]
>> ? _raw_spin_unlock_bh+0x14/0x20
>> Feb 27 16:08:43 hostname5 kernel: [64271.712413] [<ffffffff813b63f1>]
>> sk_wait_data+0xd1/0xe0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712419] [<ffffffff810622b0>]
>> ? wake_up_bit+0x40/0x40
>> Feb 27 16:08:43 hostname5 kernel: [64271.712425] [<ffffffff8104be62>]
>> ? local_bh_enable+0x22/0xa0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712431] [<ffffffff81403d01>]
>> tcp_recvmsg+0x651/0xc80
>> Feb 27 16:08:43 hostname5 kernel: [64271.712437] [<ffffffff81424f7a>]
>> inet_recvmsg+0x4a/0x80
>> Feb 27 16:08:43 hostname5 kernel: [64271.712444] [<ffffffff81005485>]
>> ? arbitrary_virt_to_machine+0x85/0xb0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712450] [<ffffffff813b1961>]
>> sock_recvmsg+0xc1/0xf0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712456] [<ffffffff81003129>]
>> ? xen_end_context_switch+0x19/0x20
>> Feb 27 16:08:43 hostname5 kernel: [64271.712462] [<ffffffff81009915>]
>> ? __switch_to+0x145/0x370
>> Feb 27 16:08:43 hostname5 kernel: [64271.712468] [<ffffffff8103e75e>]
>> ? finish_task_switch+0x5e/0xc0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712475] [<ffffffff814981ea>]
>> ? __schedule+0x29a/0x760
>> Feb 27 16:08:43 hostname5 kernel: [64271.712481] [<ffffffff8149a279>]
>> ? _raw_spin_unlock_irqrestore+0x19/0x20
>> Feb 27 16:08:43 hostname5 kernel: [64271.712493] [<ffffffffa00569ec>]
>> drbd_recv+0x8c/0x230 [drbd]
>> Feb 27 16:08:43 hostname5 kernel: [64271.712505] [<ffffffffa0059bac>]
>> ? drbd_may_finish_epoch+0x9c/0x3a0 [drbd]
>> Feb 27 16:08:43 hostname5 kernel: [64271.712517] [<ffffffffa0057a0e>]
>> drbd_recv_header+0x2e/0x130 [drbd]
>> Feb 27 16:08:43 hostname5 kernel: [64271.712528] [<ffffffffa00583e6>]
>> drbdd+0x46/0x200 [drbd]
>> Feb 27 16:08:43 hostname5 kernel: [64271.712540] [<ffffffffa005e2e5>]
>> drbdd_init+0x85/0x130 [drbd]
>> Feb 27 16:08:43 hostname5 kernel: [64271.712551] [<ffffffffa006a174>]
>> drbd_thread_setup+0x64/0xf0 [drbd]
>> Feb 27 16:08:43 hostname5 kernel: [64271.712563] [<ffffffffa006a110>]
>> ? _drbd_thread_stop+0x100/0x100 [drbd]
>> Feb 27 16:08:43 hostname5 kernel: [64271.712569] [<ffffffff81061e06>]
>> kthread+0x96/0xa0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712576] [<ffffffff8149ceb4>]
>> kernel_thread_helper+0x4/0x10
>> Feb 27 16:08:43 hostname5 kernel: [64271.712582] [<ffffffff8149af76>]
>> ? int_ret_from_sys_call+0x7/0x1b
>> Feb 27 16:08:43 hostname5 kernel: [64271.712589] [<ffffffff8149a6bc>]
>> ? retint_restore_args+0x5/0x6
>> Feb 27 16:08:43 hostname5 kernel: [64271.712595] [<ffffffff8149ceb0>]
>> ? gs_change+0x13/0x13
>> Feb 27 16:08:43 hostname5 kernel: [64271.712599] drbd0_asender S
>> ffff88003e411dc0 0 19689 2 0x00000000
>> Feb 27 16:08:43 hostname5 kernel: [64271.712607] ffff8800249739a0
>> 0000000000000246 ffff880024973920 ffffffff810cd8c8
>> Feb 27 16:08:43 hostname5 kernel: [64271.712616] 000000000000a570
>> ffff880016ea94b0 ffff8800020f8000 0000000000011dc0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712625] ffff880024973fd8
>> ffff880024972010 ffff880024973fd8 0000000000011dc0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712633] Call Trace:
>> Feb 27 16:08:43 hostname5 kernel: [64271.712637] [<ffffffff810cd8c8>]
>> ? ksize+0x18/0xc0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712665] [<ffffffff8149a22a>]
>> ? _raw_spin_lock_irqsave+0x2a/0x40
>> Feb 27 16:08:43 hostname5 kernel: [64271.712671] [<ffffffff8149878a>]
>> schedule+0x3a/0x60
>> Feb 27 16:08:43 hostname5 kernel: [64271.712677] [<ffffffff81498b45>]
>> schedule_timeout+0x135/0x1e0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712684] [<ffffffff81052540>]
>> ? add_timer_on+0xa0/0xa0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712690] [<ffffffff813b63f1>]
>> sk_wait_data+0xd1/0xe0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712696] [<ffffffff810622b0>]
>> ? wake_up_bit+0x40/0x40
>> Feb 27 16:08:43 hostname5 kernel: [64271.712702] [<ffffffff8104be62>]
>> ? local_bh_enable+0x22/0xa0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712708] [<ffffffff81403d01>]
>> tcp_recvmsg+0x651/0xc80
>> Feb 27 16:08:43 hostname5 kernel: [64271.712715] [<ffffffff810085d1>]
>> ? m2p_remove_override+0x251/0x2f0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712721] [<ffffffff81424f7a>]
>> inet_recvmsg+0x4a/0x80
>> Feb 27 16:08:43 hostname5 kernel: [64271.712727] [<ffffffff813b1961>]
>> sock_recvmsg+0xc1/0xf0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712733] [<ffffffff810d0ac5>]
>> ? kmem_cache_free+0x15/0x90
>> Feb 27 16:08:43 hostname5 kernel: [64271.712740] [<ffffffff81096442>]
>> ? mempool_free_slab+0x12/0x20
>> Feb 27 16:08:43 hostname5 kernel: [64271.712747] [<ffffffff81096555>]
>> ? mempool_free+0x85/0x90
>> Feb 27 16:08:43 hostname5 kernel: [64271.712758] [<ffffffffa005ff0a>]
>> ? _req_may_be_done+0x12a/0x4e0 [drbd]
>> Feb 27 16:08:43 hostname5 kernel: [64271.712765] [<ffffffff8103a02e>]
>> ? __wake_up+0x4e/0x70
>> Feb 27 16:08:43 hostname5 kernel: [64271.712771] [<ffffffff8149a279>]
>> ? _raw_spin_unlock_irqrestore+0x19/0x20
>> Feb 27 16:08:43 hostname5 kernel: [64271.712777] [<ffffffff8103a02e>]
>> ? __wake_up+0x4e/0x70
>> Feb 27 16:08:43 hostname5 kernel: [64271.712788] [<ffffffffa00541b3>]
>> drbd_recv_short+0x73/0x90 [drbd]
>> Feb 27 16:08:43 hostname5 kernel: [64271.712800] [<ffffffffa0059569>]
>> drbd_asender+0x189/0x730 [drbd]
>> Feb 27 16:08:43 hostname5 kernel: [64271.712806] [<ffffffff8103e75e>]
>> ? finish_task_switch+0x5e/0xc0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712818] [<ffffffffa006a174>]
>> drbd_thread_setup+0x64/0xf0 [drbd]
>> Feb 27 16:08:43 hostname5 kernel: [64271.712829] [<ffffffffa006a110>]
>> ? _drbd_thread_stop+0x100/0x100 [drbd]
>> Feb 27 16:08:43 hostname5 kernel: [64271.712836] [<ffffffff81061e06>]
>> kthread+0x96/0xa0
>> Feb 27 16:08:43 hostname5 kernel: [64271.712842] [<ffffffff8149ceb4>]
>> kernel_thread_helper+0x4/0x10
>> Feb 27 16:08:43 hostname5 kernel: [64271.712849] [<ffffffff8149af76>]
>> ? int_ret_from_sys_call+0x7/0x1b
>> Feb 27 16:08:43 hostname5 kernel: [64271.712855] [<ffffffff8149a6bc>]
>> ? retint_restore_args+0x5/0x6
>> Feb 27 16:08:43 hostname5 kernel: [64271.712861] [<ffffffff8149ceb0>]
>> ? gs_change+0x13/0x13
>>
>>
>> --
>> climent () gmail ! com
>>
>> On Wed, Feb 27, 2013 at 9:25 AM, Jesus Climent <climent at gmail.com> wrote:
>>> I have managed to create a repro case, when the system is under a high
>>> load of I/O. From a set of 4 test clusters, all except one got a
>>> "drbdsetup /dev/drbdX secondary" hang.
>>>
>>> What other information should I send to the list in order to evaluate
>>> this problem?
>>>
>>> On Wed, Feb 27, 2013 at 7:45 AM, Lars Ellenberg
>>> <lars.ellenberg at linbit.com> wrote:
>>>> On Tue, Feb 26, 2013 at 06:01:04PM -0500, Jesus Climent wrote:
>>>>> Has anybody taken a look into it?
>>>>
>>>> Yep.
>>>> Nothing obvious, sorry.
>>>>
>>>> Lars
>>>> _______________________________________________
>>>> drbd-user mailing list
>>>> drbd-user at lists.linbit.com
>>>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>>
>>>
>>>
>>> --
>>> climent () gmail ! com
>>
>>
>>
>> --
>> climent () gmail ! com
>
>
>
> --
> climent () gmail ! com

--
climent () gmail ! com