Any luck with the traces I sent?

On Wed, Feb 27, 2013 at 1:50 PM, Jesus Climent <climent at gmail.com> wrote:
> I have been digging a bit more in the current state and these are my findings:
>
> Once a migration is right about to be finished, the secondary node
> received the signal of becoming primary and it is when drbdsetup is
> trying to set the secondary, it fails in the primary
>
> This is the drbdsetup process in d-state, under hostname6
> root 27691 0.0 0.0 3964 536 ? D 00:06 0:00 drbdsetup /dev/drbd1 secondary
>
> hostname6 1 inst-test3.google.com disk/0 secondary hostname5
> hostname5 0 inst-test3.google.com disk/0 primary hostname6
>
> node: hostname5
> 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r---b-
>
> node: hostname6
> 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r---b-
>
> [xen-test] root@hostname5:~# drbdsetup /dev/drbd0 status
> <resource minor="0" cs="Connected" ro1="Primary" ro2="Primary" ds1="UpToDate" ds2="UpToDate" />
>
> For hostname6's trace:
>
> http://db.tt/FI8rmrpw
>
> And hostname5's relevant pieces. If more are needed, i can post them.
>
> Feb 27 16:08:43 hostname5 kernel: [64271.712343] drbd0_receiver S ffff88003e411dc0 0 19682 2 0x00000000
> Feb 27 16:08:43 hostname5 kernel: [64271.712351] ffff88002b205970 0000000000000246 ffff88002b2058f0 ffff880016daea00
> Feb 27 16:08:43 hostname5 kernel: [64271.712360] 000000000000000c 0000000000000000 ffff8800020fd160 0000000000011dc0
> Feb 27 16:08:43 hostname5 kernel: [64271.712369] ffff88002b205fd8 ffff88002b204010 ffff88002b205fd8 0000000000011dc0
> Feb 27 16:08:43 hostname5 kernel: [64271.712377] Call Trace:
> Feb 27 16:08:43 hostname5 kernel: [64271.712382] [<ffffffff810d1057>] ? kfree+0x17/0xc0
> Feb 27 16:08:43 hostname5 kernel: [64271.712388] [<ffffffff8149878a>] schedule+0x3a/0x60
> Feb 27 16:08:43 hostname5 kernel: [64271.712393] [<ffffffff81498b95>] schedule_timeout+0x185/0x1e0
> Feb 27 16:08:43 hostname5 kernel: [64271.712400] [<ffffffff8104c3e2>] ? local_bh_enable_ip+0x22/0xa0
> Feb 27 16:08:43 hostname5 kernel: [64271.712406] [<ffffffff8149a444>] ? _raw_spin_unlock_bh+0x14/0x20
> Feb 27 16:08:43 hostname5 kernel: [64271.712413] [<ffffffff813b63f1>] sk_wait_data+0xd1/0xe0
> Feb 27 16:08:43 hostname5 kernel: [64271.712419] [<ffffffff810622b0>] ? wake_up_bit+0x40/0x40
> Feb 27 16:08:43 hostname5 kernel: [64271.712425] [<ffffffff8104be62>] ? local_bh_enable+0x22/0xa0
> Feb 27 16:08:43 hostname5 kernel: [64271.712431] [<ffffffff81403d01>] tcp_recvmsg+0x651/0xc80
> Feb 27 16:08:43 hostname5 kernel: [64271.712437] [<ffffffff81424f7a>] inet_recvmsg+0x4a/0x80
> Feb 27 16:08:43 hostname5 kernel: [64271.712444] [<ffffffff81005485>] ? arbitrary_virt_to_machine+0x85/0xb0
> Feb 27 16:08:43 hostname5 kernel: [64271.712450] [<ffffffff813b1961>] sock_recvmsg+0xc1/0xf0
> Feb 27 16:08:43 hostname5 kernel: [64271.712456] [<ffffffff81003129>] ? xen_end_context_switch+0x19/0x20
> Feb 27 16:08:43 hostname5 kernel: [64271.712462] [<ffffffff81009915>] ? __switch_to+0x145/0x370
> Feb 27 16:08:43 hostname5 kernel: [64271.712468] [<ffffffff8103e75e>] ? finish_task_switch+0x5e/0xc0
> Feb 27 16:08:43 hostname5 kernel: [64271.712475] [<ffffffff814981ea>] ? __schedule+0x29a/0x760
> Feb 27 16:08:43 hostname5 kernel: [64271.712481] [<ffffffff8149a279>] ? _raw_spin_unlock_irqrestore+0x19/0x20
> Feb 27 16:08:43 hostname5 kernel: [64271.712493] [<ffffffffa00569ec>] drbd_recv+0x8c/0x230 [drbd]
> Feb 27 16:08:43 hostname5 kernel: [64271.712505] [<ffffffffa0059bac>] ? drbd_may_finish_epoch+0x9c/0x3a0 [drbd]
> Feb 27 16:08:43 hostname5 kernel: [64271.712517] [<ffffffffa0057a0e>] drbd_recv_header+0x2e/0x130 [drbd]
> Feb 27 16:08:43 hostname5 kernel: [64271.712528] [<ffffffffa00583e6>] drbdd+0x46/0x200 [drbd]
> Feb 27 16:08:43 hostname5 kernel: [64271.712540] [<ffffffffa005e2e5>] drbdd_init+0x85/0x130 [drbd]
> Feb 27 16:08:43 hostname5 kernel: [64271.712551] [<ffffffffa006a174>] drbd_thread_setup+0x64/0xf0 [drbd]
> Feb 27 16:08:43 hostname5 kernel: [64271.712563] [<ffffffffa006a110>] ? _drbd_thread_stop+0x100/0x100 [drbd]
> Feb 27 16:08:43 hostname5 kernel: [64271.712569] [<ffffffff81061e06>] kthread+0x96/0xa0
> Feb 27 16:08:43 hostname5 kernel: [64271.712576] [<ffffffff8149ceb4>] kernel_thread_helper+0x4/0x10
> Feb 27 16:08:43 hostname5 kernel: [64271.712582] [<ffffffff8149af76>] ? int_ret_from_sys_call+0x7/0x1b
> Feb 27 16:08:43 hostname5 kernel: [64271.712589] [<ffffffff8149a6bc>] ? retint_restore_args+0x5/0x6
> Feb 27 16:08:43 hostname5 kernel: [64271.712595] [<ffffffff8149ceb0>] ? gs_change+0x13/0x13
> Feb 27 16:08:43 hostname5 kernel: [64271.712599] drbd0_asender S ffff88003e411dc0 0 19689 2 0x00000000
> Feb 27 16:08:43 hostname5 kernel: [64271.712607] ffff8800249739a0 0000000000000246 ffff880024973920 ffffffff810cd8c8
> Feb 27 16:08:43 hostname5 kernel: [64271.712616] 000000000000a570 ffff880016ea94b0 ffff8800020f8000 0000000000011dc0
> Feb 27 16:08:43 hostname5 kernel: [64271.712625] ffff880024973fd8 ffff880024972010 ffff880024973fd8 0000000000011dc0
> Feb 27 16:08:43 hostname5 kernel: [64271.712633] Call Trace:
> Feb 27 16:08:43 hostname5 kernel: [64271.712637] [<ffffffff810cd8c8>] ? ksize+0x18/0xc0
> Feb 27 16:08:43 hostname5 kernel: [64271.712665] [<ffffffff8149a22a>] ? _raw_spin_lock_irqsave+0x2a/0x40
> Feb 27 16:08:43 hostname5 kernel: [64271.712671] [<ffffffff8149878a>] schedule+0x3a/0x60
> Feb 27 16:08:43 hostname5 kernel: [64271.712677] [<ffffffff81498b45>] schedule_timeout+0x135/0x1e0
> Feb 27 16:08:43 hostname5 kernel: [64271.712684] [<ffffffff81052540>] ? add_timer_on+0xa0/0xa0
> Feb 27 16:08:43 hostname5 kernel: [64271.712690] [<ffffffff813b63f1>] sk_wait_data+0xd1/0xe0
> Feb 27 16:08:43 hostname5 kernel: [64271.712696] [<ffffffff810622b0>] ? wake_up_bit+0x40/0x40
> Feb 27 16:08:43 hostname5 kernel: [64271.712702] [<ffffffff8104be62>] ? local_bh_enable+0x22/0xa0
> Feb 27 16:08:43 hostname5 kernel: [64271.712708] [<ffffffff81403d01>] tcp_recvmsg+0x651/0xc80
> Feb 27 16:08:43 hostname5 kernel: [64271.712715] [<ffffffff810085d1>] ? m2p_remove_override+0x251/0x2f0
> Feb 27 16:08:43 hostname5 kernel: [64271.712721] [<ffffffff81424f7a>] inet_recvmsg+0x4a/0x80
> Feb 27 16:08:43 hostname5 kernel: [64271.712727] [<ffffffff813b1961>] sock_recvmsg+0xc1/0xf0
> Feb 27 16:08:43 hostname5 kernel: [64271.712733] [<ffffffff810d0ac5>] ? kmem_cache_free+0x15/0x90
> Feb 27 16:08:43 hostname5 kernel: [64271.712740] [<ffffffff81096442>] ? mempool_free_slab+0x12/0x20
> Feb 27 16:08:43 hostname5 kernel: [64271.712747] [<ffffffff81096555>] ? mempool_free+0x85/0x90
> Feb 27 16:08:43 hostname5 kernel: [64271.712758] [<ffffffffa005ff0a>] ? _req_may_be_done+0x12a/0x4e0 [drbd]
> Feb 27 16:08:43 hostname5 kernel: [64271.712765] [<ffffffff8103a02e>] ? __wake_up+0x4e/0x70
> Feb 27 16:08:43 hostname5 kernel: [64271.712771] [<ffffffff8149a279>] ? _raw_spin_unlock_irqrestore+0x19/0x20
> Feb 27 16:08:43 hostname5 kernel: [64271.712777] [<ffffffff8103a02e>] ? __wake_up+0x4e/0x70
> Feb 27 16:08:43 hostname5 kernel: [64271.712788] [<ffffffffa00541b3>] drbd_recv_short+0x73/0x90 [drbd]
> Feb 27 16:08:43 hostname5 kernel: [64271.712800] [<ffffffffa0059569>] drbd_asender+0x189/0x730 [drbd]
> Feb 27 16:08:43 hostname5 kernel: [64271.712806] [<ffffffff8103e75e>] ? finish_task_switch+0x5e/0xc0
> Feb 27 16:08:43 hostname5 kernel: [64271.712818] [<ffffffffa006a174>] drbd_thread_setup+0x64/0xf0 [drbd]
> Feb 27 16:08:43 hostname5 kernel: [64271.712829] [<ffffffffa006a110>] ? _drbd_thread_stop+0x100/0x100 [drbd]
> Feb 27 16:08:43 hostname5 kernel: [64271.712836] [<ffffffff81061e06>] kthread+0x96/0xa0
> Feb 27 16:08:43 hostname5 kernel: [64271.712842] [<ffffffff8149ceb4>] kernel_thread_helper+0x4/0x10
> Feb 27 16:08:43 hostname5 kernel: [64271.712849] [<ffffffff8149af76>] ? int_ret_from_sys_call+0x7/0x1b
> Feb 27 16:08:43 hostname5 kernel: [64271.712855] [<ffffffff8149a6bc>] ? retint_restore_args+0x5/0x6
> Feb 27 16:08:43 hostname5 kernel: [64271.712861] [<ffffffff8149ceb0>] ? gs_change+0x13/0x13
>
>
> --
> climent () gmail ! com
>
> On Wed, Feb 27, 2013 at 9:25 AM, Jesus Climent <climent at gmail.com> wrote:
>> I have managed to create a repro case, when the system is under a high
>> load of I/O. From a set of 4 test clusters, all except one got a
>> "drbdsetup /dev/drbdX secondary" hang.
>>
>> What other information should I send to the list in order to evaluate
>> this problem?
>>
>> On Wed, Feb 27, 2013 at 7:45 AM, Lars Ellenberg
>> <lars.ellenberg at linbit.com> wrote:
>>> On Tue, Feb 26, 2013 at 06:01:04PM -0500, Jesus Climent wrote:
>>>> Has anybody taken a look into it?
>>>
>>> Yep.
>>> Nothing obvious, sorry.
>>>
>>> Lars
>>> _______________________________________________
>>> drbd-user mailing list
>>> drbd-user at lists.linbit.com
>>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>
>>
>>
>> --
>> climent () gmail ! com
>
>
>
> --
> climent () gmail ! com

--
climent () gmail ! com