Hi,

I'm already on the latest kernel from the CentOS 5.2 repos, and I don't
have DRBD 8.0.14 available. I also have many 8.0.13 installations
working without problems. My bet is on the kernel, so today I will
update to the new Red Hat kernel that will be in 5.3. It's still in
testing, but I will take a chance on it.

Thanks.

Thomas Reinhold wrote:
> Hi Igor,
>
> I was asking about the RAID hardware because I had similar (or at
> least similar looking) problems with DRBD 8.0.13, Kernel 2.6.18-6
> (Debian Etch) and CPU soft lockups under high disk loads:
> (http://www.gossamer-threads.com/lists/drbd/users/16399?search_string=flush;#16399)
>
> Unfortunately, I did not have the time to do exhaustive testing (i.e.
> isolate the problem, reproduce without DRBD, change specific software
> versions, etc.) as I was under some pressure to go into production.
> What I did was try to eliminate as many potential causes of the
> problems as possible, which meant upgrading to a newer kernel version
> and backporting the RAID controller driver, as well as upgrading to
> DRBD 8.0.14.
>
> I have been doing some pretty extensive stress testing after that, and
> the system has been running fine without any problems since then. My
> strongest bet concerning the cause of the problem is still the RAID
> controller in use (LSI 1078), as driver versions in kernel 2.6.18 are
> known to cause problems under high load. But of course, I can't be
> sure. And as I'm now reading that you have similar problems even
> though you are using a different RAID card, I'm wondering if there
> could be any problems with DRBD and that specific kernel version? Is
> it possible for you to upgrade the kernel and do some testing?
>
> Best regards,
>
> Thomas
>
>
> On 27.01.2009 at 17:47, Igor Neves wrote:
>
>> Hi,
>>
>> Thomas Reinhold wrote:
>>> Hi,
>>>
>>> what kind of RAID hardware are you using?
>>
>> I'm using Areca's controller.
>> But I'm pretty sure it's not Areca,
>> because we have some more like this in production.
>>
>>>
>>> Regards,
>>>
>>> Thomas
>>>
>>> On 26.01.2009 at 18:18, Igor Neves wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks for your help.
>>>>
>>>> Lars Ellenberg wrote:
>>>>> On Mon, Jan 26, 2009 at 03:12:02PM +0000, Igor Neves wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm having hard problems with one machine with drbd, and having these
>>>>>> kernel panics
>>>>>>
>>>>> these are NOT kernel panics...
>>>>> even though they may look very similar.
>>>>>
>>>>
>>>> Yes, you are right, but they kill my machine like a kernel panic! :)
>>>>
>>>>>
>>>>>> on the machine I run the vmware server.
>>>>>>
>>>>> thanks for mentioning this.
>>>>>
>>>>>> Does anyone know what this can be?
>>>>>>
>>>>>> I'm using drbd 8.0.13 on kernel 2.6.18-92.1.22.el5PAE, this is a Centos
>>>>>> 5.2 machine.
>>>>>>
>>>>> Not a DRBD problem.
>>>>> Appears to be a problem in the redhat kernel on vmware.
>>>>> please have a look at
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=463573
>>>>>
>>>>
>>>> Thanks for pointing out this bug, but I think we are speaking of
>>>> different things. This bug mentions a vmware machine as guest; this
>>>> does not happen on the guest but on the host. The guest is one
>>>> Windows machine.
>>>>
>>>> One more point, I had this vmware working on another machine without
>>>> problems. Can this be interrupts?
>>>>
>>>> Here is the interrupts table:
>>>>
>>>>            CPU0       CPU1       CPU2       CPU3
>>>>   0:   85216538   85180125   85220346   85160716  IO-APIC-edge   timer
>>>>   1:          8          1          1          0  IO-APIC-edge   i8042
>>>>   4:      32854      32895      32997      32828  IO-APIC-edge   serial
>>>>   7:          1          1          0          0  IO-APIC-edge   parport0
>>>>   8:          0          1          0          0  IO-APIC-edge   rtc
>>>>   9:          0          0          1          0  IO-APIC-level  acpi
>>>>  50:          0          0          0          0  PCI-MSI        ahci
>>>>  58:    1017131    1001386    1008608    1007834  IO-APIC-level  skge
>>>>  66:    2995867    2969551    2982197    2975044  IO-APIC-level  eth3
>>>>  74:    1431195    1496518    1426694    1506294  IO-APIC-level  eth4
>>>>  82:   35769244          0          0          0  PCI-MSI        eth0
>>>>  90:   34243658          0          0          0  PCI-MSI        eth1
>>>> 233:    2817474    2829933    2839738    2827824  IO-APIC-level  arcmsr
>>>> NMI:          0          0          0          0
>>>> LOC:  337373327  337372791  336681613  336681080
>>>> ERR:          0
>>>> MIS:          0
>>>>
>>>> I wonder if the skge and r8169 drivers are causing problems with
>>>> interrupts that drbd doesn't like, or even arcmsr, which is the
>>>> Areca storage controller driver.
>>>>
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> =======================
>>>>>> BUG: soft lockup - CPU#0 stuck for 10s! [swapper:0]
>>>>>>
>>>>>> Pid: 0, comm: swapper
>>>>>> EIP: 0060:[<c0608e1b>] CPU: 0
>>>>>> EIP is at _spin_lock_irqsave+0x13/0x27
>>>>>> EFLAGS: 00000286 Tainted: GF (2.6.18-92.1.22.el5PAE #1)
>>>>>> EAX: f79c4028 EBX: f79c4000 ECX: c072afdc EDX: 00000246
>>>>>> ESI: f7c4ac00 EDI: f7c4ac00 EBP: 00000000 DS: 007b ES: 007b
>>>>>> CR0: 8005003b CR2: 9fb19000 CR3: 00724000 CR4: 000006f0
>>>>>> [<f887f8d1>] scsi_device_unbusy+0xf/0x69 [scsi_mod]
>>>>>> [<f887b356>] scsi_finish_command+0x10/0x77 [scsi_mod]
>>>>>> [<c04d8a34>] blk_done_softirq+0x4d/0x58
>>>>>> [<c042ab5a>] __do_softirq+0x5a/0xbb
>>>>>> [<c0407451>] do_softirq+0x52/0x9d
>>>>>> [<c04073f6>] do_IRQ+0xa5/0xae
>>>>>> [<c040592e>] common_interrupt+0x1a/0x20
>>>>>> [<c0403ccf>] mwait_idle+0x25/0x38
>>>>>> [<c0403c90>] cpu_idle+0x9f/0xb9
>>>>>> [<c06eb9ee>] start_kernel+0x379/0x380
>>>>>> =======================
>>>>>> BUG: soft lockup - CPU#1 stuck for 10s! [drbd0_receiver:4880]
>>>>>>
>>>>>> Pid: 4880, comm: drbd0_receiver
>>>>>> EIP: 0060:[<c0608e1b>] CPU: 1
>>>>>> EIP is at _spin_lock_irqsave+0x13/0x27
>>>>>> EFLAGS: 00000286 Tainted: GF (2.6.18-92.1.22.el5PAE #1)
>>>>>> EAX: f79c4028 EBX: ca5ff6c0 ECX: f7c4ac00 EDX: 00000202
>>>>>> ESI: f79c4000 EDI: f7c4ac94 EBP: 00000001 DS: 007b ES: 007b
>>>>>> CR0: 8005003b CR2: 7ff72000 CR3: 00724000 CR4: 000006f0
>>>>>> [<f887edef>] scsi_run_queue+0xcd/0x189 [scsi_mod]
>>>>>> [<f887f46e>] scsi_next_command+0x25/0x2f [scsi_mod]
>>>>>> [<f887f583>] scsi_end_request+0x9f/0xa9 [scsi_mod]
>>>>>> [<f887f6cd>] scsi_io_completion+0x140/0x2ea [scsi_mod]
>>>>>> [<f885a3d2>] sd_rw_intr+0x1f1/0x21b [sd_mod]
>>>>>> [<f887b3b9>] scsi_finish_command+0x73/0x77 [scsi_mod]
>>>>>> [<c04d8a34>] blk_done_softirq+0x4d/0x58
>>>>>> [<c042ab5a>] __do_softirq+0x5a/0xbb
>>>>>> [<c0407451>] do_softirq+0x52/0x9d
>>>>>> [<c042a961>] local_bh_enable+0x74/0x7f
>>>>>> [<c05d1407>] tcp_prequeue_process+0x5a/0x66
>>>>>> [<c05d3692>] tcp_recvmsg+0x416/0x9f7
>>>>>> [<c05a725e>] sock_common_recvmsg+0x2f/0x45
>>>>>> [<c05a5017>] sock_recvmsg+0xe5/0x100
>>>>>> [<c0436347>] autoremove_wake_function+0x0/0x2d
>>>>>> [<c05a6c97>] kernel_sendmsg+0x27/0x35
>>>>>> [<f8d87718>] drbd_send+0x77/0x13f [drbd]
>>>>>> [<f8d78485>] drbd_recv+0x57/0xd7 [drbd]
>>>>>> [<f8d78485>] drbd_recv+0x57/0xd7 [drbd]
>>>>>> [<f8d78694>] drbd_recv_header+0x10/0x94 [drbd]
>>>>>> [<f8d78c4b>] drbdd+0x18/0x12b [drbd]
>>>>>> [<f8d7b586>] drbdd_init+0xa0/0x173 [drbd]
>>>>>> [<f8d89e47>] drbd_thread_setup+0xbb/0x150 [drbd]
>>>>>> [<f8d89d8c>] drbd_thread_setup+0x0/0x150 [drbd]
>>>>>> [<c0405c3b>] kernel_thread_helper+0x7/0x10
>>>>>>
>>>>>
>>>> _______________________________________________
>>>> drbd-user mailing list
>>>> drbd-user at lists.linbit.com
>>>> http://lists.linbit.com/mailman/listinfo/drbd-user
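
In the interrupts table quoted in this thread, eth0 and eth1 (both PCI-MSI) are counted on CPU0 only, while the other busy IRQs are spread over all four CPUs. As a rough aid for spotting that pattern, here is a minimal Python sketch that parses /proc/interrupts-style text and lists IRQs whose whole count landed on one CPU. The function names, the `min_total` threshold and the embedded sample (a few rows copied from the quoted table) are illustrative assumptions, not part of DRBD or of any tool mentioned here, and nothing in the thread establishes that interrupt distribution caused the soft lockups.

```python
# Sketch only: helper names and the threshold are hypothetical.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:   85216538   85180125   85220346   85160716  IO-APIC-edge   timer
 58:    1017131    1001386    1008608    1007834  IO-APIC-level  skge
 82:   35769244          0          0          0  PCI-MSI        eth0
 90:   34243658          0          0          0  PCI-MSI        eth1
233:    2817474    2829933    2839738    2827824  IO-APIC-level  arcmsr
NMI:          0          0          0          0
ERR:          0
"""


def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq: (counts, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        if len(counts) != ncpu:
            continue  # rows such as ERR/MIS carry a single total; skip them
        table[label.strip()] = (counts, " ".join(fields[ncpu:]))
    return table


def single_cpu_irqs(table, min_total=1000):
    """Return (irq, description, cpu) for busy IRQs handled by one CPU only."""
    hits = []
    for irq, (counts, desc) in table.items():
        total = sum(counts)
        if total >= min_total and max(counts) == total:
            hits.append((irq, desc, counts.index(total)))
    return hits


for irq, desc, cpu in single_cpu_irqs(parse_interrupts(SAMPLE)):
    print(f"IRQ {irq} ({desc}) is handled only by CPU{cpu}")
```

On a live host the same functions could be fed the contents of /proc/interrupts instead of `SAMPLE`. A hit only says where the interrupts were counted; it is a starting point for comparison with a machine that does not lock up, not a diagnosis.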