[DRBD-user] Kernel Panic

Igor Neves igor at 3gnt.net
Mon Feb 2 11:12:05 CET 2009


Hi,

I'm already running the latest kernel from the CentOS 5.2 repos, and I
don't have DRBD 8.0.14 available. Besides, I have many 8.0.13
installations working without problems.

My bet is on the kernel, so today I will update to the new kernel from
Red Hat that will ship in 5.3. It's still in testing, but I will take
a chance on it.

Thanks.

Thomas Reinhold wrote:
> Hi Igor, 
>
> I was asking about the RAID hardware because I had similar (or at
> least similar looking) problems with DRBD 8.0.13, Kernel 2.6.18-6
> (Debian Etch) and CPU soft lockups under high disk loads:
> (http://www.gossamer-threads.com/lists/drbd/users/16399?search_string=flush;#16399)
>
> Unfortunately, I did not have the time to do exhaustive testing (i.e.
> isolate the problem, reproduce without DRBD, change specific software
> versions, etc.) as I was under some pressure to go into production. What
> I did was try to eliminate as many potential causes of the problem
> as possible, which meant upgrading to a newer kernel version and
> backporting the RAID controller driver, as well as upgrading to DRBD
> 8.0.14.
>
> I have been doing some pretty extensive stress testing since then, and
> the system has been running fine without any problems. My
> strongest bet concerning the cause of the problem is still the
> RAID controller in use (LSI 1078), as driver versions in kernel 2.6.18 are
> known to cause problems under high load. But of course, I can't be
> sure. And as I'm now reading that you have similar problems even
> though you are using a different RAID card, I'm wondering if there
> could be a problem with DRBD and that specific kernel version. Is
> it possible for you to upgrade the kernel and do some testing?
>
> Best regards, 
>
>   Thomas
>
>
> On 27.01.2009 at 17:47, Igor Neves wrote:
>
>> Hi,
>>
>> Thomas Reinhold wrote:
>>> Hi, 
>>>
>>> what kind of RAID hardware are you using?
>>
>> I'm using an Areca controller. But I'm pretty sure it's not the Areca,
>> because we have several more like this in production.
>>
>>>
>>> Regards, 
>>>
>>>   Thomas
>>>
>>> On 26.01.2009 at 18:18, Igor Neves wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks for your help.
>>>>
>>>> Lars Ellenberg wrote:
>>>>> On Mon, Jan 26, 2009 at 03:12:02PM +0000, Igor Neves wrote:
>>>>>   
>>>>>> Hi,
>>>>>>
>>>>>> I'm having serious problems with one machine running DRBD, and I'm
>>>>>> getting these kernel panics
>>>>>>     
>>>>> these are NOT kernel panics...
>>>>> even though they may look very similar.
>>>>>   
>>>>
>>>> Yes, you are right, but they kill my machine like a kernel panic! :)
>>>>
>>>>>   
>>>>>> on the machine I run the vmware server.
>>>>>>     
>>>>> thanks for mentioning this.
>>>>  
>>>>>> Does anyone know what this can be?
>>>>>>
>>>>>> I'm using DRBD 8.0.13 on kernel 2.6.18-92.1.22.el5PAE; this is a CentOS
>>>>>> 5.2 machine.
>>>>>   
>>>>> Not a DRBD problem.
>>>>> Appears to be a problem in the redhat kernel on vmware.
>>>>> please have a look at
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=463573
>>>>>   
>>>>
>>>> Thanks for pointing out this bug, but I think we are talking about
>>>> different things. That bug concerns a VMware machine as guest; my
>>>> problem does not happen on the guest but on the host. The guest is a
>>>> Windows machine.
>>>>
>>>> One more point: I had this VMware setup working on another machine
>>>> without problems. Can this be interrupts?
>>>>
>>>> Here is the interrupts table:
>>>>
>>>>            CPU0       CPU1       CPU2       CPU3
>>>>   0:   85216538   85180125   85220346   85160716    IO-APIC-edge  timer
>>>>   1:          8          1          1          0    IO-APIC-edge  i8042
>>>>   4:      32854      32895      32997      32828    IO-APIC-edge  serial
>>>>   7:          1          1          0          0    IO-APIC-edge  parport0
>>>>   8:          0          1          0          0    IO-APIC-edge  rtc
>>>>   9:          0          0          1          0   IO-APIC-level  acpi
>>>>  50:          0          0          0          0         PCI-MSI  ahci
>>>>  58:    1017131    1001386    1008608    1007834   IO-APIC-level  skge
>>>>  66:    2995867    2969551    2982197    2975044   IO-APIC-level  eth3
>>>>  74:    1431195    1496518    1426694    1506294   IO-APIC-level  eth4
>>>>  82:   35769244          0          0          0         PCI-MSI  eth0
>>>>  90:   34243658          0          0          0         PCI-MSI  eth1
>>>> 233:    2817474    2829933    2839738    2827824   IO-APIC-level  arcmsr
>>>> NMI:          0          0          0          0
>>>> LOC:  337373327  337372791  336681613  336681080
>>>> ERR:          0
>>>> MIS:          0
>>>>
>>>> I wonder whether the skge and r8169 drivers are causing interrupt
>>>> problems that DRBD doesn't like, or whether it's arcmsr, the
>>>> Areca controller's storage driver.
>>>>
>>>>>   
>>>>>> Thanks
>>>>>>
>>>>>>  =======================
>>>>>> BUG: soft lockup - CPU#0 stuck for 10s! [swapper:0]
>>>>>>
>>>>>> Pid: 0, comm:              swapper
>>>>>> EIP: 0060:[<c0608e1b>] CPU: 0
>>>>>> EIP is at _spin_lock_irqsave+0x13/0x27
>>>>>>  EFLAGS: 00000286    Tainted: GF      (2.6.18-92.1.22.el5PAE #1)
>>>>>> EAX: f79c4028 EBX: f79c4000 ECX: c072afdc EDX: 00000246
>>>>>> ESI: f7c4ac00 EDI: f7c4ac00 EBP: 00000000 DS: 007b ES: 007b
>>>>>> CR0: 8005003b CR2: 9fb19000 CR3: 00724000 CR4: 000006f0
>>>>>>  [<f887f8d1>] scsi_device_unbusy+0xf/0x69 [scsi_mod]
>>>>>>  [<f887b356>] scsi_finish_command+0x10/0x77 [scsi_mod]
>>>>>>  [<c04d8a34>] blk_done_softirq+0x4d/0x58
>>>>>>  [<c042ab5a>] __do_softirq+0x5a/0xbb
>>>>>>  [<c0407451>] do_softirq+0x52/0x9d
>>>>>>  [<c04073f6>] do_IRQ+0xa5/0xae
>>>>>>  [<c040592e>] common_interrupt+0x1a/0x20
>>>>>>  [<c0403ccf>] mwait_idle+0x25/0x38
>>>>>>  [<c0403c90>] cpu_idle+0x9f/0xb9
>>>>>>  [<c06eb9ee>] start_kernel+0x379/0x380
>>>>>>  =======================
>>>>>> BUG: soft lockup - CPU#1 stuck for 10s! [drbd0_receiver:4880]
>>>>>>
>>>>>> Pid: 4880, comm:       drbd0_receiver
>>>>>> EIP: 0060:[<c0608e1b>] CPU: 1
>>>>>> EIP is at _spin_lock_irqsave+0x13/0x27
>>>>>>  EFLAGS: 00000286    Tainted: GF      (2.6.18-92.1.22.el5PAE #1)
>>>>>> EAX: f79c4028 EBX: ca5ff6c0 ECX: f7c4ac00 EDX: 00000202
>>>>>> ESI: f79c4000 EDI: f7c4ac94 EBP: 00000001 DS: 007b ES: 007b
>>>>>> CR0: 8005003b CR2: 7ff72000 CR3: 00724000 CR4: 000006f0
>>>>>>  [<f887edef>] scsi_run_queue+0xcd/0x189 [scsi_mod]
>>>>>>  [<f887f46e>] scsi_next_command+0x25/0x2f [scsi_mod]
>>>>>>  [<f887f583>] scsi_end_request+0x9f/0xa9 [scsi_mod]
>>>>>>  [<f887f6cd>] scsi_io_completion+0x140/0x2ea [scsi_mod]
>>>>>>  [<f885a3d2>] sd_rw_intr+0x1f1/0x21b [sd_mod]
>>>>>>  [<f887b3b9>] scsi_finish_command+0x73/0x77 [scsi_mod]
>>>>>>  [<c04d8a34>] blk_done_softirq+0x4d/0x58
>>>>>>  [<c042ab5a>] __do_softirq+0x5a/0xbb
>>>>>>  [<c0407451>] do_softirq+0x52/0x9d
>>>>>>  [<c042a961>] local_bh_enable+0x74/0x7f
>>>>>>  [<c05d1407>] tcp_prequeue_process+0x5a/0x66
>>>>>>  [<c05d3692>] tcp_recvmsg+0x416/0x9f7
>>>>>>  [<c05a725e>] sock_common_recvmsg+0x2f/0x45
>>>>>>  [<c05a5017>] sock_recvmsg+0xe5/0x100
>>>>>>  [<c0436347>] autoremove_wake_function+0x0/0x2d
>>>>>>  [<c05a6c97>] kernel_sendmsg+0x27/0x35
>>>>>>  [<f8d87718>] drbd_send+0x77/0x13f [drbd]
>>>>>>  [<f8d78485>] drbd_recv+0x57/0xd7 [drbd]
>>>>>>  [<f8d78485>] drbd_recv+0x57/0xd7 [drbd]
>>>>>>  [<f8d78694>] drbd_recv_header+0x10/0x94 [drbd]
>>>>>>  [<f8d78c4b>] drbdd+0x18/0x12b [drbd]
>>>>>>  [<f8d7b586>] drbdd_init+0xa0/0x173 [drbd]
>>>>>>  [<f8d89e47>] drbd_thread_setup+0xbb/0x150 [drbd]
>>>>>>  [<f8d89d8c>] drbd_thread_setup+0x0/0x150 [drbd]
>>>>>>  [<c0405c3b>] kernel_thread_helper+0x7/0x10
>>>>>>     
>>>>>   
>>>> _______________________________________________
>>>> drbd-user mailing list
>>>> drbd-user at lists.linbit.com
>>>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>>
>