[DRBD-user] [Fwd: Re: [Linux-HA] heartbeat 2.0.8: lockups] kernel oops
Gerry Reno
greno at verizon.net
Wed Feb 21 16:40:44 CET 2007
Forwarding this from linux-ha list:
-------- Original Message --------
Subject: Re: [Linux-HA] heartbeat 2.0.8: lockups
Date: Mon, 19 Feb 2007 09:57:16 -0500
From: Gerry Reno <greno at verizon.net>
Reply-To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
References:
<12392854.6367231171759462886.JavaMail.root at vms074.mailsrvcs.net>
<26ef5e70702190352p4d6d24cajb31b28edbe0d1885 at mail.gmail.com>
<45D9B8CD.70907 at verizon.net>
Gerry Reno wrote:
> Andrew Beekhof wrote:
>> so what are we looking at here? what time did the lockup occur?
>>
>> On 2/18/07, greno at verizon.net <greno at verizon.net> wrote:
>>> I've been running heartbeat on my two nodes for almost two weeks and
>>> everything is functioning as it should, except that I am getting
>>> frequent lockups on the primary server. It doesn't matter which
>>> server I make the primary; it will eventually lock up. The lockups
>>> are very hard: there is no response of any kind from the locked-up
>>> machine. Sometimes the drive light is on and sometimes not. The
>>> lockups occur at times of disk access, such as during backups or
>>> right after I ftp a file or tar file from the drbd array to another
>>> machine. There is very little in the logs: just a big gap and then a
>>> syslog restart from when I cold-booted the server to bring it back
>>> up. I'm attaching dmesg output and /var/log/messages output for both
>>> servers. What should I do to track down the source of this problem?
>>>
>>> heartbeat-2.0.8-1.fc6
>>> drbd-0.7.23-15.fc6.at
>>>
>>> Other info:
>>> drbd is running over a logical volume, which sits on a RAID-1 md
>>> array on each server.
>>>
>>> Both servers were rock stable prior to installing HA.
>>>
>>>
> Andrew,
> The lockup occurred at about 18:07. If you search for 'restart' in the
> log you should find it. The gap was about 7 minutes. This one
> occurred when I ftp'd a file.
>
> Gerry
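The kind of gap described above (silence in /var/log/messages followed by a syslog restart) can be located mechanically rather than by eye. A minimal Python sketch, assuming the classic syslog timestamp prefix ("Feb 18 18:07:01"); the helper name find_gaps, the 5-minute threshold, and the year argument (syslog does not record the year) are all assumptions made for illustration:

```python
from datetime import datetime, timedelta


def find_gaps(lines, min_gap=timedelta(minutes=5), year=2007):
    """Yield (previous, next) timestamps of consecutive log entries
    that are at least min_gap apart."""
    prev = None
    for line in lines:
        try:
            # Classic syslog prefix is 15 characters, e.g. "Feb 18 18:07:01".
            # The year is not logged, so it is supplied by the caller.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if prev is not None and stamp - prev >= min_gap:
            yield prev, stamp
        prev = stamp


if __name__ == "__main__":
    # Path is the usual location on Fedora; adjust as needed.
    with open("/var/log/messages", errors="replace") as log:
        for before, after in find_gaps(log):
            print(f"gap of {after - before}: {before} -> {after}")
```

Each reported pair brackets a stretch with no log entries; the entries just before the first timestamp are the ones worth reading closely.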
Andrew,
I just went and checked on the primary and this was showing in a
terminal window:
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: Oops: 0000 [#1]
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: SMP
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: CPU: 0
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: EIP: 0061:[<c042e17d>] Tainted: GF VLI
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: EFLAGS: 00010202 (2.6.19-1.2895.fc6xen #1)
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: EIP is at put_pid+0x6/0x20
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: eax: 000000ab ebx: 00000008 ecx: c1b84140
edx: 000000ab
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: esi: c1b840c0 edi: c1b840c0 ebp: e3957be0
esp: e9ea0f0c
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: ds: 007b es: 007b ss: 0069
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: Process IPaddr (pid: 29397, ti=e9ea0000
task=e70464d0 task.ti=e9ea0000)
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: Stack: c046c707 00000000 00000000 d0e76988 ed7e8520
e3957be0 d76b4b40 00000000
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: e1f53d20 c046a03a c0467578 d76b4b40 000001ff
00000004 c041fdbd 00000000
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: 00000000 e7046978 d76b4b40 e70464d0 00000001
c0420fa0 00000000 c04442e3
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: Call Trace:
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: [<c046c707>] __fput+0x12f/0x190
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: [<c046a03a>] filp_close+0x52/0x59
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: [<c041fdbd>] put_files_struct+0x65/0xa7
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: [<c0420fa0>] do_exit+0x246/0x787
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: [<c042156e>] sys_exit_group+0x0/0xd
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
grp-01-30-02 kernel: [<c0404efb>] syscall_call+0x7/0xb
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:31 2007 ...
grp-01-30-02 kernel: [<00c5d402>] 0xc5d402
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:31 2007 ...
grp-01-30-02 kernel: =======================
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:31 2007 ...
grp-01-30-02 kernel: Code: 00 77 09 8d 1c 06 85 db 7e 15 eb a8 83 c3 08
81 c6 00 80 00 00 31 c9 81 fb 74 53 68 c0 72 cc 89 f8 5b 5e 5f c3 85 c0
89 c2 74 19 <8b> 00 48 74 0a 90 ff 0a 0f 94 c0 84 c0 74 0a a1 28 2d 83
c0 e9
Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:31 2007 ...
grp-01-30-02 kernel: EIP: [<c042e17d>] put_pid+0x6/0x20 SS:ESP 0069:e9ea0f0c
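The oops above is hard to read because every kernel line was broadcast to the terminal with its own "Message from syslogd" banner, and the mail client wrapped the longer lines. A minimal Python sketch for collapsing such a capture back into a contiguous trace; the helper name clean_oops is hypothetical, and it assumes every kernel line starts with "<host> kernel: " and that any other non-banner line is a wrapped continuation of the previous one:

```python
def clean_oops(text, host="grp-01-30-02"):
    """Return the kernel lines of a syslogd broadcast capture,
    without banners or host prefixes, and with wrapped lines rejoined."""
    prefix = f"{host} kernel: "
    out = []
    for line in text.splitlines():
        line = line.strip()
        # Drop blank lines and the per-message broadcast banner.
        if not line or line.startswith("Message from syslogd"):
            continue
        if line.startswith(prefix):
            out.append(line[len(prefix):])
        elif out:
            # Assumed to be a continuation wrapped by the mail client.
            out[-1] += " " + line
    return out


if __name__ == "__main__":
    import sys
    print("\n".join(clean_oops(sys.stdin.read())))
```

Piping the pasted terminal text through this gives one line per kernel message, which is easier to hand to other tools or to paste into a bug report.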
_______________________________________________
Linux-HA mailing list
Linux-HA at lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems