Gerry Reno wrote:
>
> Forwarding this from linux-ha list:
>
> -------- Original Message --------
> Subject: Re: [Linux-HA] heartbeat 2.0.8: lockups
> Date: Mon, 19 Feb 2007 09:57:16 -0500
> From: Gerry Reno <greno at verizon.net>
> Reply-To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
> To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
> References:
> <12392854.6367231171759462886.JavaMail.root at vms074.mailsrvcs.net>
> <26ef5e70702190352p4d6d24cajb31b28edbe0d1885 at mail.gmail.com>
> <45D9B8CD.70907 at verizon.net>
>
>
>
> Gerry Reno wrote:
>> Andrew Beekhof wrote:
>>> so what are we looking at here? what time did the lockup occur?
>>>
>>> On 2/18/07, greno at verizon.net <greno at verizon.net> wrote:
>>>> I've been running heartbeat on my two nodes for almost two weeks
>>>> and everything is functioning as it is supposed to with the
>>>> exception that I am getting frequent lockups on the primary
>>>> server. It doesn't matter which server that I make the primary it
>>>> will eventually be locked up. The lockups are very hard. There is
>>>> no response of any kind out of the locked up machine. Sometimes
>>>> the drive light will be on and sometimes not. The lockups are
>>>> occurring at times of disk access such as during backups or right
>>>> after I ftp a file or tar file over to another machine from the
>>>> drbd array. There is very little in the logs. It just shows a big
>>>> gap and then a syslog restart for when I cold booted the server to
>>>> bring it back up. I'm going to attach dmesg output and
>>>> /var/log/messages output for both servers. What should I do to
>>>> track down the source of this problem?
>>>>
>>>> heartbeat-2.0.8-1.fc6
>>>> drbd-0.7.23-15.fc6.at
>>>>
>>>> Other info:
>>>> drbd is running over logical volume which is over a RAID-1 md array
>>>> on each server.
>>>>
>>>> Both servers were rock stable prior to installing HA.
>>>>
>>>>
>> Andrew,
>> The lockup occurred about 18:07. If you search for 'restart' in the
>> log you should find it. It was about a 7 minute gap. This one
>> occurred when I ftp'd a file.
>>
>> Gerry
> Andrew,
> I just went and checked on the primary and this was showing in a
> terminal window:
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: Oops: 0000 [#1]
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: SMP
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: CPU: 0
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: EIP: 0061:[<c042e17d>] Tainted: GF VLI
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: EFLAGS: 00010202 (2.6.19-1.2895.fc6xen #1)
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: EIP is at put_pid+0x6/0x20
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: eax: 000000ab ebx: 00000008 ecx: c1b84140
> edx: 000000ab
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: esi: c1b840c0 edi: c1b840c0 ebp: e3957be0
> esp: e9ea0f0c
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: ds: 007b es: 007b ss: 0069
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: Process IPaddr (pid: 29397, ti=e9ea0000
> task=e70464d0 task.ti=e9ea0000)
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: Stack: c046c707 00000000 00000000 d0e76988
> ed7e8520 e3957be0 d76b4b40 00000000
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: e1f53d20 c046a03a c0467578 d76b4b40
> 000001ff 00000004 c041fdbd 00000000
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: 00000000 e7046978 d76b4b40 e70464d0
> 00000001 c0420fa0 00000000 c04442e3
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: Call Trace:
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: [<c046c707>] __fput+0x12f/0x190
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: [<c046a03a>] filp_close+0x52/0x59
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: [<c041fdbd>] put_files_struct+0x65/0xa7
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: [<c0420fa0>] do_exit+0x246/0x787
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: [<c042156e>] sys_exit_group+0x0/0xd
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:30 2007 ...
> grp-01-30-02 kernel: [<c0404efb>] syscall_call+0x7/0xb
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:31 2007 ...
> grp-01-30-02 kernel: [<00c5d402>] 0xc5d402
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:31 2007 ...
> grp-01-30-02 kernel: =======================
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:31 2007 ...
> grp-01-30-02 kernel: Code: 00 77 09 8d 1c 06 85 db 7e 15 eb a8 83 c3
> 08 81 c6 00 80 00 00 31 c9 81 fb 74 53 68 c0 72 cc 89 f8 5b 5e 5f c3
> 85 c0 89 c2 74 19 <8b> 00 48 74 0a 90 ff 0a 0f 94 c0 84 c0 74 0a a1 28
> 2d 83 c0 e9
>
> Message from syslogd at grp-01-30-02 at Sun Feb 18 23:09:31 2007 ...
> grp-01-30-02 kernel: EIP: [<c042e17d>] put_pid+0x6/0x20 SS:ESP
> 0069:e9ea0f0c
>
>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>

Got another oops today on drbd device:

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:34 2007 ...
grp-01-30-02 kernel: Oops: 0000 [#2]

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:34 2007 ...
grp-01-30-02 kernel: SMP

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:34 2007 ...
grp-01-30-02 kernel: CPU: 0

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:34 2007 ...
grp-01-30-02 kernel: EIP: 0061:[<000000c0>] Tainted: GF VLI

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:34 2007 ...
grp-01-30-02 kernel: EFLAGS: 00210206 (2.6.19-1.2895.fc6xen #1)

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:34 2007 ...
grp-01-30-02 kernel: EIP is at 0xc0

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:34 2007 ...
grp-01-30-02 kernel: eax: e3957820 ebx: e3957820 ecx: 000000c0 edx: e87ae220

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:34 2007 ...
grp-01-30-02 kernel: esi: e87ae220 edi: e87ae2a0 ebp: e3957820 esp: deb1cf90

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:34 2007 ...
grp-01-30-02 kernel: ds: 007b es: 007b ss: 0069

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:34 2007 ...
grp-01-30-02 kernel: Process mysqld (pid: 7042, ti=deb1c000 task=d40a8cb0 task.ti=deb1c000)

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:34 2007 ...
grp-01-30-02 kernel: Stack: c046a01b deb1cfbc e87ae220 00000052 e87ae2a0 c046afd3 00000052 086cdb78

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:34 2007 ...
grp-01-30-02 kernel: 00000052 deb1c000 c0404efb 00000052 00000000 00000001 086cdb78 00000052

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:35 2007 ...
grp-01-30-02 kernel: 00db00d8 ffffffda 0000007b 0000007b 00000006 00338402 00000073 00200293

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:35 2007 ...
grp-01-30-02 kernel: Call Trace:

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:35 2007 ...
grp-01-30-02 kernel: Inexact backtrace:

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:36 2007 ...
grp-01-30-02 kernel: [<c046a01b>] filp_close+0x33/0x59

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:36 2007 ...
grp-01-30-02 kernel: [<c046afd3>] sys_close+0x73/0xaa

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:36 2007 ...
grp-01-30-02 kernel: [<c0404efb>] syscall_call+0x7/0xb

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:36 2007 ...
grp-01-30-02 kernel: =======================

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:36 2007 ...
grp-01-30-02 kernel: Code: Bad EIP value.

Message from syslogd at grp-01-30-02 at Wed Feb 21 04:34:36 2007 ...
grp-01-30-02 kernel: EIP: [<000000c0>] 0xc0 SS:ESP 0069:deb1cf90
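
Every kernel line in the console broadcasts above is wrapped in its own "Message from syslogd" header, which makes the two oopses hard to compare. The following is a minimal sketch, not part of the original report, that strips those wrappers and the email quote markers and returns only the kernel text. The function name `extract_kernel_lines` is hypothetical, and the header pattern is an assumption based on the text as it appears in this thread (the archive writes "syslogd at host", a live console would show "syslogd@host", so both are accepted).

```python
import re

# Broadcast header printed before each kernel line. Assumed shape, taken from
# the messages in this thread: "Message from syslogd at HOST at DATE ...".
HEADER = re.compile(r"Message from syslogd(?:@| at )\S+ at [^.]*? \.\.\.")

# "HOST kernel:" prefix that starts the payload of each broadcast.
PREFIX = re.compile(r"^\S+ kernel:\s*")


def extract_kernel_lines(text):
    """Return the kernel payload of each broadcast message, in order."""
    payloads = []
    # Text before the first header is dropped; each later chunk is the
    # payload that followed one header.
    for chunk in HEADER.split(text)[1:]:
        # Drop tokens made only of '>' (email quote markers) and rejoin
        # payloads that were wrapped across several lines.
        words = [tok for tok in chunk.split() if tok.strip(">")]
        body = PREFIX.sub("", " ".join(words))
        if body:
            payloads.append(body)
    return payloads
```

Anything that follows the last broadcast of a trace (a mailing-list footer, the next paragraph of the mail) ends up attached to that last payload, so it is best to feed the function one pasted trace at a time.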