Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
So we had another kernel oops, although this one I unfortunately was not able to diagnose; it never made it into the logfile.

I did see two kinds of log messages. About 1000 of these:

Feb 10 05:56:55 rin kernel: is_tree_node: node level 4562 does not match to the expected one 1
Feb 10 05:56:55 rin kernel: drbd(43,1):vs-5150: search_by_key: invalid format found in block 28538659. Fsck?
Feb 10 05:56:55 rin kernel: drbd(43,1):vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [10763889 10764043 0x0 SD]
Feb 10 05:56:55 rin kernel: is_tree_node: node level 4562 does not match to the expected one 1
Feb 10 05:56:55 rin kernel: drbd(43,1):vs-5150: search_by_key: invalid format found in block 28538659. Fsck?
Feb 10 05:56:55 rin kernel: drbd(43,1):vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [10763889 10764044 0x0 SD]
Feb 10 05:56:55 rin kernel: is_tree_node: node level 4562 does not match to the expected one 1
Feb 10 05:56:55 rin kernel: drbd(43,1):vs-5150: search_by_key: invalid format found in block 28538659. Fsck?
Feb 10 05:56:55 rin kernel: drbd(43,1):vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [10763889 10764045 0x0 SD]

And another 100 or so of these:

Feb 10 05:42:57 rin kernel: drbd(43,1):vs-13075: reiserfs_read_inode2: dead inode read from disk [2050566 2050611 0x0 SD ]. This is likely to be race with knfsd. Ignore
Feb 10 05:42:57 rin kernel: drbd(43,1):vs-13075: reiserfs_read_inode2: dead inode read from disk [2050566 2050612 0x0 SD ]. This is likely to be race with knfsd. Ignore
Feb 10 05:42:57 rin kernel: drbd(43,1):vs-13075: reiserfs_read_inode2: dead inode read from disk [2050566 2050613 0x0 SD ]. This is likely to be race with knfsd. Ignore

Of the first kind, the 'invalid format' messages, we saw about 20 last night around 7pm, then nothing until 5:40 this morning, when 1000+ of them came in. In the MIDDLE of this huge spate of messages you see the second kind, the knfsd race-condition messages. The whole batch starting at about 5:40 came in quick succession -- basically a constant spew until 5:56, when everything came to a screeching halt.

I know it appears the most likely culprit is an underlying disk problem, but that is unlikely: the underlying block devices are hardware RAID volumes, not bare disks, and nothing on the RAID side reports any problem. There are also no log messages at all indicating disk problems. Only drbd complained, and drbd is what panicked the system with a null pointer dereference.

Any ideas?

Brian
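For anyone trying to reproduce the counts and the timeline above, here is a minimal Python sketch that tallies the two reiserfs message families and buckets them by minute. The log path is an assumption (substitute wherever your syslog writes kernel messages); it simply searches for the two message strings quoted in the post.

import re
from collections import Counter

# Assumed path; adjust to wherever syslogd writes kernel messages on this host.
LOGFILE = "/var/log/messages"

# The two reiserfs message families seen in the post.
patterns = {
    "invalid format": re.compile(r"search_by_key: invalid format found"),
    "dead inode":     re.compile(r"reiserfs_read_inode2: dead inode read from disk"),
}

totals = Counter()       # total count per message family
per_minute = Counter()   # (family, "Feb 10 05:56") -> count, to see the burst timeline

with open(LOGFILE, errors="replace") as f:
    for line in f:
        for name, pat in patterns.items():
            if pat.search(line):
                totals[name] += 1
                # Syslog timestamp is the first three fields ("Feb 10 05:56:55");
                # drop the seconds to bucket by minute.
                stamp = " ".join(line.split()[:3])[:-3]
                per_minute[(name, stamp)] += 1

print(totals)
for (name, stamp), n in sorted(per_minute.items(), key=lambda kv: kv[0][1]):
    print(f"{stamp}  {name}: {n}")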