Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
I'm really getting desperate on this one, since our server is currently
not in a high-availability state, so I thought I'd include some more
info. Attached is my drbd.conf. I am running RHEL5 (kernel
2.6.18-8.1.14.el5) on both systems. Below is a capture from my system
messages log from the original failure:
Feb 7 05:41:50 arc-swilliamslx kernel: attempt to access beyond end of device
Feb 7 05:41:50 arc-swilliamslx kernel: drbd0: rw=0, want=234300760, limit=234300736
Feb 7 05:41:50 arc-swilliamslx kernel: attempt to access beyond end of device
Feb 7 05:41:50 arc-swilliamslx kernel: drbd0: rw=0, want=234300800, limit=234300736
Feb 7 05:41:50 arc-swilliamslx kernel: attempt to access beyond end of device
Feb 7 05:41:50 arc-swilliamslx kernel: drbd0: rw=0, want=234300864, limit=234300736
Feb 7 05:41:50 arc-swilliamslx kernel: attempt to access beyond end of device
Feb 7 05:41:50 arc-swilliamslx kernel: drbd0: rw=0, want=234300928, limit=234300736
Feb 7 05:41:50 arc-swilliamslx kernel: attempt to access beyond end of device
Feb 7 05:41:50 arc-swilliamslx kernel: drbd0: rw=0, want=234300992, limit=234300736
Feb 7 05:41:50 arc-swilliamslx kernel: attempt to access beyond end of device
Feb 7 05:41:50 arc-swilliamslx kernel: drbd0: rw=0, want=234301016, limit=234300736
Feb 7 05:41:50 arc-swilliamslx kernel: attempt to access beyond end of device
Feb 7 05:41:50 arc-swilliamslx kernel: drbd0: rw=0, want=234300744, limit=234300736
Feb 7 05:41:57 arc-swilliamslx kernel: attempt to access beyond end of device
Feb 7 05:41:57 arc-swilliamslx kernel: drbd0: rw=0, want=234303728, limit=234300736
Feb 7 05:41:57 arc-swilliamslx kernel: EXT3-fs error (device drbd0): ext3_free_branches: Read failure, inode=14209948, block=29287965
Feb 7 05:41:57 arc-swilliamslx kernel: Aborting journal on device drbd0.
Feb 7 05:41:57 arc-swilliamslx kernel: EXT3-fs error (device drbd0) in ext3_reserve_inode_write: Journal has aborted
Feb 7 05:41:57 arc-swilliamslx kernel: EXT3-fs error (device drbd0) in ext3_truncate: Journal has aborted
Feb 7 05:41:57 arc-swilliamslx kernel: EXT3-fs error (device drbd0) in ext3_reserve_inode_write: Journal has aborted
Feb 7 05:41:57 arc-swilliamslx kernel: EXT3-fs error (device drbd0) in ext3_orphan_del: Journal has aborted
Feb 7 05:41:57 arc-swilliamslx kernel: EXT3-fs error (device drbd0) in ext3_reserve_inode_write: Journal has aborted
Feb 7 05:41:57 arc-swilliamslx kernel: EXT3-fs error (device drbd0) in ext3_delete_inode: Journal has aborted
Feb 7 05:41:57 arc-swilliamslx kernel: __journal_remove_journal_head: freeing b_committed_data
Feb 7 05:41:57 arc-swilliamslx kernel: ext3_abort called.
Feb 7 05:41:57 arc-swilliamslx kernel: EXT3-fs error (device drbd0): ext3_journal_start_sb: Detected aborted journal
Feb 7 05:41:57 arc-swilliamslx kernel: Remounting filesystem read-only
(I believe this is where heartbeat stepped in and failed over to the
other server; postgresql was down because of the read-only mount.)
Feb 7 05:42:32 arc-swilliamslx kernel: __journal_remove_journal_head: freeing b_committed_data
Feb 7 05:42:32 arc-swilliamslx kernel: drbd0: role( Primary -> Secondary )
Feb 7 05:42:32 arc-swilliamslx kernel: drbd0: Writing meta data super block now.
Feb 7 05:42:34 arc-swilliamslx kernel: drbd0: peer( Secondary -> Primary )
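
If I'm reading those numbers right (assuming the kernel's want/limit
values are in 512-byte sectors and the ext3 block size is 4 KiB, which I
haven't double-checked), they line up with the fsck output in my earlier
message quoted below:

    limit:      234300736 sectors / 8 = 29287592 4K blocks   (fsck's "physical size of the device")
    superblock: 29288495 blocks * 8   = 234307960 sectors    (past the 234300736-sector limit)

In other words, the filesystem thinks it is about 903 blocks (roughly
3.5 MiB) bigger than the device it sits on. (See also the quick size
checks I pasted after the quoted message.)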
Any help will be greatly appreciated,
Doug
On Thu, 2008-02-07 at 14:04 -0500, Doug Knight wrote:
> Hi list,
> I had one of my HA systems, running drbd 8.0.1, issue an error on its
> drbd0 device (see title). We recently resized the underlying partition
> using gparted to include the partition immediately following it
> (verified that the new, larger partitions were identical, and ran the
> meta-data fix command that drbd suggested when it was restarted). We
> did this on both systems, and everything seemed OK for a few days.
> This morning we got the error, heartbeat detected it, and migrated
> resources to the other system with no problem. I took drbd down on
> both systems, brought up and set primary drbd0 on the system with the
> issue, and ran fsck -fvn /dev/drbd0 against it (unmounted). I got the
> following:
>
> The filesystem size (according to the superblock) is 29288495 blocks
> The physical size of the device is 29287592 blocks
> Either the superblock or the partition table is likely to be corrupt!
>
> So I then ran fsck without the -n to correct the errors. Now drbd
> seems to be completely hosed up. If I do a ./drbd start, the system
> locks up. If I run drbdadm adjust pgsql, it locks up the system too. I
> went as far as to shut down drbd, remove the kernel module, and delete
> and recreate the sda5 partition to start over, and the system still
> locks up when I try to bring drbd up. What I'd like to do is fix the
> issue on this system and let it get back in sync with the other
> system. So, 1) how do I get drbd back and functioning on the system
> where the issue occurred, and 2) do I need to do anything to the
> system that is currently running OK (because of the partition resize,
> etc.)?
>
> Thanks,
> Doug Knight
> WSI Corp
> Andover, MA 01945
>
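
One more thought while I'm at it: to sanity-check the size mismatch fsck
reported above, I'd compare what the superblock thinks against what the
devices actually provide, something along these lines (standard
e2fsprogs/util-linux commands; I'm assuming blockdev --getsz is
available on RHEL5's util-linux):

    # filesystem size and block size according to the superblock
    dumpe2fs -h /dev/drbd0 | grep -i 'block count\|block size'
    # actual size of the DRBD device and of the backing partition, in 512-byte sectors
    blockdev --getsz /dev/drbd0
    blockdev --getsz /dev/sda5

If drbd0 reports fewer sectors than the filesystem expects, that would
match the kernel errors above.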
-------------- next part --------------
resource pgsql {
  protocol C;
  # incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

  startup {
    # wfc-timeout 0;        ## Infinite!
    wfc-timeout      30;    ## 30 seconds
    degr-wfc-timeout 60;    ## 60 seconds
  }

  disk {
    on-io-error detach;
  }

  net {
    # timeout        60;
    # connect-int    10;
    # ping-int       10;
    # max-buffers    2048;
    # max-epoch-size 2048;
  }

  syncer {
    rate 120M;
    # group 1;
    al-extents 257;
  }

  on arc-dknightlx {
    device    /dev/drbd0;
    # disk    /dev/sdc5;    # pre-SAS drive install in slot SAS2
    disk      /dev/sdd5;
    address   10.4.4.4:7788;
    meta-disk internal;
  }

  on arc-swilliamslx.wsicorp.com {
    device    /dev/drbd0;
    disk      /dev/sda5;
    address   10.4.4.5:7788;
    meta-disk internal;
  }
}
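
One thing that strikes me about the config (and I may be completely off
here): both hosts use meta-disk internal, so DRBD keeps its metadata at
the end of the backing partition and /dev/drbd0 ends up a few megabytes
smaller than /dev/sdd5 / /dev/sda5 themselves. That metadata is mostly a
bitmap with roughly one bit per 4 KiB of storage, plus the activity log
and a small superblock, so for a device this size (~112 GiB) it works
out to something like:

    bitmap (1 bit per 4 KiB):  ~29.3 million bits / 8 = ~3.5 MiB
    gap fsck reported:         903 blocks * 4 KiB     = ~3.5 MiB

which is suspiciously close to the gap between the superblock size and
the physical size. If the filesystem was grown to the full size of the
new partition rather than to the size drbd0 actually exposes, that would
produce exactly this kind of mismatch.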