Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
I'm really getting desperate on this, as we are currently not in a high availability state with our server, so I thought I'd include some more info. Attached is my drbd.conf. Also, I am running RHEL 5 (kernel 2.6.18-8.1.14.el5) on both systems. Below is a capture from my system messages log from the original failure:

Feb 7 05:41:50 arc-swilliamslx kernel: attempt to access beyond end of device
Feb 7 05:41:50 arc-swilliamslx kernel: drbd0: rw=0, want=234300760, limit=234300736
Feb 7 05:41:50 arc-swilliamslx kernel: attempt to access beyond end of device
Feb 7 05:41:50 arc-swilliamslx kernel: drbd0: rw=0, want=234300800, limit=234300736
Feb 7 05:41:50 arc-swilliamslx kernel: attempt to access beyond end of device
Feb 7 05:41:50 arc-swilliamslx kernel: drbd0: rw=0, want=234300864, limit=234300736
Feb 7 05:41:50 arc-swilliamslx kernel: attempt to access beyond end of device
Feb 7 05:41:50 arc-swilliamslx kernel: drbd0: rw=0, want=234300928, limit=234300736
Feb 7 05:41:50 arc-swilliamslx kernel: attempt to access beyond end of device
Feb 7 05:41:50 arc-swilliamslx kernel: drbd0: rw=0, want=234300992, limit=234300736
Feb 7 05:41:50 arc-swilliamslx kernel: attempt to access beyond end of device
Feb 7 05:41:50 arc-swilliamslx kernel: drbd0: rw=0, want=234301016, limit=234300736
Feb 7 05:41:50 arc-swilliamslx kernel: attempt to access beyond end of device
Feb 7 05:41:50 arc-swilliamslx kernel: drbd0: rw=0, want=234300744, limit=234300736
Feb 7 05:41:57 arc-swilliamslx kernel: attempt to access beyond end of device
Feb 7 05:41:57 arc-swilliamslx kernel: drbd0: rw=0, want=234303728, limit=234300736
Feb 7 05:41:57 arc-swilliamslx kernel: EXT3-fs error (device drbd0): ext3_free_branches: Read failure, inode=14209948, block=29287965
Feb 7 05:41:57 arc-swilliamslx kernel: Aborting journal on device drbd0.
Feb 7 05:41:57 arc-swilliamslx kernel: EXT3-fs error (device drbd0) in ext3_reserve_inode_write: Journal has aborted
Feb 7 05:41:57 arc-swilliamslx kernel: EXT3-fs error (device drbd0) in ext3_truncate: Journal has aborted
Feb 7 05:41:57 arc-swilliamslx kernel: EXT3-fs error (device drbd0) in ext3_reserve_inode_write: Journal has aborted
Feb 7 05:41:57 arc-swilliamslx kernel: EXT3-fs error (device drbd0) in ext3_orphan_del: Journal has aborted
Feb 7 05:41:57 arc-swilliamslx kernel: EXT3-fs error (device drbd0) in ext3_reserve_inode_write: Journal has aborted
Feb 7 05:41:57 arc-swilliamslx kernel: EXT3-fs error (device drbd0) in ext3_delete_inode: Journal has aborted
Feb 7 05:41:57 arc-swilliamslx kernel: __journal_remove_journal_head: freeing b_committed_data
Feb 7 05:41:57 arc-swilliamslx kernel: ext3_abort called.
Feb 7 05:41:57 arc-swilliamslx kernel: EXT3-fs error (device drbd0): ext3_journal_start_sb: Detected aborted journal
Feb 7 05:41:57 arc-swilliamslx kernel: Remounting filesystem read-only

(I believe this is where heartbeat stepped in and failed over to the other server; postgresql was down due to the read-only mount)

Feb 7 05:42:32 arc-swilliamslx kernel: __journal_remove_journal_head: freeing b_committed_data
Feb 7 05:42:32 arc-swilliamslx kernel: drbd0: role( Primary -> Secondary )
Feb 7 05:42:32 arc-swilliamslx kernel: drbd0: Writing meta data super block now.
Feb 7 05:42:34 arc-swilliamslx kernel: drbd0: peer( Secondary -> Primary )
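One thing that jumps out of these numbers: if the ext3 filesystem uses 4 KiB blocks (8 x 512-byte sectors; that block size is an assumption on my part, I have not confirmed it with dumpe2fs), then the "limit" in the log lines up exactly with the physical size fsck reported in my original message below, while the size the superblock claims works out to be past the end of drbd0:

# rough sanity check, assuming 4 KiB ext3 blocks = 8 x 512-byte sectors
echo $((29287592 * 8))   # physical size from fsck  -> 234300736 sectors, matches "limit" above
echo $((29288495 * 8))   # size from the superblock -> 234307960 sectors, beyond the device

In other words, the filesystem still seems to think it is 903 blocks bigger than the device drbd0 is presenting.
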
Any help will be greatly appreciated,

Doug

On Thu, 2008-02-07 at 14:04 -0500, Doug Knight wrote:
> Hi list,
> I had one of my HA systems, running drbd 8.0.1, issue an error on its
> drbd0 device (see title). We recently resized the underlying partition
> using gparted to include the partition immediately following it
> (verified that the new, larger partitions were identical, and ran the
> command to fix the meta-data, suggested when drbd was restarted). We
> did this on both systems, and everything seemed OK for a few days.
> This morning we got the error, heartbeat detected it, and migrated
> resources to the other system, no problem. I took drbd down on both
> systems, mounted and set primary drbd0 on the system with the issue,
> and did an fsck -fvn /dev/drbd0 on it (unmounted). I get the
> following:
>
> The filesystem size (according to the superblock) is 29288495 blocks
> The physical size of the device is 29287592 blocks
> Either the superblock or the partition table is likely to be corrupt!
>
> So, I then ran fsck without the -n to correct. Now, drbd seems to be
> completely hosed up. If I do a ./drbd start, the system locks up. If I
> do the drbdadm adjust pgsql, it locks up the system too. I went as far
> as to shut down drbd, remove the kernel module, delete the sda5
> partition and recreate it, starting over, and it still locks up the
> system when I try to bring up drbd. What I'd like to do is fix the
> issue on this system, and let it get back in sync with the other
> system. So 1) How do I get drbd back and functioning on the system
> where the issue occurred? and 2) Do I need to do anything to the
> system that is currently running OK (due to the partition resize,
> etc.)?
>
> Thanks,
> Doug Knight
> WSI Corp
> Andover, MA 01945

-------------- next part --------------
resource pgsql {
    protocol C;
    #incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

    startup {
        #wfc-timeout 0;       ## Infinite!
        wfc-timeout 30;       ## 30 seconds
        degr-wfc-timeout 60;  ## 2 minutes.
    }

    disk {
        on-io-error detach;
    }

    net {
        # timeout 60;
        # connect-int 10;
        # ping-int 10;
        # max-buffers 2048;
        # max-epoch-size 2048;
    }

    syncer {
        rate 120M;
        #group 1;
        al-extents 257;
    }

    on arc-dknightlx {
        device    /dev/drbd0;
        #disk     /dev/sdc5;   # pre-SAS drive install in slot SAS2
        disk      /dev/sdd5;
        address   10.4.4.4:7788;
        meta-disk internal;
    }

    on arc-swilliamslx.wsicorp.com {
        device    /dev/drbd0;
        disk      /dev/sda5;
        address   10.4.4.5:7788;
        meta-disk internal;
    }
}
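
For what it's worth, regarding question 2 above, here is a rough sketch of what I believe the grow sequence is supposed to look like once the backing partitions on both nodes really are the new, larger size. This is only my understanding; I have not verified the commands or their order against 8.0.1, or how they interact with internal meta-data, so please correct me if it's wrong:

# sketch only -- assumes both backing partitions are already grown, the nodes are
# connected, and this node is Primary with /dev/drbd0 unmounted
drbdadm resize pgsql     # have DRBD take the new size of the backing devices into account
e2fsck -f /dev/drbd0     # resize2fs wants a clean filesystem check first
resize2fs /dev/drbd0     # then grow ext3 to fill the (now larger) drbd0

If that is the wrong way around, or if the safer first step on the node with the issue would instead be to shrink the filesystem back down to the 29287592 blocks fsck reports (resize2fs /dev/drbd0 29287592), I would be glad to hear it.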