Nate Seif wrote, On 02/28/2008 11:12 AM:
>
>
> On Wed, 27 Feb 2008, Lars Ellenberg wrote:
>
>> On Wed, Feb 27, 2008 at 01:07:04PM -0500, Nate Seif wrote:
>>>
>>>
>>> On Wed, 27 Feb 2008, Lars Ellenberg wrote:
>>>
>>>> On Tue, Feb 26, 2008 at 04:14:36PM -0500, Nate Seif wrote:
>>>>> Hello all:
>>>>> I intermittently experience the errors below while running DRBD
>>>>> and would like to correct whatever condition is causing DRBD to
>>>>> randomly lose pages. My hard disks and partitions are identical
>>>>> and have never given me problems previously. I don't see any
>>>>> other disk I/O errors in my logs. And it appears that
>>>>> occasionally (not always) these errors are preceded by a resync
>>>>> of the two disks.
>>>>>
>>>>> Why would DRBD "attempt to access beyond end of device"?
>>>>>
>>>>> I am running DRBD 8.06 on Gentoo Linux as I could not get my latest
>>>>> Gentoo kernel to load the DRBD module where version > 8.06.
>>>>> Metadata is "internal" and I'm running Protocol C. I'd be happy to
>>>>> post my drbd.conf page if necessary.
>>>>>
>>>>>
>>>>> Feb 26 08:21:46 <hostname> attempt to access beyond end of device
>>>>> Feb 26 08:21:46 <hostname> drbd0: rw=1, want=211992584, limit=211986944
>>>>> Feb 26 08:21:46 <hostname> Buffer I/O error on device drbd0, logical block 26499072
>>>>> Feb 26 08:21:46 <hostname> lost page write due to I/O error on drbd0
>>>>> Feb 26 08:21:46 <hostname> attempt to access beyond end of device
>>>>> Feb 26 08:21:46 <hostname> drbd0: rw=1, want=211992592, limit=211986944
>>>>> Feb 26 08:21:46 <hostname> Buffer I/O error on device drbd0, logical block 26499073
>>>>> Feb 26 08:21:46 <hostname> lost page write due to I/O error on drbd0
>>>>>
>>>>>
>>>>> Any ideas, tips, help, etc. is much appreciated. Thank you -
>>>>
>>>> let me guess:
>>>> you did mkfs /dev/sda1, not mkfs /dev/drbd0?
>>>> well, you screwed up.
>>>
>>> I did NOT mkfs on /dev/hda4. (I have DRBD running on a pair of IDE/PATA
>>> disks and no SATA drives in either system.)
>>>
>>> I partitioned my disks with fdisk. I have identical drives with
>>> identically sized partitions. I compiled the DRBD module, started DRBD,
>>> mounted /dev/drbd0 (not /dev/hda4), and formatted drbd0 with an ext3
>>> file system on the primary only after I got DRBD up and running months
>>> ago.
>>
>> please do
>>
>> tune2fs -l /dev/mapper/vg00--bk1-root |
>> grep -e ^Block.count: -e ^Block.size:
>
> I do not have RAID on either system and /dev/mapper does not exist on
> either machine. I have a single, identical hard drive in each system
> where /dev/hda4 is the partition DRBD uses. Can I change the tune2fs
> command you suggested above to get the bytes my ext3 FS thinks it's
> occupying?
>

You should be able to... please run

    tune2fs -l /dev/hda4 | grep -e ^Block.count: -e ^Block.size:

or better

    tune2fs -l /dev/drbd0 | grep -e ^Block.count: -e ^Block.size:

Lars, was there a reason you sent Nate after something other than
/dev/drbd0 ???

>
>>
>> you get two numbers.
>> multiply those, you get the size (in bytes)
>> your ext3 thinks it is occupying.
>> which is the size of the partition you run the mkfs on, at the time of
>> the mkfs run, unless you used special options.
>>
>> now, do
>> grep -e hda4 -e drbd0 /proc/partitions
>
>
> # grep -e hda4 -e drbd0 /proc/partitions
>    3     4  105996744 hda4
>  147     0  105993472 drbd0
> #
>
>
>
>>
>> you again get two numbers, this time unit is kilo byte.
>> that is the size of the partitions as the kernel sees them now.
>> according to the logs above (the limit= is unit sectors),
>> drbd0 will be 105993472 kB.
>> I dare say hda4 will be somewhat larger, my best guess, given the
>> information I have, is that hda4 will be 105996740 kB.
>> and that this also matches what the tune2fs reports.
>
>
> I imagine ext3 and the kernel need to have or use the same number for
> file system size on /dev/drbd0. If these numbers differ, then I get the
> errors I reported, correct? How (or is it possible?) to know whether the
> size as ext3 sees it is correct or the kernel size is correct? Could a
> corrupted inode be responsible for this problem? How do I avoid this
> problem in the future? Can I run e2fsck on /dev/drbd0 to fix such a
> problem?
>
>
> ASIDE: I backed up my data from the primary side. Primary and secondary
> machines went to "Primary/Unknown" and "Unknown/Secondary" after copying
> a little less than 10 GB of data and drbd reports "NetworkFailure." All
> NICs involved seem to be working fine - I can ping and copy files
> to/from both computers but DRBD is disconnected.
>
>
> Nate
>
>
>>
>>> My logs did not start recording these errors until several weeks ago.
>>
>> which probably only means that the file system slowly filled up,
>> and now starts actually _using_ those areas which are no longer there,
>> because they are now occupied by the drbd meta data.

-- 
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter
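
The unit conversions discussed in this thread can be checked with a little shell arithmetic. The sketch below is only an illustration, not part of anyone's original message: the numbers are copied from the kernel log lines and the /proc/partitions output quoted above, the variable names are made up for the example, and it assumes (as Lars states) that want=/limit= are in 512-byte sectors and that /proc/partitions reports 1 kB units.

```shell
#!/bin/sh
# Values copied from the thread; variable names are hypothetical.
limit_sectors=211986944   # "limit=" from the kernel log (assumed 512-byte sectors)
want_sectors=211992584    # "want=" from the first failing write
hda4_kb=105996744         # hda4 size from /proc/partitions (1 kB units)
drbd0_kb=105993472        # drbd0 size from /proc/partitions (1 kB units)

# Two 512-byte sectors make one kB.
limit_kb=$((limit_sectors / 2))
want_kb=$((want_sectors / 2))

# Space present on hda4 but not on drbd0.
gap_kb=$((hda4_kb - drbd0_kb))

# Where does the failing write end, relative to the two devices?
if [ "$want_kb" -gt "$limit_kb" ] && [ "$want_kb" -le "$hda4_kb" ]; then
    verdict="beyond drbd0, within hda4"
else
    verdict="other"
fi

echo "limit: $limit_kb kB"
echo "want:  $want_kb kB"
echo "gap:   $gap_kb kB"
echo "write ends: $verdict"
```

Run as written, this reports a limit of 105993472 kB (matching the drbd0 line in /proc/partitions), a failing write ending at 105996292 kB, and a gap of 3272 kB between hda4 and drbd0. The write ends past the end of drbd0 but still inside hda4, which is consistent with Lars's guess about where the file system size came from, though the arithmetic alone does not prove it. The tune2fs block count times block size would be the number to compare against these.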