[DRBD-user] DRBD attempts to access beyond end of device

Thu Feb 28 18:29:05 CET 2008

Nate Seif wrote, On 02/28/2008 11:12 AM:
> 
> 
> On Wed, 27 Feb 2008, Lars Ellenberg wrote:
> 
>> On Wed, Feb 27, 2008 at 01:07:04PM -0500, Nate Seif wrote:
>>>
>>>
>>> On Wed, 27 Feb 2008, Lars Ellenberg wrote:
>>>
>>>> On Tue, Feb 26, 2008 at 04:14:36PM -0500, Nate Seif wrote:
>>>>> Hello all:
>>>>> I intermittently experience the errors below while running DRBD and would
>>>>> like to correct whatever condition is causing DRBD to randomly lose pages.
>>>>> My hard disks and partitions are identical and have never given me
>>>>> problems previously. I don't see any other disk I/O errors in my logs. And
>>>>> it appears that occasionally (not always) these errors are preceded by a
>>>>> resync of the two disks.
>>>>>
>>>>> Why would DRBD "attempt to access beyond end of device"?
>>>>>
>>>>> I am running DRBD 8.06 on Gentoo Linux as I could not get my latest
>>>>> Gentoo kernel to load the DRBD module where version > 8.06. Metadata is
>>>>> "internal" and I'm running Protocol C. I'd be happy to post my drbd.conf
>>>>> page if necessary.
>>>>>
>>>>>
>>>>> Feb 26 08:21:46 <hostname> attempt to access beyond end of device
>>>>> Feb 26 08:21:46 <hostname> drbd0: rw=1, want=211992584, limit=211986944
>>>>> Feb 26 08:21:46 <hostname> Buffer I/O error on device drbd0, logical block 26499072
>>>>> Feb 26 08:21:46 <hostname> lost page write due to I/O error on drbd0
>>>>> Feb 26 08:21:46 <hostname> attempt to access beyond end of device
>>>>> Feb 26 08:21:46 <hostname> drbd0: rw=1, want=211992592, limit=211986944
>>>>> Feb 26 08:21:46 <hostname> Buffer I/O error on device drbd0, logical block 26499073
>>>>> Feb 26 08:21:46 <hostname> lost page write due to I/O error on drbd0
>>>>>
>>>>>
>>>>> Any ideas, tips, help, etc. is much appreciated. Thank you -
>>>>
>>>> let me guess:
>>>> you did mkfs /dev/sda1, not mkfs /dev/drbd0?
>>>> well, you screwed up.
>>>
>>> I did NOT mkfs on /dev/hda4. (I have DRBD running on a pair of IDE/PATA
>>> disks and no SATA drives in either system.)
>>>
>>> I partitioned my disks with fdisk. I have identical drives with
>>> identically sized partitions. I compiled the DRBD module, started DRBD,
>>> mounted /dev/drbd0 (not /dev/hda4), and formatted drbd0 with an ext3
>>> file system on the primary only after I got DRBD up and running months
>>> ago.
>>
>> please do
>>
>>      tune2fs -l /dev/mapper/vg00--bk1-root |
>>     grep -e ^Block.count: -e ^Block.size:
> 
> I do not have RAID on either system and /dev/mapper does not exist on 
> either machine. I have a single, identical hard drive in each system 
> where /dev/hda4 is the partition DRBD uses. Can I change the tune2fs 
> command you suggested above to get the bytes my ext3 FS thinks it's 
> occupying?
> 

You should be able to...

please run
tune2fs -l /dev/hda4 |
      grep -e ^Block.count: -e ^Block.size:

or better
tune2fs -l /dev/drbd0 |
      grep -e ^Block.count: -e ^Block.size:

Lars, was there a reason you sent Nate after something other than /dev/drbd0 ???

> 
>>
>> you get two numbers.
>> multiply those, you get the size (in bytes)
>> your ext3 thinks it is occupying.
>> which is the size of the partition you run the mkfs on, at the time of
>> the mkfs run, unless you used special options.
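The multiplication can be scripted if that is easier. A minimal sketch,
assuming the tune2fs -l output contains the usual "Block count:" and
"Block size:" lines; the helper name fs_bytes is made up for illustration:

```shell
# fs_bytes (hypothetical helper): read `tune2fs -l` output on stdin and
# print block count * block size, i.e. the size in bytes ext3 thinks it has.
fs_bytes() {
    awk -F: '
        /^Block count:/ { count = $2 }
        /^Block size:/  { size  = $2 }
        END { printf "%.0f\n", count * size }
    '
}

# Usage (as root):
#   tune2fs -l /dev/drbd0 | fs_bytes
```

Divide the result by 1024 to compare it with the kB figures that
/proc/partitions reports.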
>>
>> now, do
>>     grep -e hda4 -e drbd0 /proc/partitions
> 
> 
> # grep -e hda4 -e drbd0 /proc/partitions
>    3     4  105996744 hda4
>  147     0  105993472 drbd0
> #
> 
> 
> 
>>
>> you again get two numbers, this time the unit is kilobytes.
>> that is the size of the partitions as the kernel sees them now.
>> according to the logs above (limit= is in units of 512-byte sectors),
>> drbd0 will be 105993472 kB.
>> I dare say hda4 will be somewhat larger, my best guess, given the
>> information I have, is that hda4 will be 105996740 kB.
>> and that this also matches what tune2fs reports.
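To make the units concrete, here is the arithmetic on the numbers already
posted in this thread. It is only a sketch: it assumes want= and limit= are
counted in 512-byte sectors, and the variable names are mine:

```shell
# Numbers copied from the log excerpt and the /proc/partitions output
# posted in this thread.
limit_sectors=211986944   # limit= in the kernel log
want_sectors=211992584    # want= of the first failed write
hda4_kb=105996744         # hda4 in /proc/partitions
drbd0_kb=105993472        # drbd0 in /proc/partitions

# Assumption: want=/limit= are 512-byte sectors, so 2 sectors = 1 kB.
limit_kb=$((limit_sectors / 2))
want_kb=$((want_sectors / 2))

echo "drbd0 ends at:        ${limit_kb} kB"
echo "write wanted up to:   ${want_kb} kB"
echo "overshoot past drbd0: $((want_kb - limit_kb)) kB"
echo "hda4 minus drbd0:     $((hda4_kb - drbd0_kb)) kB"
```

The limit works out to exactly the drbd0 size from /proc/partitions. The
failed write ends 2820 kB past the end of drbd0, yet still inside the 3272 kB
by which hda4 is larger, which is what one would expect from a file system
sized for hda4 rather than for drbd0.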
> 
> 
> I imagine ext3 and the kernel need to agree on the size of the file
> system on /dev/drbd0. If these numbers differ, then I get the errors I
> reported, correct? Is it possible to know whether the size as ext3 sees
> it or the size as the kernel sees it is the correct one? Could a
> corrupted inode be responsible for this problem? How do I avoid this
> problem in the future? Can I run e2fsck on /dev/drbd0 to fix such a
> problem?
> 
> 
> ASIDE: I backed up my data from the primary side. Primary and secondary 
> machines went to "Primary/Unknown" and "Unknown/Secondary" after copying 
> a little less than 10 GB of data and drbd reports "NetworkFailure." All 
> NICs involved seem to be working fine - I can ping and copy files 
> to/from both computers but DRBD is disconnected.
> 
> 
> Nate
> 
> 
>>
>>> My logs did not start recording these errors until several weeks ago.
>>
>> which probably only means that the file system slowly filled up,
>> and now starts actually _using_ those areas which are no longer there,
>> because they are now occupied by the drbd meta data.

-- 
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter