Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Thanks for your response, Lars. With some outside assistance we were able to single out the issue we encountered.

The problem stemmed from the "secondary" machine not being properly provisioned. Looking at the two partition tables one can clearly see that there was a slight size difference between the partitions on the primary and the partitions on the secondary. This left the partition for /dev/drbd2 on the secondary machine (nfs2 in the config) *smaller* than the /dev/drbd2 partition on the primary machine.

Upon noticing problematic behavior with our application we stopped the sync by shutting down DRBD on the secondary machine. What caused so much heartache moving forward was that, for some reason, DRBD resized the device for /dev/drbd2:

  Mar 5 14:37:19 nfs2 kernel: drbd2: drbd_bm_resize called with capacity == 3421310910

This led to EXT3 trying to access data that no longer existed on that device:

  Mar 5 14:37:34 nfs2 kernel: attempt to access beyond end of device

If I understand correctly, because the DRBD devices beneath LVM were resized, LVM freaked out. LVM was attempting to map the VG over the DRBD devices, but could not, because the underlying devices were smaller than expected. Hence this message:

  Mar 5 18:56:26 nfs2 kernel: device-mapper: table: device 147:2 too small for target

At this point the gentleman called in to assist us with the repair was able to dump the MD of the device and resize the "la-size-sect" by hand. Once the DRBD device was made to match what LVM expected, we were able to bring up the VG. (A rough sketch of that procedure is appended below, after the quoted message.)

While it is clear the root cause was non-identical partitions, I am amazed that DRBD made the decision to resize the device instead of throwing an error message and stopping the sync process. In our investigation we did find code that seems to be designed to prevent this, though we are not sure it is in the correct code path:

  /* Never shrink a device with usable data. */
  if (drbd_new_dev_size(mdev, mdev->bc) <
      drbd_get_capacity(mdev->this_bdev) &&
      mdev->state.disk >= Outdated) {
          dec_local(mdev);
          ERR("The peer's disk size is too small!\n");
          drbd_force_state(mdev, NS(conn, Disconnecting));
          mdev->bc->dc.disk_size = my_usize;
          return FALSE;
  }
  dec_local(mdev);

At any rate, we dodged a bullet this time, and while we did have quite a scare, I still believe DRBD has a place in our infrastructure. Any additional comments and/or insights are welcome.

Tyler

On Mar 6, 2008, at 1:51 AM, Lars Ellenberg wrote:

> On Wed, Mar 05, 2008 at 05:52:31PM -0800, Tyler Seaton wrote:
>> Hey Guys,
>>
>> I have a pretty bad situation on my hands.
>>
>> We had a node configured running DRBD 8.0.6. The goal was to keep this
>> running in standalone mode until we provisioned a matching machine. We
>> purchased the matching machine and finally had it fully configured today.
>> I kicked off the initial sync, and had hoped that we would have both
>> machines in sync within a day or two.
>>
>> This was unfortunately not the case. When I kicked off the sync all
>> seemed well, however our application quickly began throwing errors as
>> the primary node became read-only. I quickly shut off DRBD on the
>> secondary node and attempted to return the original configuration to
>> the primary server. Sadly no amount of back-pedaling has helped us.
>> We are currently dead in the water.
>>
>> DRBD was configured on the primary node with LVM. We have/had 3
>> resources configured, the first 2 being 2TB in size and the 3rd being
>> 1.4-5TB in size.
>> Since stopping the initial sync I have not been able to mount the LVM
>> Volume Group that sits above the three resources. NOTE: the SDB devices
>> on nfs2 are numbered differently.
>>
>> /var/log/messages was giving the following messages:
>>
>> Mar 5 14:38:35 nfs2 kernel: drbd2: rw=0, want=3434534208, limit=3421310910
>> Mar 5 14:38:35 nfs2 kernel: attempt to access beyond end of device
>> Mar 5 14:38:35 nfs2 kernel: drbd2: rw=0, want=3434534216, limit=3421310910
>> Mar 5 14:38:35 nfs2 kernel: attempt to access beyond end of device
>> Mar 5 14:38:35 nfs2 kernel: drbd2: rw=0, want=3434534224, limit=3421310910
>> Mar 5 14:38:35 nfs2 kernel: attempt to access beyond end of device
>
> please provide FROM BOTH NODES output of
> # drbdadm -d attach all
> # sfdisk -d /dev/sdb
> # grep -e drbd -e sdb /proc/partitions
>
> --
> : Lars Ellenberg                          http://www.linbit.com :
> : DRBD/HA support and consulting           sales at linbit.com  :
> : LINBIT Information Technologies GmbH    Tel +43-1-8178292-0   :
> : Vivenotgasse 48, A-1120 Vienna/Europe   Fax +43-1-8178292-82  :
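
P.S. For anyone who hits the same problem, here is a rough sketch of the
repair described above. It is not a tested recipe: the resource name (r2),
the backing device (/dev/sdb3) and the use of internal metadata are
assumptions for the example; substitute your own, and back up the metadata
and the backing device before touching anything.

  # Take the resource down so the metadata is not in use
  drbdadm down r2

  # Dump the on-disk metadata to an editable text file
  drbdmeta /dev/drbd2 v08 /dev/sdb3 internal dump-md > /tmp/r2-md.txt

  # Edit the "la-size-sect" field (last agreed device size, in
  # 512-byte sectors) so it matches the size LVM expects to map
  vi /tmp/r2-md.txt

  # Write the corrected metadata back, then bring the resource up
  drbdmeta /dev/drbd2 v08 /dev/sdb3 internal restore-md /tmp/r2-md.txt
  drbdadm up r2

drbdmeta refuses to operate on a device that is still attached, which is
why the resource is taken down first.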
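
And a small sanity check that would have caught this before the initial
sync ever started: compare the backing-device sizes on both nodes and make
sure every partition on the secondary is at least as large as its
counterpart on the primary. Device names below are examples.

  # On each node: size of the backing partitions in 512-byte sectors
  blockdev --getsz /dev/sdb1
  blockdev --getsz /dev/sdb2
  blockdev --getsz /dev/sdb3

  # Or dump the whole partition layout on each node and diff the results
  sfdisk -d /dev/sdb > /tmp/sdb.dump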
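
If I read the drbd.conf man page correctly, there is also a "size" option
in the disk section that pins the usable size of a resource explicitly,
instead of letting DRBD negotiate it down to the smaller peer; something
like the fragment below. The figure is made up for illustration and must
not exceed the smaller backing device; I have not tried this myself, so
treat it as a pointer rather than a recommendation.

  resource r2 {
    disk {
      # force the usable size; DRBD then won't auto-size from the peer
      size 1630G;
    }
    # ... remaining resource configuration ...
  }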