Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Thanks for your response, Lars. With some outside assistance we were able to single out the issue we encountered.

The problem stemmed from the "secondary" machine not being properly provisioned. Looking at the two partition tables one can clearly see that there was a slight size difference between the partitions on the primary and the partitions on the secondary. This left the partition for /dev/drbd2 on the secondary machine (nfs2 in the config) *smaller* than the /dev/drbd2 partition on the primary machine.

Upon noticing problematic behavior with our application we stopped the sync by shutting down DRBD on the secondary machine. What caused so much heartache moving forward was that, for some reason, DRBD resized the device for /dev/drbd2:

  Mar 5 14:37:19 nfs2 kernel: drbd2: drbd_bm_resize called with capacity == 3421310910

This led to EXT3 trying to access data that no longer existed on that device:

  Mar 5 14:37:34 nfs2 kernel: attempt to access beyond end of device

If I understand correctly, because the DRBD devices beneath LVM were resized, LVM freaked out. LVM was attempting to map the VG over the DRBD devices, but could not, because the underlying devices were smaller than expected. Hence this message:

  Mar 5 18:56:26 nfs2 kernel: device-mapper: table: device 147:2 too small for target

At this point the gentleman called in to assist us with the repair was able to dump the MD of the device and resize the "la-size-sect" by hand. Once the DRBD device was made to match what LVM expected, we were able to bring up the VG. (A rough sketch of that procedure is appended below, after the quoted message.)

While it is clear the root cause was non-identical partitions, I am amazed that DRBD made the decision to resize the device instead of throwing an error message and stopping the sync process. In our investigation we did find code that seems to be designed to prevent this, though we are not sure it is in the correct code path:

  /* Never shrink a device with usable data. */
  if (drbd_new_dev_size(mdev, mdev->bc) <
      drbd_get_capacity(mdev->this_bdev) &&
      mdev->state.disk >= Outdated) {
          dec_local(mdev);
          ERR("The peer's disk size is too small!\n");
          drbd_force_state(mdev, NS(conn, Disconnecting));
          mdev->bc->dc.disk_size = my_usize;
          return FALSE;
  }
  dec_local(mdev);

At any rate, we dodged a bullet this time, and while we did have quite a scare, I still believe DRBD has a place in our infrastructure. Any additional comments and/or insights are welcome.

Tyler

On Mar 6, 2008, at 1:51 AM, Lars Ellenberg wrote:

> On Wed, Mar 05, 2008 at 05:52:31PM -0800, Tyler Seaton wrote:
>> Hey Guys,
>>
>> I have a pretty bad situation on my hands.
>>
>> We had a node configured running DRBD 8.0.6. The goal was to keep this
>> running in standalone mode until we provisioned a matching machine. We
>> purchased the matching machine and finally had it fully configured today.
>> I kicked off the initial sync, and had hoped that we would have both
>> machines in sync within a day or two.
>>
>> This was unfortunately not the case. When I kicked off the sync all
>> seemed well, however our application quickly began throwing errors as
>> the primary node became read-only. I quickly shut off DRBD on the
>> secondary node and attempted to return the original configuration to
>> the primary server. Sadly no amount of back-pedaling has helped us.
>> We are currently dead in the water.
>>
>> DRBD was configured on the primary node with LVM. We have/had 3
>> resources configured, the first 2 being 2TB in size and the 3rd being
>> 1.4-5TB in size.
>> Since stopping the initial sync I have not been able to mount the LVM
>> Volume Group that sits above the three resources. NOTE: the SDB devices
>> on nfs2 are numbered differently.
>>
>> /var/log/messages was giving the following messages:
>>
>> Mar 5 14:38:35 nfs2 kernel: drbd2: rw=0, want=3434534208, limit=3421310910
>> Mar 5 14:38:35 nfs2 kernel: attempt to access beyond end of device
>> Mar 5 14:38:35 nfs2 kernel: drbd2: rw=0, want=3434534216, limit=3421310910
>> Mar 5 14:38:35 nfs2 kernel: attempt to access beyond end of device
>> Mar 5 14:38:35 nfs2 kernel: drbd2: rw=0, want=3434534224, limit=3421310910
>> Mar 5 14:38:35 nfs2 kernel: attempt to access beyond end of device
>
> please provide FROM BOTH NODES output of
> # drbdadm -d attach all
> # sfdisk -d /dev/sdb
> # grep -e drbd -e sdb /proc/partitions
>
> --
> : Lars Ellenberg                          http://www.linbit.com :
> : DRBD/HA support and consulting           sales at linbit.com  :
> : LINBIT Information Technologies GmbH    Tel +43-1-8178292-0   :
> : Vivenotgasse 48, A-1120 Vienna/Europe   Fax +43-1-8178292-82  :
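
P.S. For anyone who hits the same problem, here is a rough sketch of the
repair described above. It is not a tested recipe: the resource name (r2),
the backing device (/dev/sdb3) and the use of internal metadata are
assumptions for the example; substitute your own, and back up the metadata
and the backing device before touching anything.

  # Take the resource down so the metadata is not in use
  drbdadm down r2

  # Dump the on-disk metadata to an editable text file
  drbdmeta /dev/drbd2 v08 /dev/sdb3 internal dump-md > /tmp/r2-md.txt

  # Edit the "la-size-sect" field (last agreed device size, in
  # 512-byte sectors) so it matches the size LVM expects to map
  vi /tmp/r2-md.txt

  # Write the corrected metadata back, then bring the resource up
  drbdmeta /dev/drbd2 v08 /dev/sdb3 internal restore-md /tmp/r2-md.txt
  drbdadm up r2

drbdmeta refuses to operate on a device that is still attached, which is
why the resource is taken down first.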
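
And a small sanity check that would have caught this before the initial
sync ever started: compare the backing-device sizes on both nodes and make
sure every partition on the secondary is at least as large as its
counterpart on the primary. Device names below are examples.

  # On each node: size of the backing partitions in 512-byte sectors
  blockdev --getsz /dev/sdb1
  blockdev --getsz /dev/sdb2
  blockdev --getsz /dev/sdb3

  # Or dump the whole partition layout on each node and diff the results
  sfdisk -d /dev/sdb > /tmp/sdb.dump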
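
If I read the drbd.conf man page correctly, there is also a "size" option
in the disk section that pins the usable size of a resource explicitly,
instead of letting DRBD negotiate it down to the smaller peer; something
like the fragment below. The figure is made up for illustration and must
not exceed the smaller backing device; I have not tried this myself, so
treat it as a pointer rather than a recommendation.

  resource r2 {
    disk {
      # force the usable size; DRBD then won't auto-size from the peer
      size 1630G;
    }
    # ... remaining resource configuration ...
  }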