On Tue, Jul 22, 2008 at 6:01 PM, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
> On Tue, Jul 22, 2008 at 05:03:23PM +0800, Patrick Coleman wrote:
>> Hi,
>>
>> I've googled around for a while, and I can't find anything definitive
>> for or against - what is the maximum volume size supported by DRBD?
>>
>> I'm running an 11TB DRBD 8.2.6 volume between two nodes, connected by
>> 10GE. I've hit some odd issues (OOPSes, continually resyncing data)
>> and I'd like to eliminate the volume size as a cause of the issue.
>
>
> DRBD 8.0.x, 8.2.6:
>  32bit kernel:
>   4 TB hard limit per device
>   you can have several of them, but you probably run into some
>   other limit pretty fast.
>  64bit kernel:
>   4 TB "supported".
>   (unsupported theoretically) 16 TB hard limit per device,
>   you can have several of them, but you probably run into some
>   other limit pretty fast.
<snip>
> Did that help?

Hmm, thanks for making that clear. The boxes are both dual quad-core Xeons with 8GB of RAM, running a Debian 2.6.22-amd64 kernel, so memory shouldn't be a problem. I'll describe my current problems in more detail; perhaps you'll be able to tell me whether they seem to be related to the size of the device (it does sound likely, given you've had reports of instability).

Firstly, DRBD seems to think it's permanently out of sync. I installed 8.0.12 (Debian testing) and ran the initial sync, and everything went fine. Then I rebooted the secondary. After each reboot, it reports about 3.9TB out of sync and resyncs it. During the resync, the oos field in /proc/drbd drops to zero. The resync completes, but if I then check /proc/drbd, the oos field is stuck at about 3.9TB, even though the disk states are UpToDate/UpToDate. Disconnecting and reconnecting triggers a resync, with the same result as a reboot. Invalidating and resyncing the secondary had no effect.

I then upgraded to 8.2.6, compiled from the Debian source package. This all worked OK, and a resync happened as expected.
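Lars's figures for a 64-bit kernel can be turned into a quick size check. This is a minimal Python sketch, not anything shipped with DRBD: the names `classify_64bit` and `min_devices` are hypothetical, sizes are compared in the same "TB" units the thread uses (whether decimal or binary is not stated), and treating a device of exactly 4 TB as still "supported" is an assumption.

```python
import math

# Per-device figures for DRBD 8.0.x / 8.2.6 on a 64-bit kernel, as quoted
# in the thread. Units are taken as given ("TB"); decimal vs. binary is
# not stated, so no conversion is done here.
SUPPORTED_TB = 4.0    # '4 TB "supported"'
HARD_LIMIT_TB = 16.0  # '(unsupported theoretically) 16 TB hard limit'


def classify_64bit(size_tb: float) -> str:
    """Place one device size against the quoted 64-bit limits (hypothetical helper)."""
    if size_tb > HARD_LIMIT_TB:
        return "over hard limit"
    if size_tb > SUPPORTED_TB:
        return "unsupported"
    # Assumption: a device of exactly SUPPORTED_TB still counts as supported.
    return "supported"


def min_devices(total_tb: float, per_device_tb: float) -> int:
    """Fewest equal-sized devices so that none exceeds per_device_tb."""
    return math.ceil(total_tb / per_device_tb)


if __name__ == "__main__":
    print(classify_64bit(11.0))             # the existing single 11 TB device
    print(classify_64bit(3.0))              # one member of a 4 x 3 TB layout
    print(min_devices(11.0, SUPPORTED_TB))  # devices needed to stay at or under 4 TB each
```

On these numbers the existing 11 TB device sits in the "unsupported" band below the 16 TB hard limit, while each member of a 4 x 3 TB layout stays within the supported size. As Lars notes, running several devices may hit other limits that a simple size check cannot capture.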
I tried blowing away the secondary and rebuilding it from scratch. This seemed to work OK and started the initial sync, but the secondary crashed towards the end. After rebooting, it went back to resyncing 3.9TB. I didn't trust the data on the secondary at this point, so I ran the new verify feature from the primary. Judging by the logs on the secondary, the verify ran through to the end and found the 3.9TB out of sync, but the primary crashed just after it finished. The primary then decided that its own 3.9TB was out of sync and resynced it from the secondary. It's now doing the same thing as before, with the large oos value in /proc/drbd:

version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by phil at fat-tyre, 2008-05-30 12:59:17
 0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:1000728252 dw:2122151308 dr:2008 al:1 bm:259638 lo:0 pe:0 ua:0 ap:0 oos:3128329444

I was considering moving to 4x3TB DRBD volumes this weekend to see if that helps, but from what you say it might not make much difference. If you think it's worth trying, I'll give it a go anyway.

The issue is that I don't know whether the instability is caused by DRBD or by something else in the system (the two machines are mostly identical). By the time I get to the box the terminal has blanked, so I can't see the backtrace, and there's nothing in the logs. It may be worth connecting a serial console, but I've only had two crashes in as many months, so it's going to be a while before I get anything.

One other thing I've noticed is that the machines started crashing when I upgraded - would downgrading help?

Any suggestions at all would be most welcome.

Cheers,
Patrick

-- 
http://www.labyrinthdata.net.au - WA Backup, Web and VPS Hosting
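To keep an eye on the stuck oos counter between resyncs, the /proc/drbd text can be parsed per device. This is a minimal Python sketch. It assumes the per-device layout shown in the output in the message (a `<minor>: cs:...` header line followed by a counters line), and it assumes oos is reported in KiB, as the DRBD User's Guide describes for 8.2. `parse_devices` and `oos_tib` are hypothetical helpers.

```python
import re

# Sample copied from the /proc/drbd output in the message.
SAMPLE = """\
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by phil at fat-tyre, 2008-05-30 12:59:17
 0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:1000728252 dw:2122151308 dr:2008 al:1 bm:259638 lo:0 pe:0 ua:0 ap:0 oos:3128329444
"""

# A device section starts with "<minor>: cs:..."; following lines belong to it.
DEVICE_HEADER = re.compile(r"^\s*(\d+):\s+cs:")
# key:value pairs such as "cs:Connected" or "oos:3128329444".
KEY_VALUE = re.compile(r"(\w+):(\S+)")

KIB_PER_TIB = 1024 ** 3  # one TiB is 2**30 KiB


def parse_devices(text: str) -> dict[int, dict[str, str]]:
    """Group the key:value fields of /proc/drbd text by device minor number."""
    devices: dict[int, dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        header = DEVICE_HEADER.match(line)
        if header:
            current = int(header.group(1))
            devices[current] = {}
        if current is not None:
            # The header's own "0:" is not captured because a space follows the colon.
            devices[current].update(KEY_VALUE.findall(line))
    return devices


def oos_tib(fields: dict[str, str]) -> float:
    """Out-of-sync amount in TiB, assuming the oos field is in KiB."""
    return int(fields["oos"]) / KIB_PER_TIB


if __name__ == "__main__":
    for minor, fields in parse_devices(SAMPLE).items():
        print(minor, fields["cs"], fields["ds"], f"{oos_tib(fields):.2f} TiB out of sync")
```

For the sample, `oos_tib` gives about 2.91 TiB, which is roughly 3.2 TB in decimal units. Running the parser periodically against the live /proc/drbd would show whether the counter really stays fixed after each resync completes.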