Hi, All -- I'm working on getting DRBD up and running for a large storage array -- around 10TB. I'm having two issues that I suspect are related, and I'm hoping someone can help me out with them. Specifically, I'm wondering whether these issues are in fact related, and if so, whether there is a way to get the behavior I want.

First of all, 10TB takes a fair amount of time to sync. Even on a 10Gbps network, DRBD's limit of 650MB/s means that a full sync takes between 4 and 5 hours. On a 1Gbps network, where the effective speed is 125MB/s, that time rises to around 22 hours. Because of this, I'm hoping to avoid the initial sync phase. I'm starting with empty disks, so I don't need the bitmap to be synchronized for data preservation or anything like that. Googling around, I found the following command, which does in fact cause both nodes to report that they are UpToDate:

  drbdadm -- 6::::1 set-gi resource

So, the first question is whether this is, in fact, the appropriate command to use if one wants to avoid the initial sync. Is there another method that's preferred? Is it simply no longer possible to skip the initial sync? What I really want is a way to tell DRBD to sync the bitmap without actually syncing data, since there isn't any data that I care about.

The second issue is that, having established both nodes as UpToDate using the command above, and having successfully swapped the Primary role back and forth between the hosts, if one node fails (or is rebooted), it requires a full sync after coming back online. This happens even if the other node was Primary at the time the local machine went down, and even if no changes have been made to the local node.
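In case my arithmetic is off, here is the back-of-the-envelope calculation behind those estimates (a sketch assuming decimal units and link-saturating throughput; integer division, so the results are floors):

```shell
# 10 TB expressed in MB (decimal), divided by sustained throughput in MB/s,
# then converted to whole hours
size_mb=$(( 10 * 1000 * 1000 ))
echo "10Gbps @ 650 MB/s: $(( size_mb / 650 / 3600 )) h"   # ~4 h
echo "1Gbps  @ 125 MB/s: $(( size_mb / 125 / 3600 )) h"   # ~22 h
```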
Based on the logs, it appears that something is going on with the size of the bitmap changing on the rebooted host, though that doesn't really make all that much sense to me:

Oct 21 17:16:11 scurry4 kernel: drbd0: No usable activity log found.
Oct 21 17:16:11 scurry4 kernel: drbd0: max_segment_size ( = BIO size ) = 32768
Oct 21 17:16:11 scurry4 kernel: drbd0: drbd_bm_resize called with capacity == 21462221048
Oct 21 17:16:11 scurry4 kernel: drbd0: resync bitmap: bits=2682777631 words=41918401
Oct 21 17:16:11 scurry4 kernel: drbd0: size = 10 TB (10731110524 KB)
Oct 21 17:16:11 scurry4 kernel: drbd0: Writing the whole bitmap, size changed
Oct 21 17:16:11 scurry4 kernel: drbd0: writing of bitmap took 468 jiffies
Oct 21 17:16:11 scurry4 kernel: drbd0: 10 TB (2682777631 bits) marked out-of-sync by on disk bit-map.
Oct 21 17:16:12 scurry4 kernel: drbd0: reading of bitmap took 289 jiffies
Oct 21 17:16:12 scurry4 kernel: drbd0: recounting of set bits took additional 271 jiffies
Oct 21 17:16:12 scurry4 kernel: drbd0: 10 TB (2682777631 bits) marked out-of-sync by on disk bit-map.
Oct 21 17:16:12 scurry4 kernel: drbd0: disk( Attaching -> Inconsistent )
Oct 21 17:16:12 scurry4 kernel: drbd0: Writing meta data super block now.

The remote host, which remained up, shows this in its logs:

Oct 21 17:16:48 scurry24 kernel: drbd0: Becoming sync source due to disk states.
Oct 21 17:16:48 scurry24 kernel: drbd0: Writing the whole bitmap, full sync required after drbd_sync_handshake.
Oct 21 17:16:48 scurry24 kernel: drbd0: Writing meta data super block now.

I'm wondering whether it's possible that this behavior is related to the skipped sync described above, or whether it may be related in some way to the size of the device being synced. Has anyone seen this before, or can anyone shed some light on it?

Basic info: OS is CentOS x86_64. Kernel version is 2.6.18-128.1.10.el5. DRBD is version 8.2.6-2.
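In case it helps anyone reproduce this, the bring-up sequence I used was roughly the following (a sketch from memory, not a verbatim transcript; `resource` stands in for the actual resource name, and the create-md/up/primary steps are the standard drbdadm ones):

```shell
# on both nodes: initialize the on-disk metadata and bring the resource up
drbdadm create-md resource
drbdadm up resource

# force the generation IDs so that both sides report UpToDate,
# skipping the initial sync (the disks are empty, so no data to preserve)
drbdadm -- 6::::1 set-gi resource

# then promote whichever node should be active
drbdadm primary resource
```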
Thanks for any help,

Ian