Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Friday 26 November 2004 11:22, Per Liden wrote:
> On Thu, 25 Nov 2004, Per Liden wrote:
>
> [...]
>
> > I'm having problems with DRBD getting stuck at around 99-100% during an
> > initial/full sync. This seems to be happening about 8 out of 10 times.
>
> After some further testing it seems that I managed to resolve the issue.
>
> Changes I made to my configuration:
> - Removed LVM (DRBD now runs directly on top of my hda10 device).
> - Changed meta-data to "internal" (instead of hda9 [0]).
> - Filesystem used on top of /dev/drbd0 is now reiserfs instead of ext3.
>   (I thought I should mention it, even if the choice of filesystem
>   shouldn't have anything to do with my sync problem).
>
> So far I've done three full syncs without getting stuck. Unfortunately I
> did all the above changes in one go, so I can't really say if it was LVM
> or the separate meta-data partition that caused the problem. My guess is
> LVM though.
>
> Whether I can live without LVM is something I'll have to look into...
>
> [...]
>
> > Interesting to note is that the nodes seem to have different ideas about
> > how much data needs to be synchronized, i.e.:
> > Nov 25 11:07:03 Proc1 kernel: drbd0: Resync started as SyncSource (need
> > to sync 60812372 KB [15203093 bits set]). vs.
> > Nov 25 11:07:04 Proc2 kernel: drbd0: Resync started as SyncTarget (need
> > to sync 60558500 KB [15139625 bits set]).
>
> After my reconfiguration I haven't seen anything like this again. Every
> time a sync is initiated both nodes have a common understanding of the
> number of bytes that need to be synchronized.

I think that I have resolved Eugene Crosser's issue by now; it should be
solved by the patch attached to this e-mail. I am waiting for Eugene to
confirm that the issue is solved now.

I do not think it has anything to do with LVM or not LVM. It has to do with
whether you have application IO during the _start_ of the resync process
or not.

I will write a longish explanation of the bug and the fix to the list as
soon as I have either the confirmation from Eugene or found the time to
reproduce it here in the office. (Today there is an outbreak of some stupid
Windows worm, and we have to take care of the systems of our paying
customers first...)

If you could confirm this behavior (bug triggered by app IO during start of
resync) and that p4 fixes it, this would help a lot...

-philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :
-------------- next part --------------
A non-text attachment was scrubbed...
Name: p4
Type: text/x-diff
Size: 2008 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20041126/74e70bef/attachment.diff>
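
For anyone who wants to check their own logs for the mismatch quoted above, here is a minimal Python sketch. It parses the two "Resync started" lines and compares what each node thinks it has to sync. The helper names (`parse_resync`, `compare`) are made up for illustration and are not part of DRBD. The 4 KB-per-bit figure is only what the quoted numbers work out to; nothing in this thread states it.

```python
import re

# The two kernel log lines quoted in the message (joined back onto one line each).
LOG_LINES = [
    "Nov 25 11:07:03 Proc1 kernel: drbd0: Resync started as SyncSource "
    "(need to sync 60812372 KB [15203093 bits set]).",
    "Nov 25 11:07:04 Proc2 kernel: drbd0: Resync started as SyncTarget "
    "(need to sync 60558500 KB [15139625 bits set]).",
]

# Hypothetical pattern, written only against the log format shown above.
PATTERN = re.compile(
    r"Resync started as (?P<role>\w+) "
    r"\(need to sync (?P<kb>\d+) KB \[(?P<bits>\d+) bits set\]\)"
)


def parse_resync(line):
    """Return (role, kb, bits) for a 'Resync started' line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.group("role"), int(match.group("kb")), int(match.group("bits"))


def compare(source_line, target_line):
    """Return the (kb, bits) by which the two nodes disagree."""
    _, src_kb, src_bits = parse_resync(source_line)
    _, tgt_kb, tgt_bits = parse_resync(target_line)
    return src_kb - tgt_kb, src_bits - tgt_bits


if __name__ == "__main__":
    for line in LOG_LINES:
        role, kb, bits = parse_resync(line)
        # KB divided by bits gives the amount of data one bit stands for.
        print(role, kb, "KB,", bits, "bits,", kb / bits, "KB per bit")
    diff_kb, diff_bits = compare(*LOG_LINES)
    print("nodes disagree by", diff_kb, "KB /", diff_bits, "bits")
```

With the figures from the quoted log, both lines come out at exactly 4 KB per set bit, and the two nodes disagree by 63468 bits (253872 KB). A non-zero difference at resync start is the symptom described in the quoted message.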