[DRBD-user] Sync stuck at 100%

Philipp Reisner philipp.reisner at linbit.com
Fri Nov 26 11:49:47 CET 2004


On Friday 26 November 2004 11:22, Per Liden wrote:
> On Thu, 25 Nov 2004, Per Liden wrote:
> [...]
> > I'm having problems with DRBD getting stuck at around 99-100% during an
> > initial/full sync. This seems to be happening about 8 out of 10 times.
> After some further testing it seems that I managed to resolve the issue.
> Changes I made to my configuration:
> - Removed LVM (DRBD now runs directly on top of my hda10 device).
> - Changed meta-data to "internal" (instead of hda9 [0]).
> - Filesystem used on top of /dev/drbd0 is now reiserfs instead of ext3.
>   (I thought I should mention it, even if the choice of filesystem
>   shouldn't have anything to do with my sync problem.)
> So far I've done three full syncs without getting stuck. Unfortunately I
> did all the above changes in one go, so I can't really say if it was LVM
> or the separate meta-data partition that caused the problem. My guess is
> LVM, though.
> Whether I can live without LVM is something I'll have to look into...
> [...]
> > Interesting to note is that the nodes seem to have different ideas about
> > how much data needs to be synchronized, i.e.:
> >   Nov 25 11:07:03 Proc1 kernel: drbd0: Resync started as SyncSource (need
> > to sync 60812372 KB [15203093 bits set]). vs.
> >   Nov 25 11:07:04 Proc2 kernel: drbd0: Resync started as SyncTarget (need
> > to sync 60558500 KB [15139625 bits set]).
> After my reconfiguration I haven't seen anything like this again. Every
> time a sync is initiated, both nodes have a common understanding of the
> number of bytes that need to be synchronized.
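The two log lines quoted above can be compared directly. The small sketch below only does arithmetic on the figures copied from those lines; it assumes the KB and "bits set" counts are directly comparable, which the exact 4:1 ratio on both nodes suggests (4 KB of data per set bitmap bit). It does not model DRBD itself.

```python
# Figures copied verbatim from the two kernel log lines quoted above.
source = {"kb": 60812372, "bits": 15203093}  # Proc1, SyncSource
target = {"kb": 60558500, "bits": 15139625}  # Proc2, SyncTarget


def kb_per_bit(entry):
    """KB of data accounted for by each set bit, as implied by the log line."""
    return entry["kb"] / entry["bits"]


# How far apart the two nodes' ideas of "data to sync" are.
diff_bits = source["bits"] - target["bits"]
diff_kb = source["kb"] - target["kb"]

print(kb_per_bit(source), kb_per_bit(target))
print(diff_bits, diff_kb)
```

Both ratios come out at exactly 4 KB per bit, so the nodes agree on the bitmap granularity; they disagree only on which bits are set (63468 bits, i.e. 253872 KB).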

I think that I have resolved Eugene Crosser's issue by now; it
should be fixed by the patch attached to this e-mail. I am waiting
for Eugene to confirm that the issue is solved.

I do not think it has anything to do with whether or not you use LVM.
It has to do with whether there is application IO during the _start_
of the resync process.

I will write a longish explanation of the bug and the fix to the list
as soon as I have either Eugene's confirmation or have found the
time to reproduce it here in the office.

( Today there is an outbreak of some stupid Windows worm, and we have
  to take care of our paying customers' systems first... )

If you could confirm this behavior (bug triggered by application IO
during the start of resync) and that p4 fixes it, this would help a lot...

: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :
-------------- next part --------------
A non-text attachment was scrubbed...
Name: p4
Type: text/x-diff
Size: 2008 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20041126/74e70bef/attachment.diff>
