[DRBD-user] drbd freeze its filesystem

Lars Ellenberg Lars.Ellenberg at linbit.com
Thu Oct 12 16:29:46 CEST 2006

/ 2006-10-12 15:15:36 +0200
\ Enrico Morelli:
> >which file system?
> >if reiserfs, please try something else.
> >we had a report about "apparent freezes" with reiserfs on top of drbd,
> >but the "freeze" (it recovers after minutes to hours "all by itself")
> >was in reiserfs, not in drbd. we just change the timing behaviour of the
> >io stack...
> >
> ArghHHhh!!! Yes, I have the reiserfs on top of drbd and the machines
> are servers in production. So i cannot change the filesystem.

to "migrate" the file system, you could
degrade the cluster, mkfs.xfs on the non-active node,
and rsync the data over. the first rsync will take ages,
and you will need to repeat it, but each pass has less to transfer
and finishes sooner, so less can change in the meantime.
then you have one short downtime window where you
take the services on server one down, remount ro, do a final rsync,
and go active on the other node.
then you let drbd sync the new xfs back to the first node.
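
the repeated rsync passes described above could be sketched roughly like
this. it is only a sketch: the function name sync_pass, the host name
othernode and the xfs mount point are placeholders invented for
illustration, and the drbd steps (degrading the cluster, syncing back)
are deliberately left out.

```shell
#!/bin/sh
# sketch only: all paths and host names below are placeholders.
# $1 is the mounted reiserfs on the active node, $2 the fresh xfs
# (a local mount point, or host:/path reached over ssh).

# one rsync pass. -a keeps permissions, owners and times, -H keeps hard
# links, -x stays on one file system, --delete removes files from the
# target that have vanished from the source since the last pass.
sync_pass() {
    rsync -aHx --delete --numeric-ids "$1"/ "$2"/
}

# repeat while the service is still running; each pass has less to copy:
#   sync_pass /mnt/reiserfs-mount-point othernode:/mnt/xfs-mount-point
#
# in the downtime window, after stopping the services:
#   mount -o remount,ro /mnt/reiserfs-mount-point
#   sync_pass /mnt/reiserfs-mount-point othernode:/mnt/xfs-mount-point
```

the trailing slashes make rsync copy the contents of the source directory
into the target rather than creating a nested directory.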

no guarantees here: your problem may still be something else
completely, and this may make it even worse...

> Is there some workaround to avoid this problem?
> I didn't find a warning about the reiserfs/drbd problem; maybe it would be
> useful to write about this on the site or on the wiki.

it is not "reiserfs and drbd leads to trouble".
there are clusters with reiserfs on top of drbd out there,
performing just fine.

it is "we had _one_ report", where the original poster figured out that
some processes on reiserfs on top of drbd got stuck in getdents64 for
ages, and that using a different file system made the problem go away
for him.

if you google for hang and reiserfs and getdents64 you get a few hits,
but most of them are pretty old.

in 2.6.16-rc.something, there was a
included, maybe they just got it "half-right" ?

in some (very old) posts about reiserfs hang in getdents64, I found
the recommendation to "just touch some random directory" within that
file system, to "wake it up".

so as a workaround for your situation, maybe do some
 # sync
on the drbd primary and/or secondary, or do some
 # mkdir /mnt/reiserfs-mount-point/dummy$$
 # rmdir /mnt/reiserfs-mount-point/dummy$$

may or may not help...
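
if you want to repeat that poke automatically, a minimal sketch could look
like this. the function name poke_fs and the 60 second interval are
assumptions made up for illustration, not something tested against the
hang; if the file system is already stuck, the mkdir may simply block too.

```shell
#!/bin/sh
# sketch only: the mount point and the interval are placeholders.

# create and remove a scratch directory to force a directory update,
# then flush dirty buffers. $$ (the shell's pid) keeps the name unique.
poke_fs() {
    d="$1/dummy$$"
    mkdir "$d" && rmdir "$d"
    sync
}

# example: poke the file system once a minute until interrupted.
#   while :; do poke_fs /mnt/reiserfs-mount-point; sleep 60; done
```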

