[DRBD-user] Re: drbd 0.7.20 lockup

Sat Jul 15 07:54:33 CEST 2006

On Fri, Jul 14, 2006 at 11:22:11PM +1000, Bradley Baetz wrote:
> On Fri, Jul 14, 2006 at 10:46:27PM +1000, Bradley Baetz wrote:
> > [please cc me on replies; I'm not subscribed to the list]

I poked around a bit more and discovered that drbd was starting before
iptables. So it would start to sync, then iptables would come up and
load conntrack, and the already-established TCP connection would be
blocked by the firewall rules as INVALID. So presumably the problem is
that nothing reliably detects the connection going away and then tries
to restart it. That does happen in some cases: I have some logs showing
the start of a sync, then the connection being lost, and then a
reconnect.

The problem with the connection stalling still remains, though: if I
disable the firewall on both boxes and do |reboot -n -f| on the
primary, then after the reboot it still stalls partway into resyncing
the AL.

I do have a tcpdump of that from the secondary, but it's over 70MB, and
tcpdump said that about 1/3 of the packets were dropped, so I suspect
that it's not much use.

Looking through the code, and comparing to RAID/LVM, I did notice one thing.
The md code has:

        if (unlikely(bio_barrier(bio))) {
                bio_endio(bio, bio->bi_size, -EOPNOTSUPP);
                return 0;
        }

in its make_request functions. For correctness, isn't something similar
needed for drbd, at least until the TODO item about handling barriers is
done?
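
To make the question concrete, here is a rough userspace mock of the
pattern I mean. It is not drbd or kernel code: struct mock_bio,
MOCK_RW_BARRIER, mock_bio_barrier(), mock_bio_endio() and
mock_make_request() are all hypothetical stand-ins that I made up for
illustration, and the real change would have to go wherever drbd's
make_request function first looks at the bio. It only shows the intended
behaviour: a barrier request is completed immediately with -EOPNOTSUPP
instead of being passed on as if it were an ordinary write.

```c
/* Userspace mock only; every name prefixed with mock_/MOCK_ is a
 * hypothetical stand-in, not a real kernel or drbd symbol. */
#include <errno.h>
#include <stddef.h>

#define MOCK_RW_BARRIER (1UL << 2)  /* stand-in for the barrier flag bit */

struct mock_bio {
	unsigned long rw;    /* request flags */
	unsigned int size;   /* bytes still outstanding */
	int error;           /* completion status, 0 on success */
	int completed;       /* set once the bio has been ended */
	int submitted;       /* set once the bio was passed further down */
};

/* Nonzero if the request carries the (mock) barrier flag. */
int mock_bio_barrier(const struct mock_bio *bio)
{
	return (bio->rw & MOCK_RW_BARRIER) != 0;
}

/* Complete bytes_done bytes of the request with the given status. */
void mock_bio_endio(struct mock_bio *bio, unsigned int bytes_done, int error)
{
	bio->size -= bytes_done;
	bio->error = error;
	bio->completed = 1;
}

/* Mirrors the shape of the md check: refuse barriers up front so the
 * submitter learns that ordering is not guaranteed by this device. */
int mock_make_request(struct mock_bio *bio)
{
	if (mock_bio_barrier(bio)) {
		mock_bio_endio(bio, bio->size, -EOPNOTSUPP);
		return 0;
	}

	/* Normal path: the real driver would queue or forward the bio. */
	bio->submitted = 1;
	return 0;
}
```

The point of failing early is that a filesystem which issues barriers can
then notice the -EOPNOTSUPP and fall back to some other way of ordering
its writes, rather than assuming the barrier was honoured.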

Bradley