On Fri, Jul 14, 2006 at 11:22:11PM +1000, Bradley Baetz wrote:
> On Fri, Jul 14, 2006 at 10:46:27PM +1000, Bradley Baetz wrote:
> > [please cc me on replies; I'm not subscribed to the list]

I poked around a bit more, and discovered that drbd was starting before
iptables. So it would start to sync, then iptables would come up and load
conntrack, and the tcp connection would be blocked by the firewall rules
as INVALID.

So presumably the problem is that something isn't detecting the
connection going away and then trying to restart it. It is in some
cases - I have some logs showing a start of a sync, then the connection
being lost, and then a reconnect.

The problem with the connection stalling still remains, though - if I
disable the firewall on both boxes, and do |reboot -n -f| on the primary,
on reboot it still stalls partway into resyncing the AL. I do have a
tcpdump of that from the secondary, but it's over 70MB, and tcpdump said
that about 1/3 of the packets were dropped, so I suspect that it's not
much use.

Looking through the code, and comparing to RAID/LVM, I did notice one
thing. The md code has:

    if (unlikely(bio_barrier(bio))) {
            bio_endio(bio, bio->bi_size, -EOPNOTSUPP);
            return 0;
    }

in its make_request functions. For correctness, isn't something similar
needed for drbd, at least until the TODO item with handling barriers is
done?

Bradley
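
To make the barrier question above concrete, here is a minimal userspace sketch of the check quoted from md. It is not drbd or kernel code: struct mock_bio, MOCK_BIO_BARRIER, mock_bio_barrier(), mock_bio_endio() and mock_make_request() are hypothetical stand-ins invented for illustration, and the real bio_endio() of that era also takes a byte count, which is dropped here. The sketch only shows the control flow: a barrier request is completed immediately with -EOPNOTSUPP and never passed down, while ordinary requests go through untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel's bio and barrier flag. */
#define MOCK_BIO_BARRIER 0x1u

struct mock_bio {
    unsigned int flags;   /* request flags, e.g. MOCK_BIO_BARRIER */
    unsigned int size;    /* bytes in the request */
    int error;            /* completion status set by mock_bio_endio() */
    int completed;        /* 1 once the request has been ended */
    int forwarded;        /* 1 if it would be passed to the lower device */
};

/* Mirrors the intent of bio_barrier(): is this a barrier request? */
static int mock_bio_barrier(const struct mock_bio *bio)
{
    return (bio->flags & MOCK_BIO_BARRIER) != 0;
}

/* Mirrors the intent of bio_endio(): finish the request with a status. */
static void mock_bio_endio(struct mock_bio *bio, int error)
{
    bio->error = error;
    bio->completed = 1;
}

/*
 * Sketch of a make_request function for a driver that cannot honour
 * barriers: fail them up front instead of silently treating them as
 * ordinary writes. Returns 0 in both paths, as the quoted md snippet does.
 */
int mock_make_request(struct mock_bio *bio)
{
    assert(bio != NULL);

    if (mock_bio_barrier(bio)) {
        mock_bio_endio(bio, -EOPNOTSUPP);
        return 0;
    }

    /* A real driver would queue or remap the request here. */
    bio->forwarded = 1;
    return 0;
}
```

The point of failing early is that the submitter (typically a journalling filesystem) sees -EOPNOTSUPP and can fall back to its non-barrier path, rather than assuming an ordering guarantee the device never provided.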