[DRBD-user] drbd hanging on file write

Monty Taylor mtaylor at mysql.com
Thu Mar 16 20:04:41 CET 2006


Lars Ellenberg wrote:
>> I just set up a drbd replication pair and shipped it to the colo. It was working great.
> 
> so you did local tests first, and all was working as expected?

Yes.

> now. your report is somewhat unspecific.
> anyways, when I read your last sentence "machine is hung",
> this might point to a deadlock that could occur when stressing the box.

Fair enough. I think the hang was that the block device was busy and 
couldn't be unmounted, so when I tried to shut down the machine, it 
blocked waiting for the filesystem to unmount.

> this possible deadlock is due to a bio_alloc(,GFP_KERNEL) in drbd where
> it should have been GFP_NOIO, and has been recognized and fixed just
> after we released 0.7.17.
> 
> may I ask you to try again with recent drbd svn? 
>  svn co http://svn.drbd.org/drbd/branches/drbd-0.7
> revision 2111 and greater should contain that fix.
> there may be a 0.7.18 bugfix release because of that.

I'll do that when I get the next set of test machines up. I downgraded 
the kernel back to 2.6.11 for now on these boxes and everything works as 
expected again.

> please report your findings.

I will when I have another test machine running 2.6.15 again, which should 
be next week. If there's a known possible deadlock, I bet that's what I ran 
into. On the other hand, is the default value for on-disconnect 
reconnect or freeze_io? If it's freeze_io, I could see that being what 
happened, too.
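For reference, here is a sketch of where that option is set. It assumes the DRBD 0.7 drbd.conf syntax, in which on-disconnect sits in the net section and accepts stand_alone, reconnect, or freeze_io. The resource name r0 is a placeholder, and the other required parts of a resource (protocol, the per-host "on" sections) are omitted. drbd.conf(5) for the installed version is the authority on the exact syntax and on which value is the default.

```
resource r0 {            # "r0" is a placeholder resource name
  # protocol and per-host "on <hostname>" sections omitted for brevity
  net {
    # What to do when the connection to the peer is lost
    # (DRBD 0.7 syntax, assumed here):
    #   stand_alone - go StandAlone, no further connection attempts
    #   reconnect   - keep trying to reconnect to the peer
    #   freeze_io   - block all I/O until the connection is reestablished
    on-disconnect reconnect;
  }
}
```

With freeze_io, I/O to the device blocks until the peer comes back. That would be consistent with an unmount that never completes.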

I'll keep trying to isolate it for you. I think we're going to be using 
drbd a bit more as part of our Professional Services offerings to some 
clients, so it'll be nice to know where the problem actually sits.

Thanks!
Monty Taylor
Senior Consultant, MySQL, Inc.
