[DRBD-user] drbd hanging on file write

Monty Taylor mtaylor at mysql.com
Thu Mar 16 20:04:41 CET 2006

Lars Ellenberg wrote:
>> I just set up a drbd replication pair and shipped it to the colo. It was working great.
> so you did local tests first, and all was working as expected?


> now. your report is somewhat unspecific.
> anyways, when I read your last sentence "machine is hung",
> this might point to a deadlock that could occur when stressing the box.

Fair enough. I think the hang was that the block device was busy and 
couldn't be unmounted, so when I tried to shut down the machine it 
blocked waiting for the fs to unmount.

> this possible deadlock is due to a bio_alloc(,GFP_KERNEL) in drbd where
> it should have been GFP_NOIO, and has been recognized and fixed just
> after we released 0.7.17.
> may I ask you to try again with recent drbd svn? 
>  svn co http://svn.drbd.org/drbd/branches/drbd-0.7
> revision 2111 and greater should contain that fix.
> there may be a 0.7.18 bugfix release because of that.

I'll do that when I get the next set of test machines up. I downgraded 
the kernel back to 2.6.11 for now on these boxes and everything works as 
expected again.

> please report your findings.

I will when I get another test machine running 2.6.15 again, which should 
be next week. If there's a known possible deadlock, I bet that's what I 
ran into. On the other hand, is the default value for on-disconnect 
reconnect or freeze_io? Because if it's freeze_io, I could see that 
being what happened, too.
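
Either way, I could set it explicitly on the next test to rule that out. 
A rough sketch of what I have in mind, assuming on-disconnect goes in the 
net section of drbd.conf in 0.7 (the resource name r0 is just a 
placeholder, and the rest of the resource definition is left out):

 resource r0 {
   net {
     # assumption: force reconnect so I/O is not frozen on disconnect
     on-disconnect reconnect;
   }
 }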

I'll keep trying to isolate for you. I think we're going to be using 
drbd a bit more as part of our Professional Services offerings to some 
clients, so it'll be nice to know where the problem actually sits.

Monty Taylor
Senior Consultant, MySQL, Inc.

