[DRBD-user] Xen Remus DRBD dual primary frozen

agya naila agya.naila at gmail.com
Mon Apr 1 22:00:23 CEST 2013


Dear all,

I reported this problem earlier, but perhaps without enough detail, so here
is a fuller description. I hope somebody can help me pinpoint the problem.

For virtualization I use Ubuntu 12.04 x64 for both domain0 and domainU,
modified to run under the Xen hypervisor and to work with Remus.
I configured Remus following these notes:
http://wiki.xen.org/wiki/Install_Xen_4.1.4_with_Remus_and_DRBD_on_Ubuntu_12.10
but I use Xen 4.2.2 as my hypervisor, together with DRBD 8.3.11 with Remus
support from this link:
http://remusha.wikidot.com/local--files/configuring-and-installing-remus/drbd-8.3.11-remus.tar.gz

When DRBD runs in Primary/Secondary mode, there is no problem. However,
Remus runs in dual-primary mode. When I start Remus, DRBD freezes, which in
turn freezes my domainU. The dmesg output is below:

[242525.600067] block drbd1: Local backing block device frozen?
[242537.632070] block drbd1: Local backing block device frozen?
[242549.664075] block drbd1: Local backing block device frozen?
[242561.696083] block drbd1: Local backing block device frozen?
[242573.728079] block drbd1: Local backing block device frozen?
[242585.760069] block drbd1: Local backing block device frozen?
[242597.792079] block drbd1: Local backing block device frozen?
[242609.824069] block drbd1: Local backing block device frozen?
[242621.856083] block drbd1: Local backing block device frozen?
[242633.888068] block drbd1: Local backing block device frozen?
[242640.332124] INFO: task blkback.2.xvda:5779 blocked for more than 120 seconds.
[242640.332130] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[242640.332134] blkback.2.xvda  D ffff88003fc13780     0  5779      2 0x00000000
[242640.332142]  ffff880026743940 0000000000000246 000000000000000b ffff8800267402d0
[242640.332150]  ffff880026743fd8 ffff880026743fd8 ffff880026743fd8 0000000000013780
[242640.332157]  ffff880032944500 ffff88003368c500 ffff8800357d6000 ffff8800357d69d8
[242640.332164] Call Trace:
[242640.332178]  [<ffffffff816579cf>] schedule+0x3f/0x60
[242640.332200]  [<ffffffffa00e68d5>] drbd_al_begin_io+0x205/0x270 [drbd]
[242640.332207]  [<ffffffff811adde8>] ? bvec_alloc_bs+0x68/0x100
[242640.332212]  [<ffffffff811adf32>] ? bio_alloc_bioset+0xb2/0xf0
[242640.332219]  [<ffffffff8108aa50>] ? add_wait_queue+0x60/0x60
[242640.332231]  [<ffffffffa00e41bd>] drbd_make_request_common+0xc4d/0x1430 [drbd]
[242640.332239]  [<ffffffffa01b83ce>] ? xen_blkbk_map+0x24e/0x2f0 [xen_blkback]
[242640.332245]  [<ffffffff81301006>] ? throtl_find_tg+0x46/0x60
[242640.332257]  [<ffffffffa00e4e04>] drbd_make_request+0x464/0x7e0 [drbd]
[242640.332264]  [<ffffffff812f03bb>] ? generic_make_request_checks+0x1eb/0x370
[242640.332269]  [<ffffffff812f0194>] generic_make_request.part.50+0x74/0xb0
[242640.332274]  [<ffffffff812f05a8>] generic_make_request+0x68/0x70
[242640.332278]  [<ffffffff812f0635>] submit_bio+0x85/0x110
[242640.332284]  [<ffffffffa01b8f0f>] dispatch_rw_block_io+0x44f/0x700 [xen_blkback]
[242640.332292]  [<ffffffff8100330e>] ? xen_end_context_switch+0x1e/0x30
[242640.332298]  [<ffffffffa01b93df>] __do_block_io_op+0x21f/0x360 [xen_blkback]
[242640.332304]  [<ffffffffa01b9608>] xen_blkif_schedule+0xb8/0x320 [xen_blkback]
[242640.332309]  [<ffffffff8108aa50>] ? add_wait_queue+0x60/0x60
[242640.332314]  [<ffffffffa01b9550>] ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
[242640.332319]  [<ffffffff81089fbc>] kthread+0x8c/0xa0
[242640.332326]  [<ffffffff81664034>] kernel_thread_helper+0x4/0x10
[242640.332330]  [<ffffffff816620e3>] ? int_ret_from_sys_call+0x7/0x1b
[242640.332336]  [<ffffffff81659dbc>] ? retint_restore_args+0x5/0x6
[242640.332340]  [<ffffffff81664030>] ? gs_change+0x13/0x13
[242645.920070] block drbd1: Local backing block device frozen?
[242657.952074] block drbd1: Local backing block device frozen?
[242669.984072] block drbd1: Local backing block device frozen?
[242682.016071] block drbd1: Local backing block device frozen?
[242694.048071] block drbd1: Local backing block device frozen?
[242706.080071] block drbd1: Local backing block device frozen?
[242718.112077] block drbd1: Local backing block device frozen?

sb-voip2@sbvoip2:~$ sudo cat /proc/drbd
version: 8.3.11 (api:88/proto:86-96)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by root@sbvoip2, 2013-02-19 08:30:51

 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate D r-----
    ns:14732 nr:1784712 dw:1799444 dr:579340 al:31 bm:44 lo:1 pe:0 ua:0 ap:1 ep:1 wo:b def:0 chkpt:662 oos:0
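
For anyone comparing several such snapshots, the status output above can be turned into named fields. The following is only a minimal sketch in Python, not part of DRBD: `parse_drbd_status` and `nonzero_counters` are hypothetical helper names, and the parsing assumes the DRBD 8.3 /proc/drbd layout shown above (a "N:" device line followed by indented counter lines). It reports which of the request counters lo, pe, ua and ap are non-zero; in the snapshot above those are lo and ap.

```python
import re

# key:value pairs such as "cs:Connected" or "lo:1"
PAIR = re.compile(r"(\w+):(\S+)")
# device line such as " 1: cs:Connected ro:Primary/Primary ..."
DEVICE = re.compile(r"\s*(\d+):\s+(.*)")


def parse_drbd_status(text):
    """Return {minor: {field: value}} for each device in /proc/drbd text.

    Assumes the 8.3-style layout: a device line starting with the minor
    number, followed by indented lines holding the counters.
    """
    devices = {}
    current = None
    for line in text.splitlines():
        match = DEVICE.match(line)
        if match:
            current = devices.setdefault(int(match.group(1)), {})
            rest = match.group(2)
        elif current is not None and line.startswith((" ", "\t")):
            rest = line  # indented continuation of the current device
        else:
            continue  # header lines (version, GIT-hash) and blank lines
        for key, value in PAIR.findall(rest):
            current[key] = int(value) if value.isdigit() else value
    return devices


def nonzero_counters(device, names=("lo", "pe", "ua", "ap")):
    """List the request counters that are present and greater than zero."""
    return [name for name in names if device.get(name, 0) > 0]
```

On the affected host this could be called as `parse_drbd_status(open("/proc/drbd").read())`, run with enough privileges to read the file.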

As the log shows, once the DRBD block device freezes, blkback stops working
as well:

[242640.332124] INFO: task blkback.2.xvda:5779 blocked for more than 120 seconds.
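
The timestamps in the dmesg output can be checked mechanically. Below is a small Python sketch (the function name `frozen_intervals` is made up for illustration) that extracts the timestamps of the "Local backing block device frozen?" lines and returns the gaps between consecutive ones, assuming the usual "[seconds.microseconds]" dmesg prefix. For the log above every gap is about 12.03 seconds, so the warning repeats at a steady interval rather than in bursts.

```python
import re

FROZEN_MARKER = "Local backing block device frozen?"
# dmesg line prefix such as "[242525.600067] "
STAMPED = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def frozen_intervals(log_text, marker=FROZEN_MARKER):
    """Return the gaps in seconds between consecutive marker lines."""
    stamps = []
    for line in log_text.splitlines():
        match = STAMPED.match(line)
        if match and marker in match.group(2):
            stamps.append(float(match.group(1)))
    # pair each timestamp with the next one and take the difference
    return [round(later - earlier, 3)
            for earlier, later in zip(stamps, stamps[1:])]
```

The input can be the saved output of `dmesg`, passed in as one string.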

Someone told me this is caused by high I/O load, but I always monitor my
server with xm top and the server load is always under 50%.
I hope somebody can help me. If you need more logs, I will post them.

I also found this patch:
http://permalink.gmane.org/gmane.linux.kernel.commits.head/358143
but I am not sure it can be applied to my DRBD version, since I cannot find
drivers/block/drbd/drbd_state.c in my installation.

Many thanks,

Agya