[DRBD-user] Kernel error message with drbd 8.4.1

Jose Ildefonso Camargo Tolosa ildefonso.camargo at gmail.com
Sat Feb 4 04:53:21 CET 2012



On Thu, Feb 2, 2012 at 11:37 AM, Jose Ildefonso Camargo Tolosa
<ildefonso.camargo at gmail.com> wrote:
> Update: the problem seems to go away with kernel 3.0. I still need to
> do more testing, but apparently the kernel update did the trick.
> After finishing the tests, I'll post to the list again. Thanks!

Yes, the kernel error message went away, and so did the "read-while-writing"
problem. Now I'm just seeing slow writes. I finally got my network to run
at somewhat higher speeds (~200Mbps, measured with iperf and scp) after
changing the MTU to 20000 bytes, but DRBD won't write any faster than
50Mbps (~5MB/s). And no, the disks are not saturated: they write at
around 100MB/s (tested on another partition of the same disk).
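To put these figures in the same unit, here is a small conversion sketch. It assumes plain decimal units (1 byte = 8 bits) and ignores TCP and DRBD protocol overhead; the helper names are made up for illustration.

```python
# Rough unit check for the throughput figures in this message.
# Assumes decimal units (1 Mbps = 10**6 bit/s, 1 MB/s = 10**6 byte/s)
# and ignores TCP/DRBD protocol overhead. Helper names are hypothetical.

BITS_PER_BYTE = 8


def mbps_to_mb_per_s(mbps: float) -> float:
    """Megabits per second -> megabytes per second."""
    return mbps / BITS_PER_BYTE


def mb_per_s_to_mbps(mb_per_s: float) -> float:
    """Megabytes per second -> megabits per second."""
    return mb_per_s * BITS_PER_BYTE


if __name__ == "__main__":
    network_mbps = 200   # iperf/scp after the MTU change
    drbd_mbps = 50       # observed DRBD write rate
    disk_mb_per_s = 100  # local write test on another partition

    print(f"network: {network_mbps} Mbps = {mbps_to_mb_per_s(network_mbps):.2f} MB/s")
    print(f"DRBD:    {drbd_mbps} Mbps = {mbps_to_mb_per_s(drbd_mbps):.2f} MB/s")
    print(f"disk:    {disk_mb_per_s} MB/s = {mb_per_s_to_mbps(disk_mb_per_s):.0f} Mbps")
    print(f"DRBD uses {drbd_mbps / network_mbps:.0%} of the measured network rate")
```

By this arithmetic, 50Mbps is 6.25MB/s (so "~5MB/s" is a rough figure), the ~200Mbps link tops out around 25MB/s, and the 100MB/s disk corresponds to about 800Mbps. The disk is therefore not the limit, and DRBD is using only about a quarter of the measured network rate.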

Any ideas?

Thanks!

Ildefonso.

>
>
> On Thu, Feb 2, 2012 at 8:39 AM, Jose Ildefonso Camargo Tolosa
> <ildefonso.camargo at gmail.com> wrote:
>> Greetings,
>>
>> I just tried to upgrade to DRBD 8.4.1 (after having issues with 8.3.7:
>> it would freeze reads while it was writing; see my other thread,
>> "Read performance goes really low while writing," for more info), and
>> things look much worse... it just stalls completely! I can't even
>> background the cp process anymore, and I had to reboot the
>> systems. Oh, and I got this in dmesg (before rebooting):
>>
>> [ 4645.904918] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>> [ 4645.951918] IP: [<ffffffff81439544>] clone_endio+0x34/0xe0
>> [ 4645.984835] PGD 32dbef067 PUD 32dbee067 PMD 0
>> [ 4646.011626] Oops: 0000 [#1] SMP
>> [ 4646.031094] last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
>> [ 4646.078449] CPU 1
>> [ 4646.090596] Modules linked in: sha1_generic drbd crc32c libcrc32c ipmi_msghandler bridge stp ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding dm_crypt kvm_intel kvm psmouse serio_raw ioatdma shpchp lp parport raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath linear ses enclosure radeon ttm drm_kms_helper drm usbhid i2c_algo_bit hid pata_jmicron igb floppy aacraid dca
>> [ 4646.354667] Pid: 3517, comm: kdmflush Not tainted 2.6.32-38-server #83-Ubuntu X8DTN
>> (....)
>> [ 4646.903885] Process kdmflush (pid: 3517, threadinfo ffff880632676000, task ffff880631ea5c00)
>> [ 4646.954348] Stack:
>> [ 4646.966393]  0000000000015e00 ffff8806301b6800 ffff88062e0008c0 ffff880330f89bc0
>> [ 4647.009850] <0> 0000000000000000 ffff880330f89d40 ffff880632677b10 ffffffff81173d2d
>> [ 4647.056060] <0> ffff880632677ba0 ffffffffa03a87fb ffff880631e94538 ffff88000c615e68
>> [ 4647.103463] Call Trace:
>> [ 4647.118114]  [<ffffffff81173d2d>] bio_endio+0x1d/0x40
>> [ 4647.148336]  [<ffffffffa03a87fb>] drbd_make_request+0x34b/0x350 [drbd]
>> [ 4647.187375]  [<ffffffff812b6403>] ? cpumask_next_and+0x23/0x40
>> [ 4647.222266]  [<ffffffff81056168>] ? find_busiest_group+0x688/0xb70
>> [ 4647.259232]  [<ffffffff812a22a1>] generic_make_request+0x1b1/0x4f0
>> [ 4647.296201]  [<ffffffff810f88e5>] ? mempool_alloc_slab+0x15/0x20
>> [ 4647.332128]  [<ffffffff810f8a7d>] ? mempool_alloc+0x5d/0x130
>> [ 4647.365981]  [<ffffffff81438fcd>] __map_bio+0xad/0x130
>> [ 4647.396717]  [<ffffffff814394fd>] __clone_and_map+0x4ad/0x4c0
>> [ 4647.431090]  [<ffffffff810f8a7d>] ? mempool_alloc+0x5d/0x130
>> [ 4647.464943]  [<ffffffff8143a5d8>] __split_and_process_bio+0x108/0x190
>> [ 4647.503466]  [<ffffffff8143a6b6>] dm_flush+0x56/0x70
>> [ 4647.533165]  [<ffffffff8143a71c>] dm_wq_work+0x4c/0x1c0
>> [ 4647.564423]  [<ffffffff8143a6d0>] ? dm_wq_work+0x0/0x1c0
>> [ 4647.596197]  [<ffffffff81081597>] run_workqueue+0xc7/0x1a0
>> [ 4647.629015]  [<ffffffff81081713>] worker_thread+0xa3/0x110
>> [ 4647.661826]  [<ffffffff81086140>] ? autoremove_wake_function+0x0/0x40
>> [ 4647.700353]  [<ffffffff81081670>] ? worker_thread+0x0/0x110
>> [ 4647.733685]  [<ffffffff81085dc6>] kthread+0x96/0xa0
>> [ 4647.762862]  [<ffffffff810141aa>] child_rip+0xa/0x20
>> (.....)
>>
>> I removed some lines because it was too long (if you need them, I can
>> paste them somewhere).
>>
>> cat /proc/drbd :
>>
>> root at flashcode0:~/drbd# cat /proc/drbd
>> version: 8.4.1 (api:1/proto:86-100)
>> GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by root at flashcode0, 2012-02-01 21:24:54
>>  0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>>    ns:512 nr:0 dw:1304 dr:10305 al:2 bm:2 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1292
>>
>> (Yes, I disconnected the secondary to rule out issues there, then
>> started copying a 50GB file to the DRBD volume and got the error
>> mentioned above.)
>>
>> This is Ubuntu 10.04 (Lucid).
>>
>> My current drbd config (without comments):
>>
>> global {
>>        usage-count yes;
>> }
>>
>> common {
>>        handlers {
>>                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>>                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>>                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>>        }
>>
>>        startup {
>>        }
>>
>>        options {
>>        }
>>
>>        disk {
>>                resync-rate 20M;
>>        }
>>
>>        net {
>>                protocol C;
>>                cram-hmac-alg sha1;
>>                shared-secret "super_shared_s3cret_here";
>>                data-integrity-alg sha1;
>>                max-buffers 8000;
>>                max-epoch-size 8000;
>>                use-rle;
>>                csums-alg sha1;
>>                verify-alg sha1;
>>                timeout 150;
>>                ping-timeout 20;
>>                sndbuf-size 256k;
>>        }
>> }
>>
>> resource test1 {
>>        device  /dev/drbd_test1 minor 0;
>>        disk    /dev/mapper/vg_server0-lv_drbd_test1;
>>        meta-disk internal;
>>        on server0 {
>>                address 192.168.55.1:7789;
>>        }
>>        on server1 {
>>                address 192.168.55.2:7789;
>>        }
>> }
>>
>> Any ideas? I'll try upgrading the kernel to the 3.1 series and testing
>> again, and if that fails, I'll try going back to the 8.3.x series of
>> DRBD (latest 8.3.x).
>>
>> Thanks!
>>
>> Ildefonso.
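For anyone who wants to check the /proc/drbd state fields quoted above programmatically rather than by eye, here is a minimal Python sketch. It assumes the 8.4-style layout shown in this thread: an "N: cs:... ro:... ds:..." header line per minor, followed by a counter line. It simply collects key:value tokens. The names parse_proc_drbd and SAMPLE are hypothetical and not part of any DRBD tool, and other layouts (for example, resync progress lines) are not handled.

```python
import re

# Header line for one minor, e.g.
# " 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----"
HEADER_RE = re.compile(r"^\s*(\d+):\s+(.*)$")
# Any "key:value" token, e.g. "cs:WFConnection" or "oos:1292".
FIELD_RE = re.compile(r"(\w+):(\S+)")


def parse_proc_drbd(text: str) -> dict[int, dict[str, str]]:
    """Map each DRBD minor to its key:value fields (values kept as strings)."""
    resources: dict[int, dict[str, str]] = {}
    current = None  # fields of the minor currently being read
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            # A new minor starts; its fields are on this line and the next.
            current = {}
            resources[int(header.group(1))] = current
            line = header.group(2)
        if current is not None:
            # Flag tokens without a colon (e.g. "C", "r-----") are skipped.
            current.update(FIELD_RE.findall(line))
    return resources


# The /proc/drbd output quoted in this thread (hypothetical sample constant).
SAMPLE = """\
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by root at flashcode0, 2012-02-01 21:24:54
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:512 nr:0 dw:1304 dr:10305 al:2 bm:2 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1292
"""

if __name__ == "__main__":
    # On a live node, read open("/proc/drbd").read() instead of SAMPLE.
    minor0 = parse_proc_drbd(SAMPLE)[0]
    print(minor0["cs"], minor0["ro"], minor0["ds"], "oos(KiB):", minor0["oos"])
```

For the sample, this prints `WFConnection Primary/Unknown UpToDate/DUnknown oos(KiB): 1292`. That matches the situation described: the Primary is waiting for its disconnected peer and is tracking data that will need to be resynced once the peer returns.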


