[DRBD-user] Kernel error message with drbd 8.4.1

Jose Ildefonso Camargo Tolosa ildefonso.camargo at gmail.com
Thu Feb 2 14:09:40 CET 2012


Greetings,

I just tried to update to DRBD 8.4.1 (after having issues with 8.3.7:
it would freeze reads while it was writing; see my other thread,
"Read performance goes really low while writing.", for more info), and
things look much worse: it simply freezes and stalls! I can't even
background the cp process anymore, and I had to actually reboot the
systems. Oh, and I got this in dmesg (before rebooting):

[ 4645.904918] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 4645.951918] IP: [<ffffffff81439544>] clone_endio+0x34/0xe0
[ 4645.984835] PGD 32dbef067 PUD 32dbee067 PMD 0
[ 4646.011626] Oops: 0000 [#1] SMP
[ 4646.031094] last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
[ 4646.078449] CPU 1
[ 4646.090596] Modules linked in: sha1_generic drbd crc32c libcrc32c ipmi_msghandler bridge stp ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding dm_crypt kvm_intel kvm psmouse serio_raw ioatdma shpchp lp parport raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath linear ses enclosure radeon ttm drm_kms_helper drm usbhid i2c_algo_bit hid pata_jmicron igb floppy aacraid dca
[ 4646.354667] Pid: 3517, comm: kdmflush Not tainted 2.6.32-38-server #83-Ubuntu X8DTN
(....)
[ 4646.903885] Process kdmflush (pid: 3517, threadinfo ffff880632676000, task ffff880631ea5c00)
[ 4646.954348] Stack:
[ 4646.966393]  0000000000015e00 ffff8806301b6800 ffff88062e0008c0 ffff880330f89bc0
[ 4647.009850] <0> 0000000000000000 ffff880330f89d40 ffff880632677b10 ffffffff81173d2d
[ 4647.056060] <0> ffff880632677ba0 ffffffffa03a87fb ffff880631e94538 ffff88000c615e68
[ 4647.103463] Call Trace:
[ 4647.118114]  [<ffffffff81173d2d>] bio_endio+0x1d/0x40
[ 4647.148336]  [<ffffffffa03a87fb>] drbd_make_request+0x34b/0x350 [drbd]
[ 4647.187375]  [<ffffffff812b6403>] ? cpumask_next_and+0x23/0x40
[ 4647.222266]  [<ffffffff81056168>] ? find_busiest_group+0x688/0xb70
[ 4647.259232]  [<ffffffff812a22a1>] generic_make_request+0x1b1/0x4f0
[ 4647.296201]  [<ffffffff810f88e5>] ? mempool_alloc_slab+0x15/0x20
[ 4647.332128]  [<ffffffff810f8a7d>] ? mempool_alloc+0x5d/0x130
[ 4647.365981]  [<ffffffff81438fcd>] __map_bio+0xad/0x130
[ 4647.396717]  [<ffffffff814394fd>] __clone_and_map+0x4ad/0x4c0
[ 4647.431090]  [<ffffffff810f8a7d>] ? mempool_alloc+0x5d/0x130
[ 4647.464943]  [<ffffffff8143a5d8>] __split_and_process_bio+0x108/0x190
[ 4647.503466]  [<ffffffff8143a6b6>] dm_flush+0x56/0x70
[ 4647.533165]  [<ffffffff8143a71c>] dm_wq_work+0x4c/0x1c0
[ 4647.564423]  [<ffffffff8143a6d0>] ? dm_wq_work+0x0/0x1c0
[ 4647.596197]  [<ffffffff81081597>] run_workqueue+0xc7/0x1a0
[ 4647.629015]  [<ffffffff81081713>] worker_thread+0xa3/0x110
[ 4647.661826]  [<ffffffff81086140>] ? autoremove_wake_function+0x0/0x40
[ 4647.700353]  [<ffffffff81081670>] ? worker_thread+0x0/0x110
[ 4647.733685]  [<ffffffff81085dc6>] kthread+0x96/0xa0
[ 4647.762862]  [<ffffffff810141aa>] child_rip+0xa/0x20
(.....)

I removed some lines because it was too long (if you need them, I could
paste them somewhere).
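
When reading a trace like the one above, it can help to separate the
frames the kernel marks with "?" (addresses found on the stack that
may not be part of the real call chain) from the unmarked ones, and to
see which frames belong to a module such as [drbd]. The following is
only a minimal sketch in Python: the names TRACE, FRAME and
parse_frames are made up for illustration, and the sample text is a
few lines copied from the trace above.

```python
import re

# A few frames copied from the call trace above.
TRACE = """\
[ 4647.118114]  [<ffffffff81173d2d>] bio_endio+0x1d/0x40
[ 4647.148336]  [<ffffffffa03a87fb>] drbd_make_request+0x34b/0x350 [drbd]
[ 4647.187375]  [<ffffffff812b6403>] ? cpumask_next_and+0x23/0x40
[ 4647.259232]  [<ffffffff812a22a1>] generic_make_request+0x1b1/0x4f0
"""

# One frame: [<address>] [?] function+0xoffset/0xsize [module]
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\][ \t]+(?P<unsure>\?[ \t]+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(?P<module>\w+)\])?"
)

def parse_frames(text):
    """Return one dict per stack frame found in the given dmesg text."""
    frames = []
    for m in FRAME.finditer(text):
        frames.append({
            "func": m.group("func"),
            "module": m.group("module"),           # None for core kernel code
            "reliable": m.group("unsure") is None,  # False for "?" frames
        })
    return frames

frames = parse_frames(TRACE)
for f in frames:
    if f["reliable"]:
        print(f["func"], f["module"] or "(kernel)")
```

On the sample this keeps bio_endio, drbd_make_request and
generic_make_request and drops the "?" frame, which matches how the
trace reads by eye: dm's flush path calls into drbd_make_request,
which ends the bio, and the oops then happens in clone_endio.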

cat /proc/drbd :

root@flashcode0:~/drbd# cat /proc/drbd
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by root@flashcode0, 2012-02-01 21:24:54
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:512 nr:0 dw:1304 dr:10305 al:2 bm:2 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1292

(yes, I disconnected the secondary to rule out issues there, then
just started copying a 50GB file to the DRBD volume, and got the
error I already mentioned).
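
For anyone who wants to watch these fields from a script while
reproducing the problem, the device entry in /proc/drbd can be split
into its key:value tokens (cs is the connection state, ro the roles,
ds the disk states, oos the out-of-sync amount). This is only a
minimal sketch in Python: the name parse_drbd_status is made up for
illustration, and the sample text is copied from the output above
rather than read from a live system.

```python
import re

# Device 0 entry copied from the /proc/drbd output above.
SAMPLE = """\
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:512 nr:0 dw:1304 dr:10305 al:2 bm:2 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1292
"""

def parse_drbd_status(text):
    """Collect every two- or three-letter key:value token into a dict."""
    return dict(re.findall(r"\b([a-z]{2,3}):(\S+)", text))

status = parse_drbd_status(SAMPLE)
print(status["cs"], status["ro"], status["ds"], status["oos"])
```

The values are kept as strings on purpose, since some of them (cs, ro,
ds, wo) are not numbers. On a real node the text would come from
reading /proc/drbd instead of the SAMPLE constant.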

This is Ubuntu 10.04 (Lucid).

My current drbd config (without comments):

global {
        usage-count yes;
}

common {
        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        }

        startup {
        }

        options {
        }

        disk {
                resync-rate 20M;
        }

        net {
                protocol C;
                cram-hmac-alg sha1;
                shared-secret "super_shared_s3cret_here";
                data-integrity-alg sha1;
                max-buffers 8000;
                max-epoch-size 8000;
                use-rle;
                csums-alg sha1;
                verify-alg sha1;
                timeout 150;
                ping-timeout 20;
                sndbuf-size 256k;
        }
}

resource test1 {
        device  /dev/drbd_test1 minor 0;
        disk    /dev/mapper/vg_server0-lv_drbd_test1;
        meta-disk internal;
        on server0 {
                address 192.168.55.1:7789;
        }
        on server1 {
                address 192.168.55.2:7789;
        }
}

Any ideas? I'll try upgrading the kernel to the 3.1 series and test
again, and if that fails, I'll try going back to the 8.3.x series of
DRBD (latest 8.3.x).

Thanks!

Ildefonso.


