We've seen similar issues on Debian Squeeze (2.6.32-5-xen-amd64) with
DRBD 8.4.x, which led us to roll back to the in-kernel version (8.3.7),
although in my case the oops was generated when starting a Xen domU
that used a DRBD device. The domU would boot, but shortly after bootup
I/O would hang. An oops would appear on the console of the dom0;
unfortunately, I never captured the message.

On 4 February 2012 03:53, Jose Ildefonso Camargo Tolosa
<ildefonso.camargo at gmail.com> wrote:
> On Thu, Feb 2, 2012 at 11:37 AM, Jose Ildefonso Camargo Tolosa
> <ildefonso.camargo at gmail.com> wrote:
>> Update: the problem seems to go away with kernel 3.0. I still need to
>> do more testing, but apparently the kernel update did the trick.
>> After finishing the tests, I'll post to list again. Thanks!
>
> Yes, the kernel error message went away, and the "read-while-writing"
> problem went away too. Now, I'm just having some write slowness (I
> finally got my network to work at slightly higher speeds, ~200Mbps,
> after changing MTU to 20000 bytes, tested with iperf, and scp)... now,
> drbd won't write at any higher than 50Mbps (~5MB/s)... and no, disks
> are not saturated, these write at around 100MB/s (tested on another
> partition, same disk).
>
> Any ideas?
>
> Thanks!
>
> Ildefonso.
>
>>
>>
>> On Thu, Feb 2, 2012 at 8:39 AM, Jose Ildefonso Camargo Tolosa
>> <ildefonso.camargo at gmail.com> wrote:
>>> Greetings,
>>>
>>> I just tried to update to DRBD 8.4.1 (after having issues with 8.3.7:
>>> it would freeze reads while it was writing, see my other thread for
>>> more info, "Read performance goes really low while writing."), and
>>> things look much worse... it just freezes to a stall! I can't
>>> background the cp process anymore, and I had to actually reboot the
>>> systems.
>>> Oh, and I got this on dmesg (before rebooting):
>>>
>>> [ 4645.904918] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>>> [ 4645.951918] IP: [<ffffffff81439544>] clone_endio+0x34/0xe0
>>> [ 4645.984835] PGD 32dbef067 PUD 32dbee067 PMD 0
>>> [ 4646.011626] Oops: 0000 [#1] SMP
>>> [ 4646.031094] last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
>>> [ 4646.078449] CPU 1
>>> [ 4646.090596] Modules linked in: sha1_generic drbd crc32c libcrc32c ipmi_msghandler bridge stp ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding dm_crypt kvm_intel kvm psmouse serio_raw ioatdma shpchp lp parport raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath linear ses enclosure radeon ttm drm_kms_helper drm usbhid i2c_algo_bit hid pata_jmicron igb floppy aacraid dca
>>> [ 4646.354667] Pid: 3517, comm: kdmflush Not tainted 2.6.32-38-server #83-Ubuntu X8DTN
>>> (....)
>>> [ 4646.903885] Process kdmflush (pid: 3517, threadinfo ffff880632676000, task ffff880631ea5c00)
>>> [ 4646.954348] Stack:
>>> [ 4646.966393] 0000000000015e00 ffff8806301b6800 ffff88062e0008c0 ffff880330f89bc0
>>> [ 4647.009850] <0> 0000000000000000 ffff880330f89d40 ffff880632677b10 ffffffff81173d2d
>>> [ 4647.056060] <0> ffff880632677ba0 ffffffffa03a87fb ffff880631e94538 ffff88000c615e68
>>> [ 4647.103463] Call Trace:
>>> [ 4647.118114] [<ffffffff81173d2d>] bio_endio+0x1d/0x40
>>> [ 4647.148336] [<ffffffffa03a87fb>] drbd_make_request+0x34b/0x350 [drbd]
>>> [ 4647.187375] [<ffffffff812b6403>] ? cpumask_next_and+0x23/0x40
>>> [ 4647.222266] [<ffffffff81056168>] ? find_busiest_group+0x688/0xb70
>>> [ 4647.259232] [<ffffffff812a22a1>] generic_make_request+0x1b1/0x4f0
>>> [ 4647.296201] [<ffffffff810f88e5>] ? mempool_alloc_slab+0x15/0x20
>>> [ 4647.332128] [<ffffffff810f8a7d>] ? mempool_alloc+0x5d/0x130
>>> [ 4647.365981] [<ffffffff81438fcd>] __map_bio+0xad/0x130
>>> [ 4647.396717] [<ffffffff814394fd>] __clone_and_map+0x4ad/0x4c0
>>> [ 4647.431090] [<ffffffff810f8a7d>] ? mempool_alloc+0x5d/0x130
>>> [ 4647.464943] [<ffffffff8143a5d8>] __split_and_process_bio+0x108/0x190
>>> [ 4647.503466] [<ffffffff8143a6b6>] dm_flush+0x56/0x70
>>> [ 4647.533165] [<ffffffff8143a71c>] dm_wq_work+0x4c/0x1c0
>>> [ 4647.564423] [<ffffffff8143a6d0>] ? dm_wq_work+0x0/0x1c0
>>> [ 4647.596197] [<ffffffff81081597>] run_workqueue+0xc7/0x1a0
>>> [ 4647.629015] [<ffffffff81081713>] worker_thread+0xa3/0x110
>>> [ 4647.661826] [<ffffffff81086140>] ? autoremove_wake_function+0x0/0x40
>>> [ 4647.700353] [<ffffffff81081670>] ? worker_thread+0x0/0x110
>>> [ 4647.733685] [<ffffffff81085dc6>] kthread+0x96/0xa0
>>> [ 4647.762862] [<ffffffff810141aa>] child_rip+0xa/0x20
>>> (.....)
>>>
>>> I removed some lines because it was too long (if you need them, I could
>>> paste them somewhere).
>>>
>>> cat /proc/drbd :
>>>
>>> root@flashcode0:~/drbd# cat /proc/drbd
>>> version: 8.4.1 (api:1/proto:86-100)
>>> GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by root@flashcode0, 2012-02-01 21:24:54
>>>  0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>>>     ns:512 nr:0 dw:1304 dr:10305 al:2 bm:2 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1292
>>>
>>> (yes, I disconnected the secondary to rule out some issues there, and
>>> just started to copy a 50GB file to the DRBD volume, and got the
>>> error I already mentioned).
>>>
>>> This is Ubuntu 10.04 (Lucid).
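For anyone who wants to compare these counters between test runs, the /proc/drbd status quoted above can be pulled apart with a few lines of Python. This is only a minimal sketch and not part of DRBD: the function name parse_proc_drbd is made up for illustration, the sample text is the single-resource output from this thread with the mail wrapping undone, and the field meanings in the comments follow my reading of the DRBD user guide, so treat them as assumptions.

```python
import re

# Sample copied from the message above (one resource, minor 0).
PROC_DRBD = """\
version: 8.4.1 (api:1/proto:86-100)
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:512 nr:0 dw:1304 dr:10305 al:2 bm:2 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1292
"""


def parse_proc_drbd(text):
    """Return {minor: {field: value}} for each device block in the text.

    Hypothetical helper: it only understands the key:value tokens shown
    in the sample, and skips everything before the first "N:" device line.
    """
    devices = {}
    current = None
    for line in text.splitlines():
        start = re.match(r"\s*(\d+):\s", line)
        if start:
            # A line such as " 0: cs:..." opens the block for minor 0.
            current = devices.setdefault(int(start.group(1)), {})
            line = line[start.end():]
        if current is None:
            continue  # header lines (version, GIT-hash)
        for key, value in re.findall(r"\b([a-z]+):(\S+)", line):
            current[key] = int(value) if value.isdigit() else value
    return devices


status = parse_proc_drbd(PROC_DRBD)
dev = status[0]
# cs = connection state, ro = roles, ds = disk states (local/peer).
print(dev["cs"], dev["ro"], dev["ds"])
# oos = data currently out of sync (assumed to be reported in KiB).
print("out of sync:", dev["oos"])
```

With the peer disconnected, cs stays at WFConnection and oos grows as writes land only on the local disk, which matches what the quoted status shows.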
>>>
>>> My current drbd config (without comments):
>>>
>>> global {
>>>     usage-count yes;
>>> }
>>>
>>> common {
>>>     handlers {
>>>         pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>>>         pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>>>         local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>>>     }
>>>
>>>     startup {
>>>     }
>>>
>>>     options {
>>>     }
>>>
>>>     disk {
>>>         resync-rate 20M;
>>>     }
>>>
>>>     net {
>>>         protocol C;
>>>         cram-hmac-alg sha1;
>>>         shared-secret "super_shared_s3cret_here";
>>>         data-integrity-alg sha1;
>>>         max-buffers 8000;
>>>         max-epoch-size 8000;
>>>         use-rle;
>>>         csums-alg sha1;
>>>         verify-alg sha1;
>>>         timeout 150;
>>>         ping-timeout 20;
>>>         sndbuf-size 256k;
>>>     }
>>> }
>>>
>>> resource test1 {
>>>     device /dev/drbd_test1 minor 0;
>>>     disk /dev/mapper/vg_server0-lv_drbd_test1;
>>>     meta-disk internal;
>>>     on server0 {
>>>         address 192.168.55.1:7789;
>>>     }
>>>     on server1 {
>>>         address 192.168.55.2:7789;
>>>     }
>>> }
>>>
>>> Any ideas? I'll try to upgrade kernel to 3.1 series and test again,
>>> and if that fails, I'll try to go back to 8.3.x series of DRBD (latest
>>> 8.3.x).
>>>
>>> Thanks!
>>>
>>> Ildefonso.
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

--
Adam Wilbraham
Senior Systems Administrator

Technophobia Ltd, Velocity House, 3 Solly Street, Sheffield, S1 4DE
t: +44 (0)114 2212123
e: adam.wilbraham at technophobia.com
w: http://www.technophobia.com/
http://twitter.com/WeTechnophobia

Part of the Capita Group: www.capita.co.uk
Registered in England and Wales Company No. 3063669 VAT registration No.
618 1841 40
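
On the write-speed figures in the quoted follow-up (link ~200Mbps, DRBD ~50Mbps, disk ~100MB/s): the numbers mix megabits and megabytes, so it may help to put them in one unit before comparing. The snippet below is just that arithmetic, assuming 1 byte = 8 bits and ignoring protocol overhead; the helper name mbit_to_mbyte is made up for illustration. With protocol C each write waits for the peer, so while the nodes are connected the slower of the link and the disk is a rough ceiling on sustained writes.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8.0


# Figures as reported in the thread.
link_mbyte = mbit_to_mbyte(200)   # network as measured with iperf
drbd_mbyte = mbit_to_mbyte(50)    # observed DRBD write rate
disk_mbyte = 100.0                # local disk, already given in MB/s

# Rough ceiling for synchronous replication: the slower of link and disk.
ceiling = min(link_mbyte, disk_mbyte)

print("link: %.2f MB/s" % link_mbyte)
print("drbd: %.2f MB/s" % drbd_mbyte)
print("drbd uses %.0f%% of the ceiling" % (100 * drbd_mbyte / ceiling))
```

So the reported DRBD rate is about a quarter of what the link alone should carry, which suggests the disk is not the limiting factor here.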