Greetings,

I just tried to update to DRBD 8.4.1 after having issues with 8.3.7 (it would freeze reads while it was writing; see my other thread, "Read performance goes really low while writing.", for more info), and things look much worse: it just freezes completely. I can't background the cp process anymore, and I actually had to reboot the systems.

Oh, and I got this in dmesg (before rebooting):

[ 4645.904918] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 4645.951918] IP: [<ffffffff81439544>] clone_endio+0x34/0xe0
[ 4645.984835] PGD 32dbef067 PUD 32dbee067 PMD 0
[ 4646.011626] Oops: 0000 [#1] SMP
[ 4646.031094] last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
[ 4646.078449] CPU 1
[ 4646.090596] Modules linked in: sha1_generic drbd crc32c libcrc32c ipmi_msghandler bridge stp ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding dm_crypt kvm_intel kvm psmouse serio_raw ioatdma shpchp lp parport raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath linear ses enclosure radeon ttm drm_kms_helper drm usbhid i2c_algo_bit hid pata_jmicron igb floppy aacraid dca
[ 4646.354667] Pid: 3517, comm: kdmflush Not tainted 2.6.32-38-server #83-Ubuntu X8DTN (....)
[ 4646.903885] Process kdmflush (pid: 3517, threadinfo ffff880632676000, task ffff880631ea5c00)
[ 4646.954348] Stack:
[ 4646.966393] 0000000000015e00 ffff8806301b6800 ffff88062e0008c0 ffff880330f89bc0
[ 4647.009850] <0> 0000000000000000 ffff880330f89d40 ffff880632677b10 ffffffff81173d2d
[ 4647.056060] <0> ffff880632677ba0 ffffffffa03a87fb ffff880631e94538 ffff88000c615e68
[ 4647.103463] Call Trace:
[ 4647.118114] [<ffffffff81173d2d>] bio_endio+0x1d/0x40
[ 4647.148336] [<ffffffffa03a87fb>] drbd_make_request+0x34b/0x350 [drbd]
[ 4647.187375] [<ffffffff812b6403>] ? cpumask_next_and+0x23/0x40
[ 4647.222266] [<ffffffff81056168>] ? find_busiest_group+0x688/0xb70
[ 4647.259232] [<ffffffff812a22a1>] generic_make_request+0x1b1/0x4f0
[ 4647.296201] [<ffffffff810f88e5>] ? mempool_alloc_slab+0x15/0x20
[ 4647.332128] [<ffffffff810f8a7d>] ? mempool_alloc+0x5d/0x130
[ 4647.365981] [<ffffffff81438fcd>] __map_bio+0xad/0x130
[ 4647.396717] [<ffffffff814394fd>] __clone_and_map+0x4ad/0x4c0
[ 4647.431090] [<ffffffff810f8a7d>] ? mempool_alloc+0x5d/0x130
[ 4647.464943] [<ffffffff8143a5d8>] __split_and_process_bio+0x108/0x190
[ 4647.503466] [<ffffffff8143a6b6>] dm_flush+0x56/0x70
[ 4647.533165] [<ffffffff8143a71c>] dm_wq_work+0x4c/0x1c0
[ 4647.564423] [<ffffffff8143a6d0>] ? dm_wq_work+0x0/0x1c0
[ 4647.596197] [<ffffffff81081597>] run_workqueue+0xc7/0x1a0
[ 4647.629015] [<ffffffff81081713>] worker_thread+0xa3/0x110
[ 4647.661826] [<ffffffff81086140>] ? autoremove_wake_function+0x0/0x40
[ 4647.700353] [<ffffffff81081670>] ? worker_thread+0x0/0x110
[ 4647.733685] [<ffffffff81085dc6>] kthread+0x96/0xa0
[ 4647.762862] [<ffffffff810141aa>] child_rip+0xa/0x20
(.....)

I removed some lines because it was too long (if you need them, I can paste them somewhere).

cat /proc/drbd:

root@flashcode0:~/drbd# cat /proc/drbd
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by root@flashcode0, 2012-02-01 21:24:54
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:512 nr:0 dw:1304 dr:10305 al:2 bm:2 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1292

(Yes, I disconnected the secondary to rule out issues there. I had just started to copy a 50 GB file to the DRBD volume when I got the error mentioned above.)

This is Ubuntu 10.04 (Lucid).
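
For anyone who wants to compare the /proc/drbd status above across test runs, a minimal Python sketch for pulling the key:value fields out of that output follows. The function name parse_proc_drbd and the SAMPLE string are made up for illustration, and the parser only assumes the 8.4-style layout shown in this message (a numbered device line followed by an indented counters line); it is not part of DRBD.

```python
import re

# Sample copied from the /proc/drbd output quoted in this message.
SAMPLE = """\
version: 8.4.1 (api:1/proto:86-100)
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:512 nr:0 dw:1304 dr:10305 al:2 bm:2 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1292
"""


def parse_proc_drbd(text):
    """Return {minor: {field: value}} for each device found in the text.

    Hypothetical helper: it collects every lowercase key:value token on a
    device line ("<minor>: ...") and on the lines that follow it.
    """
    devices = {}
    current = None
    for line in text.splitlines():
        match = re.match(r"\s*(\d+):\s+(.*)", line)
        if match:
            # A new device line such as " 0: cs:WFConnection ..."
            current = devices.setdefault(int(match.group(1)), {})
            rest = match.group(2)
        elif current is not None:
            # Continuation line holding the counters (ns, nr, dw, ...)
            rest = line
        else:
            # Header lines (version, GIT-hash) before the first device
            continue
        for key, value in re.findall(r"\b([a-z]+):(\S+)", rest):
            current[key] = value
    return devices


if __name__ == "__main__":
    status = parse_proc_drbd(SAMPLE)
    print(status[0]["cs"], status[0]["ro"], status[0]["ds"], status[0]["oos"])
```

On a live node the same function could be fed the contents of /proc/drbd instead of SAMPLE; all values are kept as strings so that fields such as wo:b need no special handling.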
My current DRBD config (without comments):

global {
    usage-count yes;
}

common {
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
    }
    startup { }
    options { }
    disk {
        resync-rate 20M;
    }
    net {
        protocol C;
        cram-hmac-alg sha1;
        shared-secret "super_shared_s3cret_here";
        data-integrity-alg sha1;
        max-buffers 8000;
        max-epoch-size 8000;
        use-rle;
        csums-alg sha1;
        verify-alg sha1;
        timeout 150;
        ping-timeout 20;
        sndbuf-size 256k;
    }
}

resource test1 {
    device /dev/drbd_test1 minor 0;
    disk /dev/mapper/vg_server0-lv_drbd_test1;
    meta-disk internal;
    on server0 {
        address 192.168.55.1:7789;
    }
    on server1 {
        address 192.168.55.2:7789;
    }
}

Any ideas? I'll try upgrading the kernel to the 3.1 series and test again; if that fails, I'll go back to the latest 8.3.x release of DRBD.

Thanks!

Ildefonso.