On Sat, Dec 13, 2008 at 12:57:59PM -0800, Nolan wrote:
> On Fri, 2008-12-13 at 17:11 +0100, Lars Ellenberg wrote:
> > apart from running many processes doing direct io to the same
> > block, there is not much I can think of that may produce these
> > concurrent writes.
>
> I am doing direct IO (via kvm's cache=off option).
>
> There is only the one process, but I believe it simulates AIO using
> glibc's thread-based AIO implementation.
>
> Since the guest (debian etch) is using SCSI TCQ, it could in theory
> write the same block many times. No idea why it would do that though.
>
> I've also no idea why running a verify would trigger it, if that is
> something more than mere coincidence.
>
>either "coincidence", or because of the added IO load (read the whole
>disk and checksum it) changes the timing. this has nothing to do with
>the "hanging" drbd resource, though.
>
> > can you do the ps -eo | grep magic on the other node as well please?
>
> # ps -eo pid,state,wchan:30,cmd | grep -e drbd -e D
>   PID S WCHAN                          CMD
>  3002 S select                         /usr/local/bin/qemu-system-x86_64
> -m 512 -drive file=/dev/drbd23,if=scsi,cache=off,boot=on -drive
> if=ide,index=2,media=cdrom -usbdevice tablet -name root_vm_0 -net
> nic,macaddr=xx:xx:xx:xx:xx:xx,model=virtio -net tap -monitor
> unix:/tmp/VM_root_vm_0,server,nowait -tdf -daemonize -vnc :0,password
>  5261 S drbd_wait_peer_seq             [drbd24_receiver]
>         ^^^^^^^^^^^^^^^^^^
>
>there.
>that is an interesting hint.
>
>this has been fixed in 8.2.7.
>
>as a workaround, you can use e.g. iptables to disconnect/reconnect,
>but chances are that an online verify will again get stuck in your
>setup.
>
>just make sure that you add the drbd-8.2.7 hotfix for the online verify
>as well, so either use the drbd 8.2.7 tarball
>plus this patch:
>http://git.drbd.org/drbd-8.2.git/?p=drbd-8.2.git;a=commitdiff;h=1174410#patch1
>
>or even better, use drbd-8.2 HEAD, there:
>http://git.drbd.org/drbd-8.2.git/
>
>we are confident that we will release a fine 8.3.0 this week,
>which supersedes the 8.2 series.
>

Sorry to add a "me too" reply to this thread, but we had the exact same
thing happen this weekend, except our system hung at 54% and we use
8.2.7. I was able to bring the guest back by putting both nodes in
secondary and then the primary back to primary.

Are you confident this is fixed in 8.3? I can provide any information
you may need, but our setup is the same except that we use the Xen
hypervisor; guests are on lvm/drbd.

One message from the guest that was hung:

Dec 15 06:30:56 xen01 kernel: drbd1: [drbd1_worker/14794] sock_sendmsg time expired, ko = 4294967295

Thanks for your time.
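
For anyone who wants to check for the same symptom on their own nodes, the ps output quoted above can be scanned automatically. What follows is only a rough sketch, not anything from the DRBD tools: the function names `find_stuck_drbd_threads` and `read_ps` are made up for illustration, and it assumes a Linux procps `ps` that accepts the same `-eo pid,state,wchan:30,cmd` format used in this thread. It reports every process whose wait channel is `drbd_wait_peer_seq`, the value that was pointed out as the hint.

```python
import subprocess

# Wait channel that was flagged in the quoted ps output.
STUCK_WCHAN = "drbd_wait_peer_seq"


def find_stuck_drbd_threads(ps_output, wchan=STUCK_WCHAN):
    """Return (pid, cmd) for each ps line whose WCHAN column equals `wchan`.

    Expects the column order pid, state, wchan, cmd.
    """
    stuck = []
    for line in ps_output.splitlines():
        # Split into pid, state, wchan and the rest of the command line.
        fields = line.split(None, 3)
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # header line, blank line, or mail-wrapped continuation
        pid, _state, chan, cmd = fields
        if chan == wchan:
            stuck.append((int(pid), cmd.strip()))
    return stuck


def read_ps():
    """Run the ps invocation from the thread (assumes Linux procps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,state,wchan:30,cmd"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `find_stuck_drbd_threads(read_ps())` on each node returns an empty list when nothing is waiting there. A non-empty result only says that a thread is currently in that wait channel, so it is worth sampling a few times before concluding the resource is hung.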