Hello,

I've got two nodes running Ubuntu 8.10 (64-bit) with the DRBD version it ships:

  version: 8.2.6 (api:88/proto:86-88)
  GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by phil@fat-tyre, 2008-05-30 12:59:17

Each node has 4 drives, which are striped together using LVM and then cut into 24 logical volumes. One DRBD device is attached to each of the 24 LVs. The two nodes communicate over two bonded e1000 NICs. Everything ran well for 40+ days with 24 KVM VMs.

I decided to try out the online verify functionality. After adding "verify-alg crc32c;" to my config on both hosts and running "drbdadm adjust", I ran:

  drbdadm verify VM24

All was well, and I watched /proc/drbd as the verify progressed. But then it stopped at 14%:

  24: cs:VerifyS st:Primary/Secondary ds:UpToDate/UpToDate C r---
      ns:2035752 nr:160958366 dw:162994118 dr:23997736 al:186 bm:10021 lo:0 pe:4624 ua:0 ap:16 oos:0
       14% 5847476/39734074

The VM using that storage is hung hard. Stracing it shows it blocked in a rather uninformative spot:

  root@node1:~# strace -p 9878
  Process 9878 attached - interrupt to quit
  futex(0xb531a0, 0x80 /* FUTEX_??? */, 2

dmesg on the secondary node shows nothing interesting, but dmesg on the primary node has:

  [3628294.472338] drbd24: state = { cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate r--- }
  [3628294.481196] drbd24: wanted = { cs:VerifyS st:Primary/Secondary ds:UpToDate/UpToDate r--- }
  [3628524.571022] drbd24: conn( Connected -> VerifyS )
  [3628919.655921] drbd24: qemu-system-x86[10223] Concurrent local write detected! [DISCARD L] new: 952311s +3584; pending: 952311s +3584
  [3628919.668048] drbd24: qemu-system-x86[10223] Concurrent local write detected! [DISCARD L] new: 952318s +512; pending: 952318s +512
  [3628919.680433] drbd24: qemu-system-x86[10223] Concurrent local write detected! [DISCARD L] new: 799599s +3584; pending: 799599s +3584
  [3628919.692566] drbd24: qemu-system-x86[10223] Concurrent local write detected! [DISCARD L] new: 799606s +512; pending: 799606s +512
  [3629004.628073] drbd24: qemu-system-x86[10224] Concurrent local write detected! [DISCARD L] new: 952311s +3584; pending: 952311s +3584
  [3629004.640192] drbd24: qemu-system-x86[10224] Concurrent local write detected! [DISCARD L] new: 952318s +512; pending: 952318s +512
  [3629004.652675] drbd24: qemu-system-x86[10224] Concurrent local write detected! [DISCARD L] new: 799599s +3584; pending: 799599s +3584
  [3629004.664787] drbd24: qemu-system-x86[10224] Concurrent local write detected! [DISCARD L] new: 799606s +512; pending: 799606s +512

Any ideas what could be causing this? Googling the "Concurrent local write detected" message only turned up the check-in that added that code to DRBD.

I can leave the system in this state for a few days if there is more information I should collect.

Please CC me. Thanks.

- nolan
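For anyone trying to read the "Concurrent local write detected!" lines above, here is a minimal Python sketch. It assumes that in "new: 952311s +3584" the first number is a start sector in 512-byte units and the second is the request size in bytes; that is my reading of the log format, not something the log itself states. It parses one line and applies the usual half-open interval overlap test. The names parse, Write and overlaps are hypothetical helpers for illustration, not DRBD code.

```python
import re
from typing import NamedTuple, Optional, Tuple

SECTOR_SIZE = 512  # assumption: the "s" values in the log are 512-byte sectors

# Modeled only on the dmesg lines quoted in this post.
LINE_RE = re.compile(
    r"(?P<comm>[\w-]+)\[(?P<pid>\d+)\] Concurrent local write detected!"
    r".*?new: (?P<new_s>\d+)s \+(?P<new_len>\d+); "
    r"pending: (?P<pend_s>\d+)s \+(?P<pend_len>\d+)"
)


class Write(NamedTuple):
    sector: int  # start sector
    size: int    # request length in bytes (assumption)

    @property
    def end(self) -> int:
        # First sector past the request (exclusive end of the interval).
        return self.sector + self.size // SECTOR_SIZE


def overlaps(a: Write, b: Write) -> bool:
    # Half-open intervals [sector, end) overlap iff each starts before the other ends.
    return a.sector < b.end and b.sector < a.end


def parse(line: str) -> Optional[Tuple[str, int, Write, Write]]:
    """Return (command, pid, new_write, pending_write), or None for other lines."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    new = Write(int(m["new_s"]), int(m["new_len"]))
    pending = Write(int(m["pend_s"]), int(m["pend_len"]))
    return m["comm"], int(m["pid"]), new, pending
```

Under that assumption, every logged "new" write covers exactly the same range as the "pending" one, so they overlap completely. The messages also come in adjacent pairs: 952311 + 3584/512 = 952318 and 799599 + 7 = 799606, so each pair spans one contiguous 4 KiB region (3584 + 512 bytes).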
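To confirm that the verify has really stalled rather than just slowed down, one can pull the progress counters out of /proc/drbd twice and compare them. The sketch below is hedged: the regex is modeled only on the single "14% 5847476/39734074" line quoted above. It assumes the first number is the amount already verified and the second the total, in whatever unit DRBD reports, since the output does not name the unit. verify_progress is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Matches the progress line shown while a resource is in VerifyS/VerifyT,
# e.g. "14% 5847476/39734074" as quoted from /proc/drbd above.
PROGRESS_RE = re.compile(r"(\d+)%\s+(\d+)/(\d+)")


def verify_progress(proc_text: str) -> Optional[Tuple[int, int, float]]:
    """Return (done, total, percent) from /proc/drbd text, or None if no progress line.

    Assumption: first counter = already verified, second = total.
    Only the first match is returned, so run it on one resource's lines
    if several resources are verifying at once.
    """
    m = PROGRESS_RE.search(proc_text)
    if m is None:
        return None
    done, total = int(m.group(2)), int(m.group(3))
    if total == 0:
        return None
    return done, total, 100.0 * done / total
```

For the quoted line this gives 5847476 / 39734074, about 14.7%, which agrees with the "14%" DRBD printed. Reading /proc/drbd again a minute later and getting the same done value would show that the verify is not advancing.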