[DRBD-user] "drbdadm verify" hung after 14%.

Nolan nolan at sigbus.net
Fri Dec 12 02:56:36 CET 2008


Hello,

I've got two nodes running Ubuntu 8.10/64bit using the included DRBD:
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by phil@fat-tyre, 2008-05-30 12:59:17

Each node has 4 drives, which are striped together using LVM and then
cut into 24 logical volumes.  One DRBD device sits on top of each of the
24 LVs.  The two nodes talk to each other over 2x bonded e1000s.  All was
running well for 40+ days with 24 KVM VMs running.

I decided to try out the online verify functionality.  After adding
"verify-alg crc32c;" to my config on both hosts and running
"drbdadm adjust", I ran:
drbdadm verify VM24

All was well, and I watched "/proc/drbd" as the verify progressed.  But
then it stopped at 14%:
24: cs:VerifyS st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:2035752 nr:160958366 dw:162994118 dr:23997736 al:186 bm:10021 lo:0 pe:4624 ua:0 ap:16 oos:0
         14%      5847476/39734074
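
To keep an eye on counters like these without reading /proc/drbd by
hand, something along these lines could work.  This is only a sketch:
the sample text is the entry pasted above, the function name
parse_status is made up for illustration, and reading the last line as
done/total is an assumption based on the ratio matching the displayed 14%.

```python
import re

# Sample copied from the /proc/drbd entry for device 24 in this post.
STATUS = """\
24: cs:VerifyS st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:2035752 nr:160958366 dw:162994118 dr:23997736 al:186 bm:10021
lo:0 pe:4624 ua:0 ap:16 oos:0
         14%      5847476/39734074
"""


def parse_status(text):
    """Return (counters, done, total) for one /proc/drbd device entry.

    counters maps the short names (ns, nr, ..., oos) to integers.
    done/total come from the progress line, or are None if it is absent.
    """
    # Only key:number pairs match, so cs:/st:/ds: are skipped automatically.
    counters = {k: int(v) for k, v in re.findall(r"\b([a-z]+):(\d+)\b", text)}
    progress = re.search(r"(\d+)%\s+(\d+)/(\d+)", text)
    if progress is None:
        return counters, None, None
    return counters, int(progress.group(2)), int(progress.group(3))


if __name__ == "__main__":
    counters, done, total = parse_status(STATUS)
    print("pe=%d ap=%d oos=%d" % (counters["pe"], counters["ap"], counters["oos"]))
    if total:
        print("progress: %.1f%%" % (100.0 * done / total))
```

Sampling this every few seconds and comparing the first progress number
between samples would show whether the verify is still moving.  In the
snapshot above, pe and ap are both non-zero while the progress is stuck.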

The VM using that storage is hung hard.  Stracing it shows it blocked in
a rather uninformative spot:
root@node1:~# strace -p 9878
Process 9878 attached - interrupt to quit
futex(0xb531a0, 0x80 /* FUTEX_??? */, 2

Dmesg on the secondary node has nothing interesting, but dmesg on the
primary node has:
[3628294.472338] drbd24:   state = { cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate r--- }
[3628294.481196] drbd24:  wanted = { cs:VerifyS st:Primary/Secondary ds:UpToDate/UpToDate r--- }
[3628524.571022] drbd24: conn( Connected -> VerifyS )
[3628919.655921] drbd24: qemu-system-x86[10223] Concurrent local write detected! [DISCARD L] new: 952311s +3584; pending: 952311s +3584
[3628919.668048] drbd24: qemu-system-x86[10223] Concurrent local write detected! [DISCARD L] new: 952318s +512; pending: 952318s +512
[3628919.680433] drbd24: qemu-system-x86[10223] Concurrent local write detected! [DISCARD L] new: 799599s +3584; pending: 799599s +3584
[3628919.692566] drbd24: qemu-system-x86[10223] Concurrent local write detected! [DISCARD L] new: 799606s +512; pending: 799606s +512
[3629004.628073] drbd24: qemu-system-x86[10224] Concurrent local write detected! [DISCARD L] new: 952311s +3584; pending: 952311s +3584
[3629004.640192] drbd24: qemu-system-x86[10224] Concurrent local write detected! [DISCARD L] new: 952318s +512; pending: 952318s +512
[3629004.652675] drbd24: qemu-system-x86[10224] Concurrent local write detected! [DISCARD L] new: 799599s +3584; pending: 799599s +3584
[3629004.664787] drbd24: qemu-system-x86[10224] Concurrent local write detected! [DISCARD L] new: 799606s +512; pending: 799606s +512
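
The messages above can be grouped by sector range to make the pattern
easier to see.  A rough sketch follows; the function names are invented
here, and treating "s" as 512-byte sectors and "+N" as a length in bytes
is an assumption about the log format, not something confirmed in this
post.

```python
import re
from collections import defaultdict

# The eight "Concurrent local write" lines from the primary's dmesg.
LOG = """\
[3628919.655921] drbd24: qemu-system-x86[10223] Concurrent local write detected! [DISCARD L] new: 952311s +3584; pending: 952311s +3584
[3628919.668048] drbd24: qemu-system-x86[10223] Concurrent local write detected! [DISCARD L] new: 952318s +512; pending: 952318s +512
[3628919.680433] drbd24: qemu-system-x86[10223] Concurrent local write detected! [DISCARD L] new: 799599s +3584; pending: 799599s +3584
[3628919.692566] drbd24: qemu-system-x86[10223] Concurrent local write detected! [DISCARD L] new: 799606s +512; pending: 799606s +512
[3629004.628073] drbd24: qemu-system-x86[10224] Concurrent local write detected! [DISCARD L] new: 952311s +3584; pending: 952311s +3584
[3629004.640192] drbd24: qemu-system-x86[10224] Concurrent local write detected! [DISCARD L] new: 952318s +512; pending: 952318s +512
[3629004.652675] drbd24: qemu-system-x86[10224] Concurrent local write detected! [DISCARD L] new: 799599s +3584; pending: 799599s +3584
[3629004.664787] drbd24: qemu-system-x86[10224] Concurrent local write detected! [DISCARD L] new: 799606s +512; pending: 799606s +512
"""

PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+drbd(?P<minor>\d+):\s+\S+\[(?P<pid>\d+)\]\s+"
    r"Concurrent local write\s+detected!.*?new:\s+(?P<sector>\d+)s\s+\+(?P<size>\d+)"
)


def collisions(log):
    """Map (start_sector, size) -> list of (timestamp, pid) reports."""
    text = " ".join(log.split())  # tolerate mail-wrapped log lines
    by_range = defaultdict(list)
    for m in PATTERN.finditer(text):
        key = (int(m["sector"]), int(m["size"]))
        by_range[key].append((float(m["ts"]), int(m["pid"])))
    return dict(by_range)


def contiguous_pairs(ranges, sector_bytes=512):
    """Return (a, b) pairs where range b starts exactly where a ends.

    Assumes the size is in bytes and sectors are sector_bytes long.
    """
    sizes = dict(ranges)
    pairs = []
    for sector, size in sorted(ranges):
        nxt = sector + size // sector_bytes
        if nxt in sizes:
            pairs.append(((sector, size), (nxt, sizes[nxt])))
    return pairs


if __name__ == "__main__":
    found = collisions(LOG)
    for key, reports in sorted(found.items()):
        print(key, reports)
    print(contiguous_pairs(found.keys()))
```

On this log it reports four distinct ranges, each logged once for PID
10223 and again about 85 seconds later for PID 10224.  Under the stated
assumption, each 3584-byte write is directly followed by a 512-byte one
(7 + 1 sectors, 4096 bytes in total).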

Any ideas what could be causing this?

Googling the "Concurrent local write" message only turned up the
check-in that added that code to DRBD.

I can leave the system as it is for a few days, if there is more
information I should collect.

Please CC.

Thanks.
- nolan



