[DRBD-user] WFBitMapS: secondary gone, data on the primary inaccessible

Tomasz Chmielewski mangoo at wpkg.org
Thu Jan 15 13:02:19 CET 2009


I ran into a weird user-space problem while DRBD nodes were
connecting/disconnecting, so let's try to troubleshoot it.

I was repeatedly connecting and disconnecting the nodes (either by killing
the VPN connection, or by using the "drbdadm connect" and "drbdadm disconnect"
commands).

This worked until:

# drbdadm disconnect some_thing
State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command 'drbdsetup /dev/drbd9 disconnect' terminated with exit code 11

This is not entirely true, as this node is Primary and its disk is UpToDate:

 9: cs:WFBitMapS st:Primary/Secondary ds:UpToDate/Inconsistent C r---                      
    ns:0 nr:0 dw:144992692 dr:1680068992 al:603186 bm:601017 lo:0 pe:1 ua:0 ap:1           
        resync: used:0/31 hits:224202 misses:1770 starving:0 dirty:0 changed:1770          
        act_log: used:1/127 hits:35644987 misses:614310 starving:1481 dirty:10251 changed:603186
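
To watch for this state without reading the whole status by eye, the "cs:"
field can be pulled out of the status text. This is only a minimal sketch: the
function name drbd_cs is made up for illustration, and it assumes the status
lines look like the ones shown above (a "9:" minor prefix followed by a "cs:"
field).

```shell
# Hypothetical helper (not part of DRBD): print the connection state (cs:)
# of one DRBD minor. Reads /proc/drbd-style text on stdin; the minor number
# is the first argument.
drbd_cs() {
    awk -v minor="$1" '
        # Match the line that starts with "<minor>:", e.g. "9:".
        $1 == (minor ":") {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^cs:/) { print substr($i, 4); exit }
        }'
}

# Usage on a live node (assumes /proc/drbd exists):
#   drbd_cs 9 < /proc/drbd
```

Fed the status lines above, it should print WFBitMapS for minor 9.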


More problematic, data on this primary is _inaccessible_. For example, when we run:

# fdisk -l /dev/drbd9

We get no output, because fdisk (and a handful of other processes) sits in
uninterruptible sleep, waiting for DRBD to leave the WFBitMapS state:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21626 manager   20   0 11784  664  340 R   60  0.1   1:52.51 bash
  174 root      15  -5     0    0    0 R   39  0.0 102:27.50 kswapd0
 3161 root      20   0     0    0    0 D   20  0.0   2:23.97 drbd9_worker
 3361 root      20   0 4177m 8828  732 R   20  0.9  10:21.97 tgtd
 7956 root      20   0     0    0    0 D   20  0.0   3:25.06 pdflush
21203 root      20   0     0    0    0 R   20  0.0   2:07.11 drbd9_receiver
21617 root      20   0  3908  484  400 R   20  0.0   0:45.01 fdisk
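
In the listing above, drbd9_worker and pdflush are the ones shown in state D.
To list only such tasks, together with the kernel function they are waiting
in, something like the following could be used. It is a sketch, not a tested
recipe: the function name d_state_only is made up, and it assumes a procps
"ps" that accepts the pid, stat, wchan and comm output columns.

```shell
# Hypothetical helper: keep the header row plus every task whose state
# column starts with D (uninterruptible sleep). Expects rows of the form
# "PID STAT WCHAN COMMAND" on stdin.
d_state_only() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage on a live system (assumes procps ps):
#   ps -eo pid,stat,wchan:32,comm | d_state_only
```

The WCHAN column may hint at where in the kernel the blocked tasks are stuck,
which would be useful to include in a bug report.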


Is it by design that when a node is primary and a secondary connects to it, we lose
access to the data on the primary?


Also, there seems to be a bug somewhere in the code responsible for WFBitMapS: the
resource has been in that state for more than 10 minutes now, although the secondary
is accessible. Is a reboot the only way to recover now?


-- 
Tomasz Chmielewski
http://wpkg.org


