I was experiencing a weird user-space problem when DRBD nodes were
connecting/disconnecting. So let's try to troubleshoot...

I was connecting/disconnecting the nodes (either by killing the VPN
connection, or by using the "drbdadm connect/disconnect" commands). Until:

    # drbdadm disconnect some_thing
    State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
    Command 'drbdsetup /dev/drbd9 disconnect' terminated with exit code 11

Which is not entirely true, as this node is the Primary and it is UpToDate!

     9: cs:WFBitMapS st:Primary/Secondary ds:UpToDate/Inconsistent C r---
        ns:0 nr:0 dw:144992692 dr:1680068992 al:603186 bm:601017 lo:0 pe:1 ua:0 ap:1
            resync: used:0/31 hits:224202 misses:1770 starving:0 dirty:0 changed:1770
            act_log: used:1/127 hits:35644987 misses:614310 starving:1481 dirty:10251 changed:603186

What's more problematic, the data on this primary is _inaccessible_, i.e.,
when we do:

    # fdisk -l /dev/drbd9

we won't get any output, as fdisk (and a handful of other processes) will
be in uninterruptible sleep, waiting until DRBD changes state from
WFBitMapS:

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
    21626 manager   20   0 11784  664  340 R   60  0.1   1:52.51 bash
      174 root      15  -5     0    0    0 R   39  0.0 102:27.50 kswapd0
     3161 root      20   0     0    0    0 D   20  0.0   2:23.97 drbd9_worker
     3361 root      20   0 4177m 8828  732 R   20  0.9  10:21.97 tgtd
     7956 root      20   0     0    0    0 D   20  0.0   3:25.06 pdflush
    21203 root      20   0     0    0    0 R   20  0.0   2:07.11 drbd9_receiver
    21617 root      20   0  3908  484  400 R   20  0.0   0:45.01 fdisk

Is it by design that when a node is a primary and a secondary connects to
it, we lose access to the data on the primary?

Also, there seems to be a bug somewhere in the code responsible for
WFBitMapS: it has been in that state for more than 10 minutes now, although
the secondary is accessible.

Is a reboot the only option now to recover?

-- 
Tomasz Chmielewski
http://wpkg.org
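
In case it helps anyone trying to catch the same condition: the combination reported above (connection state WFBitMapS while the local node is Primary and UpToDate) can be spotted by a script before anything touches the device and blocks. Below is a rough sketch in Python, not a tested monitoring tool. It assumes the device line has the same layout as the /proc/drbd line quoted above; the function names `parse_status_line` and `looks_wedged` are made up for this illustration, and the idea that WFBitMapT deserves the same check is my assumption, not something observed here.

```python
import re

# Matches one device line of /proc/drbd in the layout quoted in this post,
# e.g. "9: cs:WFBitMapS st:Primary/Secondary ds:UpToDate/Inconsistent C r---".
# Assumption: some DRBD releases print "ro:" instead of "st:", so both are accepted.
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"\s+(?:st|ro):(?P<local_role>\w+)/(?P<peer_role>\w+)"
    r"\s+ds:(?P<local_disk>\w+)/(?P<peer_disk>\w+)"
)

# Bitmap-exchange states that should normally be short-lived.
# WFBitMapS is the state seen in this report; WFBitMapT is included by assumption.
BITMAP_STATES = {"WFBitMapS", "WFBitMapT"}


def parse_status_line(line):
    """Return the fields of a device status line as a dict, or None if it is not one."""
    match = STATUS_RE.match(line)
    return match.groupdict() if match else None


def looks_wedged(status):
    """True if the device shows the combination reported in this post."""
    return (
        status is not None
        and status["cs"] in BITMAP_STATES
        and status["local_role"] == "Primary"
        and status["local_disk"] == "UpToDate"
    )


if __name__ == "__main__":
    sample = "9: cs:WFBitMapS st:Primary/Secondary ds:UpToDate/Inconsistent C r---"
    status = parse_status_line(sample)
    print(status["minor"], status["cs"], looks_wedged(status))
```

A single match only means the device is in the bitmap-exchange state at that moment; to tell a stuck exchange from a normal one, the check would have to be repeated over some interval (the state reported here lasted more than 10 minutes).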