[DRBD-user] drbd logic bug?

Kelsey Cummings kgc at corp.sonic.net
Fri Oct 12 00:10:47 CEST 2007

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Fri, Oct 12, 2007 at 12:00:04AM +0200, Lars Ellenberg wrote:
...
> > Oct 11 13:09:09 a kernel: drbd1: logic bug? rs_left=771449 > rs_total=416104 (rs_failed 0)
> > Oct 11 13:09:09 a kernel: drbd1: logic bug? rs_left=771447 > rs_total=416104 (rs_failed 0)
> > Oct 11 13:09:09 a kernel: drbd1: logic bug? rs_left=771446 > rs_total=416104 (rs_failed 0)
> > Oct 11 13:09:09 a kernel: drbd1: logic bug? rs_left=768609 > rs_total=416104 (rs_failed 0)
> > Oct 11 13:09:11 a kernel: drbd1: logic bug? rs_left=762068 > rs_total=416104 (rs_failed 0)
> 
> now, that is interessting...
> probably indeed a "logic bug", in that we don't get the accounting
> right. I believe thought that drbd does the "right thing" still,
> and only gets the counter wrong for some reason.  we'll have a look.

I left the cron'd reboots in place and it seems to have done the right
thing as you say.  Tests are still running fine and the filesystems 
appear to be intact even though they have swapped to and from the other
node.

However, I'm now seeing the b.node with similar errors and the status
reports that it is stalled.

on b
# cat /proc/drbd 
version: 8.0.6 (api:86/proto:86)
SVN Revision: 3048 build by root at a.store, 2007-09-10 15:25:56
 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:4542096 nr:54038340 dw:58580464 dr:2043989 al:1135 bm:746 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:241212 misses:264 starving:0 dirty:0 changed:264
        act_log: used:0/521 hits:1134396 misses:1135 starving:0 dirty:0 changed:1135
 1: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r---
    ns:5878776 nr:58098092 dw:63336632 dr:3669653 al:1307 bm:841 lo:0 pe:352 ua:0 ap:352
        [================>...] sync'ed: 84.7% (544436/3528760)K
        stalled
        resync: used:0/31 hits:221339 misses:273 starving:0 dirty:0 changed:273
        act_log: used:2/521 hits:1308328 misses:1615 starving:0 dirty:308 changed:1307


on a
version: 8.0.6 (api:86/proto:86)
SVN Revision: 3048 build by root at a.store.usenetbinaries.com, 2007-09-10 15:25:56
 0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:4970524 dw:4970524 dr:0 al:0 bm:126 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:121070 misses:126 starving:0 dirty:0 changed:126
        act_log: used:0/521 hits:0 misses:0 starving:0 dirty:0 changed:0
 1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:6246352 dw:6246352 dr:0 al:0 bm:273 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:221339 misses:273 starving:0 dirty:0 changed:273
        act_log: used:0/521 hits:0 misses:0 starving:0 dirty:0 changed:0

...
drbd1: conn( WFConnection -> WFReportParams ) 
drbd1: Handshake successful: DRBD Network Protocol version 86
drbd1: Peer authenticated using 20 bytes of 'sha1' HMAC
drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) 
drbd1: Writing meta data super block now.
drbd1: aftr_isp( 0 -> 1 ) 
drbd1: conn( WFBitMapS -> PausedSyncS ) pdsk( Outdated -> Inconsistent ) 
drbd1: Began resync as PausedSyncS (will sync 3528760 KB [882190 bits set]).
drbd1: Writing meta data super block now.
drbd1: pdsk( Inconsistent -> Outdated ) peer_isp( 0 -> 1 ) 
drbd1: lost connection while grabbing the req_lock!
drbd1: pdsk( Outdated -> Inconsistent ) peer_isp( 1 -> 0 ) 
drbd1: conn( PausedSyncS -> SyncSource ) aftr_isp( 1 -> 0 ) 
drbd1: Syncer continues.
drbd1: logic bug? rs_left=1015632 > rs_total=882190 (rs_failed 0)
drbd1: logic bug? rs_left=1015602 > rs_total=882190 (rs_failed 0)
drbd1: logic bug? rs_left=1015574 > rs_total=882190 (rs_failed 0)
drbd1: logic bug? rs_left=1015566 > rs_total=882190 (rs_failed 0)
drbd1: logic bug? rs_left=1015557 > rs_total=882190 (rs_failed 0)
...

Let me know if there is any things else I can do to help squash this.




-- 
Kelsey Cummings - kgc at corp.sonic.net      sonic.net, inc.
System Architect                          2260 Apollo Way
707.522.1000                              Santa Rosa, CA 95407



More information about the drbd-user mailing list