[DRBD-user] DRBD Sync stalled at 100%

vijay patel catchvjay at hotmail.com
Sat Sep 28 15:44:05 CEST 2013

Hi Friends,

We are running DRBD 8.3.13 on RHEL 6.4 in a two-node cluster. Yesterday we applied OS patches to both servers and rebooted them into the new kernel. Since the restart, the DRBD resync stalls at 100% with 360 KB still out of sync. Rebooting into the old kernel did not help. I also tried drbdadm disconnect --force r0 followed by a connect, but the sync still stalls at 100%. The /proc/drbd output from both nodes, my config file, and the logs are below.

Primary :

cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by dag at Build64R6, 2012-09-04 12:06:10
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:1303160 nr:0 dw:1303160 dr:5501409 al:614 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:360
        [===================>] sync'ed:100.0% (360/360)K
        finish: 0:53:10 speed: 0 (0) K/sec

Secondary :

cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by dag at Build64R6, 2012-09-04 12:06:10
 0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:58460 dw:3583548 dr:0 al:0 bm:26 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:360
        [===================>] sync'ed:100.0% (360/360)K
        finish: 1:05:06 speed: 0 (0) want: 30 K/sec
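For anyone who wants to check for this condition from a script, here is a minimal sketch that reads the /proc/drbd text and flags a device that is in a sync state, still has out-of-sync data, and reports a speed of 0. The function names and the "stalled" rule are my own assumptions for illustration, not anything DRBD provides, and the regular expressions are only matched against the 8.3-style output shown above.

```python
import re

# Sample copied from the Primary node's output above (GIT-hash line omitted).
SAMPLE = """\
version: 8.3.13 (api:88/proto:86-96)
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:1303160 nr:0 dw:1303160 dr:5501409 al:614 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:360
        [===================>] sync'ed:100.0% (360/360)K
        finish: 0:53:10 speed: 0 (0) K/sec
"""


def parse_proc_drbd(text):
    """Return {minor: {"cs", "ro", "ds", "oos_kib", "speed_kib_s"}} from /proc/drbd text."""
    devices = {}
    minor = None
    for line in text.splitlines():
        # Device header line, e.g. " 0: cs:SyncSource ro:... ds:..."
        m = re.match(r"\s*(\d+): cs:(\S+) ro:(\S+) ds:(\S+)", line)
        if m:
            minor = int(m.group(1))
            devices[minor] = {"cs": m.group(2), "ro": m.group(3), "ds": m.group(4)}
            continue
        if minor is None:
            continue  # still in the version / GIT-hash header
        m = re.search(r"\boos:(\d+)", line)
        if m:
            devices[minor]["oos_kib"] = int(m.group(1))
        m = re.search(r"speed: ([\d,]+)", line)
        if m:
            devices[minor]["speed_kib_s"] = int(m.group(1).replace(",", ""))
    return devices


def is_stalled(dev):
    """Hypothetical rule: syncing, data still out of sync, and current speed is 0."""
    return (
        dev.get("cs") in ("SyncSource", "SyncTarget")
        and dev.get("oos_kib", 0) > 0
        and dev.get("speed_kib_s") == 0
    )


if __name__ == "__main__":
    for minor, dev in parse_proc_drbd(SAMPLE).items():
        print(minor, dev["cs"], dev.get("oos_kib"), is_stalled(dev))
```

On a live node the text would come from reading /proc/drbd instead of SAMPLE. A single reading with speed 0 is only a hint; comparing two readings taken some time apart would be a more reliable check.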

drbd.conf :

skip {
  As you can see, you can also comment chunks of text
  with a 'skip[optional nonsense]{ skipped text }' section.
  This comes in handy, if you just want to comment out
  some 'resource <some name> {...}' section:
  just precede it with 'skip'.

  The basic format of option assignment is
  <option name><linear whitespace><value>;

  It should be obvious from the examples below,
  but if you really care to know the details:

  <option name> :=
        valid options in the respective scope
  <value>  := <num>|<string>|<choice>|...
              depending on the set of allowed values
              for the respective option.
  <num>    := [0-9]+, sometimes with an optional suffix of K,M,G
  <string> := (<name>|\"([^\"\\\n]*|\\.)*\")+
  <name>   := [/_.A-Za-z0-9-]+
}

#
# At most ONE global section is allowed.
# It must precede any resource section.
#
global {
    # By default we load the module with a minor-count of 32. In case you
    # have more devices in your config, the module gets loaded with
    # a minor-count that ensures that you have 10 minors spare.
    # In case 10 spare minors are too little for you, you can set the
    # minor-count explicitly here. ( Note, in contrast to DRBD-0.7 an
    # unused, spare minor has only a very little overhead of allocated
    # memory (a single pointer to be exact). )
    #
    # minor-count 64;

    # The user dialog counts and displays the seconds it waited so
    # far. You might want to disable this if you have the console
    # of your server connected to a serial terminal server with
    # limited logging capacity.
    # The Dialog will print the count each 'dialog-refresh' seconds,
    # set it to 0 to disable redrawing completely. [ default = 1 ]
    #
    # dialog-refresh 5; # 5 seconds

    # You might disable one of drbdadm's sanity checks.
    # disable-ip-verification;

    # Participate in DRBD's online usage counter at http://usage.drbd.org
    # possible options: ask, yes, no. Default is ask. In case you do not
    # know, set it to ask, and follow the on screen instructions later.
    usage-count no;
}


#
# The common section can have all the sections a resource can have but
# not the host section (started with the "on" keyword).
# The common section must precede all resources.
# All resources inherit the settings from the common section.
# Whereas settings in the resources have precedence over the common
# setting.
#

common {
  syncer { rate 3M; }
}

resource r0 {
        protocol C;
        #incon-degr-cmd "halt -f";
        startup {
                degr-wfc-timeout 120;    # 2 minutes.
        }
        disk {
                on-io-error   detach;
        }
        handlers
        {
            split-brain "/root/splitbrain.sh root";
        }
        net {
        }
        syncer {
                rate 30;
                #group 1;
                al-extents 257;
        }
        on Primary {
                device          /dev/drbd0;
                meta-disk       /dev/sdb1[0];
                disk            /dev/sdb2;
                address         xxx.xxx.xxx.xxx:7788;
        }
        on Secondary {
                device          /dev/drbd0;
                meta-disk       /dev/sdb1[0];
                disk            /dev/sdb2;
                address         xxx.xxx.xxx.xxx:7788;
        }
}
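The comment above the common section says that settings in a resource take precedence over the common ones. As a rough illustration of what that means for this config, the sketch below merges the two syncer sections the way that rule describes. It is a simplified model with made-up names (effective_section and the two dicts), not how drbdadm actually parses the file; running drbdadm dump r0 on the node should show the real resolved configuration.

```python
def effective_section(common, resource):
    """Merge one config section: resource values override common values."""
    merged = dict(common)
    merged.update(resource)
    return merged


# Values copied from the drbd.conf above.
common_syncer = {"rate": "3M"}
r0_syncer = {"rate": "30", "al-extents": "257"}

if __name__ == "__main__":
    print(effective_section(common_syncer, r0_syncer))
```

Under that model the effective rate for r0 is the resource-level 30 and not the 3M from common, which is consistent with the "want: 30 K/sec" shown in the Secondary's /proc/drbd output.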


logs :

Sep 28 08:16:30 secondary kernel: block drbd0: peer( Primary -> Unknown ) conn( SyncTarget -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Sep 28 08:16:30 secondary kernel: block drbd0: asender terminated
Sep 28 08:16:30 secondary kernel: block drbd0: Terminating asender thread
Sep 28 08:16:30 secondary kernel: block drbd0: bitmap WRITE of 1599 pages took 34 jiffies
Sep 28 08:16:30 secondary kernel: block drbd0: 360 KB (90 bits) marked out-of-sync by on disk bit-map.
Sep 28 08:16:30 secondary kernel: block drbd0: Connection closed
Sep 28 08:16:30 secondary kernel: block drbd0: conn( Disconnecting -> StandAlone )
Sep 28 08:16:30 secondary kernel: block drbd0: receiver terminated
Sep 28 08:16:30 secondary kernel: block drbd0: Terminating receiver thread
Sep 28 08:16:33 secondary kernel: block drbd0: conn( StandAlone -> Unconnected )
Sep 28 08:16:33 secondary kernel: block drbd0: Starting receiver thread (from drbd0_worker [1765])
Sep 28 08:16:33 secondary kernel: block drbd0: receiver (re)started
Sep 28 08:16:33 secondary kernel: block drbd0: conn( Unconnected -> WFConnection )
Sep 28 08:16:33 secondary kernel: block drbd0: Handshake successful: Agreed network protocol version 96
Sep 28 08:16:33 secondary kernel: block drbd0: conn( WFConnection -> WFReportParams )
Sep 28 08:16:33 secondary kernel: block drbd0: Starting asender thread (from drbd0_receiver [29181])
Sep 28 08:16:33 secondary kernel: block drbd0: data-integrity-alg: <not-used>
Sep 28 08:16:33 secondary kernel: block drbd0: drbd_sync_handshake:
Sep 28 08:16:33 secondary kernel: block drbd0: self 5F0D0794C3189654:0000000000000000:31D1206D1558C3A2:31D0206D1558C3A3 bits:90 flags:0
Sep 28 08:16:33 secondary kernel: block drbd0: peer EF964F9B847F7A89:5F0D0794C3189655:5F0C0794C3189655:5F0B0794C3189655 bits:90 flags:0
Sep 28 08:16:33 secondary kernel: block drbd0: uuid_compare()=-1 by rule 50
Sep 28 08:16:33 secondary kernel: block drbd0: Becoming sync target due to disk states.
Sep 28 08:16:33 secondary kernel: block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Sep 28 08:16:33 secondary kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
Sep 28 08:16:33 secondary kernel: block drbd0: updated sync uuid 5F0E0794C3189654:0000000000000000:31D1206D1558C3A2:31D0206D1558C3A3
Sep 28 08:16:33 secondary kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Sep 28 08:16:33 secondary kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Sep 28 08:16:33 secondary kernel: block drbd0: conn( WFSyncUUID -> SyncTarget )
Sep 28 08:16:33 secondary kernel: block drbd0: Began resync as SyncTarget (will sync 360 KB [90 bits set]).
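To relate the "90 bits" in the log to the 360 KB in /proc/drbd: each bit in the DRBD bitmap covers one 4 KiB block, so 90 bits correspond to 360 KiB. The short sketch below does that arithmetic and also estimates how long that amount would take at a given rate. The helper names are hypothetical, and the 30 K/sec figure is taken from the "want:" value on the Secondary.

```python
BITMAP_BLOCK_KIB = 4  # one DRBD bitmap bit covers a 4 KiB block


def out_of_sync_kib(bits_set):
    """Convert a count of set bitmap bits to KiB of out-of-sync data."""
    return bits_set * BITMAP_BLOCK_KIB


def seconds_at_rate(kib, rate_kib_per_s):
    """Ideal transfer time for `kib` KiB at a constant rate."""
    return kib / rate_kib_per_s


if __name__ == "__main__":
    kib = out_of_sync_kib(90)
    print(kib)                       # 360
    print(seconds_at_rate(kib, 30))  # 12.0
```

So even at the low configured rate the remaining data is only a few seconds of work, which is why the finish estimates of about an hour with a speed of 0 look wrong to me.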

Appreciate any help.

Thanks,
Vjay
 		 	   		  