[DRBD-user] DRBD 9 Peak CPU load

Nick Wang nwang at suse.com
Mon Jun 13 05:09:59 CEST 2016



The DRBD 9 RA patch that Mats mentioned is available at
https://build.opensuse.org/package/show/network:ha-clustering:Factory/drbd-utils

The reason Mats could not apply it directly may be that he lacks
some earlier custom patches, or is not using version 8.9.6.

I discussed this with Lars Ellenberg and got some helpful ideas from him.
Since DRBD 9 supports auto-promotion, you can use a Filesystem resource
instead of the drbd RA in Pacemaker. However, there are other reasons to
keep the RA, such as monitoring DRBD via Pacemaker without a filesystem,
or keeping the steps the same to ease migration from DRBD 8 to DRBD 9.
I therefore implemented the patch to support DRBD 9 in the RA, hoping it
can help in some edge cases.
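
The version gate in the quoted patch can be sketched roughly as follows.
This is only an illustration, not code from drbd.ocf: the helper names
drbd_version_code and drbd_generation are made up for this example, and the
version string is passed in as an argument instead of being read from
modinfo. It shows why the 9.x test has to come before the 8.4 test: in the
removed lines the ">= 0x090000" branch sat behind ">= 0x080400", which
already matches any 9.x version.

```sh
#!/bin/sh
# Sketch only: drbd_version_code and drbd_generation are hypothetical helper
# names, not functions from drbd.ocf.

# Turn "major.minor.patch[suffix]" into a hex code such as 0x090002.
drbd_version_code() {
    # Same idea as the sed call in the agent: keep the three leading numbers.
    set -- $(printf '%s\n' "$1" |
        sed -ne 's/^\([0-9][0-9]*\)\.\([0-9][0-9]*\)\.\([0-9][0-9]*\).*$/\1 \2 \3/p')
    [ $# -eq 3 ] || return 1
    printf '0x%02x%02x%02x\n' "$1" "$2" "$3"
}

# Classify a version string in the order used by the patched if/elif chain.
drbd_generation() {
    code=$(drbd_version_code "$1") || { echo unknown; return 1; }
    if [ $(($code)) -ge $((0x090000)) ]; then
        echo drbd9
    elif [ $(($code)) -ge $((0x080400)) ]; then
        echo multi-volume-8.4
    else
        echo legacy
    fi
}

drbd_generation 9.0.2-1
drbd_generation 8.4.7
```

Run as is, this prints "drbd9" and then "multi-volume-8.4".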

Best regards,
Nick 

>>> On 2016-6-4 at 8:27, in message
<CAAKASGyyU-invpji3VFMDLSacOww=Oj=jndU_auV2nbg4DwYEg at mail.gmail.com>, Igor
Cicimov <icicimov at gmail.com> wrote:
> Thanks Mats, this was the reason for me to give up on drbd 9 when I was  
> testing about 2-3 months ago.  
>   
> On Mon, May 30, 2016 at 9:50 PM, Mats Ramnefors <mats at ramnefors.com> wrote:  
>   
> > Patch below.  
> >  
> > Applying the patch "as is" did not work for me. I had to edit in the
> > changes manually.
> >
> > I do not understand why, but I am no expert on patching, so it may work
> > for you.
> >  
> > With the changes applied, the RA seems to work OK.  
> >  
> > /Mats  
> >  
> > ++++++ support-drbd9-ra.patch ++++++  
> > diff --git a/scripts/drbd.ocf b/scripts/drbd.ocf  
> > index 632e16e..91990fc 100755  
> > --- a/scripts/drbd.ocf  
> > +++ b/scripts/drbd.ocf  
> > @@ -328,6 +328,23 @@ remove_master_score() {  
> >         do_cmd ${HA_SBIN_DIR}/crm_master -l reboot -D  
> >  }  
> >  
> > +_peer_node_process() {  
> > +       # _since drbd9 support multiple connections  
> > +       : ${_peer_node_id:=0}  
> > +       DRBD_PER_NAME[$_peer_node_id]=$_conn_name  
> > +       DRBD_PER_ID[$_peer_node_id]=$_peer_node_id  
> > +       DRBD_PER_CSTATE[$_peer_node_id]=$_cstate  
> > +       DRBD_PER_ROLE_REMOTE[$_peer_node_id]=${_peer:-Unknown}  
> > +       DRBD_PER_DSTATE_REMOTE[$_peer_node_id]=${_pdsk:-DUnknown}  
> > +  
> > +       : == DEBUG == _peer_node_id                         == ${_peer_node_id} ==
> > +       : == DEBUG == DRBD_PER_NAME[_peer_node_id]          == ${DRBD_PER_NAME[${_peer_node_id}]} ==
> > +       : == DEBUG == DRBD_PER_ID[_peer_node_id]            == ${DRBD_PER_ID[${_peer_node_id}]} ==
> > +       : == DEBUG == DRBD_PER_CSTATE[_peer_node_id]        == ${DRBD_PER_CSTATE[${_peer_node_id}]} ==
> > +       : == DEBUG == DRBD_PER_ROLE_REMOTE[_peer_node_id]   == ${DRBD_PER_ROLE_REMOTE[${_peer_node_id}]} ==
> > +       : == DEBUG == DRBD_PER_DSTATE_REMOTE[_peer_node_id] == ${DRBD_PER_DSTATE_REMOTE[${_peer_node_id}]} ==
> > +}  
> > +  
> >  _sh_status_process() {  
> >         # _volume not present should not happen,  
> >         # but may help make this agent work even if it talks to drbd 8.3.  
> > @@ -335,11 +352,36 @@ _sh_status_process() {  
> >         # not-yet-created volumes are reported as -1  
> >         (( _volume >= 0 )) || _volume=$[1 << 16]  
> >         DRBD_ROLE_LOCAL[$_volume]=${_role:-Unconfigured}  
> > -       DRBD_ROLE_REMOTE[$_volume]=${_peer:-Unknown}  
> > -       DRBD_CSTATE[$_volume]=$_cstate  
> >         DRBD_DSTATE_LOCAL[$_volume]=${_disk:-Unconfigured}  
> > -       DRBD_DSTATE_REMOTE[$_volume]=${_pdsk:-DUnknown}  
> > +  
> > +       if $DRBD_VERSION_9 ; then  
> > +               #Get from _peer_node_process  
> > +               DRBD_NAME[$_volume]=${DRBD_PER_NAME[@]}  
> > +               DRBD_ID[$_volume]=${DRBD_PER_ID[@]}  
> > +               DRBD_VOLUME[$_volume]=${_volume}  
> > +               DRBD_CSTATE[$_volume]=${DRBD_PER_CSTATE[@]}  
> > +               DRBD_ROLE_REMOTE[$_volume]=${DRBD_PER_ROLE_REMOTE[@]}  
> > +               DRBD_DSTATE_REMOTE[$_volume]=${DRBD_PER_DSTATE_REMOTE[@]}  
> > +  
> > +               DRBD_PER_NAME=()  
> > +               DRBD_PER_ID=()  
> > +               DRBD_PER_CSTATE=()  
> > +               DRBD_PER_ROLE_REMOTE=()  
> > +               DRBD_PER_DSTATE_REMOTE=()  
> > +  
> > +               : == DEBUG == _volume            == ${_volume} ==
> > +               : == DEBUG == DRBD_ROLE_LOCAL    == ${DRBD_ROLE_LOCAL[${_volume}]} ==
> > +               : == DEBUG == DRBD_DSTATE_LOCAL  == ${DRBD_DSTATE_LOCAL[${_volume}]} ==
> > +               : == DEBUG == DRBD_CSTATE        == ${DRBD_CSTATE[${_volume}]} ==
> > +               : == DEBUG == DRBD_ROLE_REMOTE   == ${DRBD_ROLE_REMOTE[${_volume}]} ==
> > +               : == DEBUG == DRBD_DSTATE_REMOTE == ${DRBD_DSTATE_REMOTE[${_volume}]} ==
> > +       else  
> > +               DRBD_CSTATE[$_volume]=$_cstate  
> > +               DRBD_ROLE_REMOTE[$_volume]=${_peer:-Unknown}  
> > +               DRBD_DSTATE_REMOTE[$_volume]=${_pdsk:-DUnknown}  
> > +       fi  
> > }  
> > +  
> >  drbd_set_status_variables() {  
> >         # drbdsetup sh-status prints these values to stdout,  
> >         # and then prints _sh_status_process.  
> > @@ -352,6 +394,15 @@ drbd_set_status_variables() {  
> >         local _resynced_percent  
> >         local out  
> >  
> > +       if $DRBD_VERSION_9 ; then  
> > +               local _peer_node_id _conn_name  
> > +               DRBD_PER_NAME=()  
> > +               DRBD_PER_ID=()  
> > +               DRBD_PER_CSTATE=()  
> > +               DRBD_PER_ROLE_REMOTE=()  
> > +               DRBD_PER_DSTATE_REMOTE=()  
> > +       fi  
> > +  
> >         DRBD_ROLE_LOCAL=()  
> >         DRBD_ROLE_REMOTE=()  
> >         DRBD_CSTATE=()  
> > @@ -369,16 +420,20 @@ drbd_set_status_variables() {  
> >         # if there was no output at all, or a weird output  
> >         # make sure the status arrays won't be empty.  
> >         [[ ${#DRBD_ROLE_LOCAL[@]}    != 0 ]] || DRBD_ROLE_LOCAL=(Unconfigured)
> > -       [[ ${#DRBD_ROLE_REMOTE[@]}   != 0 ]] || DRBD_ROLE_REMOTE=(Unknown)
> > -       [[ ${#DRBD_CSTATE[@]}        != 0 ]] || DRBD_CSTATE=(Unconfigured)
> >         [[ ${#DRBD_DSTATE_LOCAL[@]}  != 0 ]] || DRBD_DSTATE_LOCAL=(Unconfigured)
> > +       [[ ${#DRBD_CSTATE[@]}        != 0 ]] || DRBD_CSTATE=(Unconfigured)
> > +       [[ ${#DRBD_ROLE_REMOTE[@]}   != 0 ]] || DRBD_ROLE_REMOTE=(Unknown)
> >         [[ ${#DRBD_DSTATE_REMOTE[@]} != 0 ]] || DRBD_DSTATE_REMOTE=(DUnknown)
> >  
> > -  
> > +       if $DRBD_VERSION_9 ; then  
> > +               : == DEBUG == DRBD_NAME    == ${DRBD_NAME[@]} ==  
> > +               : == DEBUG == DRBD_ID    == ${DRBD_ID[@]} ==  
> > +               : == DEBUG == DRBD_VOLUME    == ${DRBD_VOLUME[@]} ==  
> > +       fi  
> >         : == DEBUG == DRBD_ROLE_LOCAL    == ${DRBD_ROLE_LOCAL[@]} ==  
> > -       : == DEBUG == DRBD_ROLE_REMOTE   == ${DRBD_ROLE_REMOTE[@]} ==  
> > -       : == DEBUG == DRBD_CSTATE        == ${DRBD_CSTATE[@]} ==  
> >         : == DEBUG == DRBD_DSTATE_LOCAL  == ${DRBD_DSTATE_LOCAL[@]} ==  
> > +       : == DEBUG == DRBD_CSTATE        == ${DRBD_CSTATE[@]} ==  
> > +       : == DEBUG == DRBD_ROLE_REMOTE   == ${DRBD_ROLE_REMOTE[@]} ==  
> >         : == DEBUG == DRBD_DSTATE_REMOTE == ${DRBD_DSTATE_REMOTE[@]} ==  
> >  }  
> >  
> > @@ -414,6 +469,9 @@ maybe_outdate_self()  
> >         ocf_is_true $OCF_RESKEY_stop_outdates_secondary || return 1  
> >  
> >         local host stop_uname  
> > +       if $DRBD_VERSION_9 ; then  
> > +               local master temp_name temp_dstate temp_number outdate_self i
> > +       fi  
> >         # We ignore $OCF_RESKEY_CRM_meta_notify_promote_uname here  
> >         # because: if demote and promote for a _stacked_ resource  
> >         # (or a "floating" one, where DRBD sits on top of some SAN)  
> > @@ -437,6 +495,29 @@ maybe_outdate_self()  
> >                 return 1  
> >         done  
> >  
> > +       if $DRBD_VERSION_9 ; then  
> > +               temp_name=(${DRBD_NAME[@]})
> > +               temp_dstate=(${DRBD_DSTATE_REMOTE[@]})
> > +               temp_number=${#temp_name[@]}  
> > +               outdate_self=false  
> > +  
> > +               for master in $OCF_RESKEY_CRM_meta_notify_master_uname; do  
> > +                       for i in `seq 0 $((temp_number-1))`; do  
> > +                               if [[ ${temp_name[$i]} == "$master" ]] &&
> > +                                 [[ ${temp_dstate[$i]} == "DUnknown" ]]; then
> > +                                       outdate_self=true  
> > +                                       break  
> > +                               fi  
> > +                       done  
> > +                       temp_number=${#temp_name[@]}  
> > +               done  
> > +  
> > +               if ! $outdate_self; then  
> > +                       #The disconnecting node is not in Primary  
> > +                       return 1  
> > +               fi  
> > +    fi  
> > +  
> >         # e.g. post/promote of some other peer.  
> >         # Should not happen, fencing constraints should take care of that.  
> >         # But in case it does, scream out loud.  
> > @@ -993,6 +1074,7 @@ drbd_validate_all () {  
> >         DRBDADM="drbdadm"  
> >         DRBDSETUP="drbdsetup"  
> >         DRBD_HAS_MULTI_VOLUME=false  
> > +       DRBD_VERSION_9=false  
> >  
> >         # these will _exit_ if they don't find the binaries  
> >         check_binary $DRBDADM  
> > @@ -1015,18 +1097,23 @@ drbd_validate_all () {  
> >                         modinfo -F version drbd |  
> >                         sed -ne 's/^\([0-9]\+\)\.\([0-9]\+\)\.\([0-9]\+\).*$/\1 \2 \3/p'))
> >         fi  
> > -       if (( $DRBD_KERNEL_VERSION_CODE >= 0x080400 )); then  
> > +       if (( $DRBD_KERNEL_VERSION_CODE >= 0x090000 )); then  
> > +               DRBD_HAS_MULTI_VOLUME=true  
> > +               DRBD_VERSION_9=true  
> > +               ocf_log warn "RA support for DRBD version 9 is experimental, do not use multiple primaries with DRBD 9.0"
> > +       elif (( $DRBD_KERNEL_VERSION_CODE >= 0x080400 )); then  
> >                 DRBD_HAS_MULTI_VOLUME=true  
> > -       elif (( $DRBD_KERNEL_VERSION_CODE >= 0x090000 )) ; then  
> > -               ocf_log err "This resource agent does (still) only support DRBD version 8.x"
> > -               exit $OCF_ERR_INSTALLED  
> >         fi  
> >         check_crm_feature_set  
> >  
> >         # Check clone and M/S options.  
> > -       meta_expect clone-max -le 2  
> > +       # DRBD 9 supports more than two nodes
> > +       if ! $DRBD_VERSION_9 ; then  
> > +               meta_expect clone-max -le 2  
> > +       fi  
> >         meta_expect clone-node-max = 1  
> >         meta_expect master-node-max = 1  
> > +       # With the current DRBD 9.0 version, more than two primaries at the same time are not supported.
> >         meta_expect master-max -le 2  
> >  
> >         # Rather than returning $OCF_ERR_CONFIGURED, we sometimes return  
> > @@ -1080,7 +1167,8 @@ drbd_validate_all () {  
> >         # DRBD_DEVICES will be a shell array!  
> >         # FIXME we should double check that we explicitly restrict the set of
> >         # valid characters in device names...
> > -       if DRBD_DEVICES=($($DRBDADM --stacked sh-dev $DRBD_RESOURCE 2>/dev/null)); then
> > +       # In DRBD9, no matter stacked or not "$DRBDADM --stacked sh-dev $DRBD_RESOURCE" will return true
> > +       if ! $DRBD_VERSION_9 && DRBD_DEVICES=($($DRBDADM --stacked sh-dev $DRBD_RESOURCE 2>/dev/null)); then
> >                 # apparently a "stacked" resource. Remember for future DRBDADM calls.
> >                 DRBDADM="$DRBDADM -S"
> >         elif DRBD_DEVICES=($($DRBDADM sh-dev $DRBD_RESOURCE 2>/dev/null)); then
> > diff --git a/user/v9/drbdsetup.c b/user/v9/drbdsetup.c  
> > index 053b9d3..fba72e1 100644  
> > --- a/user/v9/drbdsetup.c  
> > +++ b/user/v9/drbdsetup.c  
> > @@ -251,6 +251,7 @@ static int del_resource_cmd(struct drbd_cmd *cm, int argc, char **argv);
> >  static int show_cmd(struct drbd_cmd *cm, int argc, char **argv);
> >  static int status_cmd(struct drbd_cmd *cm, int argc, char **argv);
> >  static int role_cmd(struct drbd_cmd *cm, int argc, char **argv);
> > +static int sh_status_9compat_cmd(struct drbd_cmd *cm, int argc, char **argv);
> >  static int cstate_cmd(struct drbd_cmd *cm, int argc, char **argv);  
> >  static int dstate_cmd(struct drbd_cmd *cm, int argc, char **argv);  
> >  static int check_resize_cmd(struct drbd_cmd *cm, int argc, char **argv);  
> > @@ -478,6 +479,9 @@ struct drbd_cmd commands[] = {  
> >         {"role", CTX_RESOURCE, 0, NO_PAYLOAD, role_cmd,  
> >          .lockless = true,  
> >          .summary = "Show the current role of a resource." },  
> > +       {"sh-status", CTX_RESOURCE | CTX_ALL, 0, 0, sh_status_9compat_cmd,  
> > +        .lockless = true,  
> > +        .summary = "Show all status of resource." },  
> >         {"cstate", CTX_PEER_NODE, 0, NO_PAYLOAD, cstate_cmd,  
> >          .lockless = true,  
> >          .summary = "Show the current state of a connection." },  
> > @@ -2576,6 +2580,87 @@ static int role_cmd(struct drbd_cmd *cm, int argc, char **argv)
> >         return 0;  
> >  }  
> >  
> > +  
> > +static int sh_status_9compat_cmd(struct drbd_cmd *cm, int argc, char **argv)
> > +{
> > +
> > +       struct resources_list *resources_list, *resource;
> > +       char *old_objname = objname;
> > +
> > +       resources_list = sort_resources(list_resources());
> > +
> > +       for (resource = resources_list; resource; resource = resource->next) {
> > +               struct devices_list *devices, *device;
> > +               struct connections_list *connections, *connection;
> > +               struct peer_devices_list *peer_devices = NULL, *peer_device;
> > +               struct nlattr *nla;
> > +
> > +               if (strcmp(old_objname, "all") && strcmp(old_objname, resource->name))
> > +                       continue;
> > +
> > +               objname = resource->name;
> > +               printI("_res_name=%s\n", objname);
> > +
> > +               nla = nla_find_nested(resource->res_opts, __nla_type(T_node_id));
> > +               if (nla)
> > +                       printI("_node_id=%d\n\n", *(uint32_t *)nla_data(nla));
> > +               else
> > +                       printI("_node_id=\n\n");
> > +
> > +               devices = list_devices(resource->name);
> > +               connections = sort_connections(list_connections(resource->name));
> > +               if (devices && connections)
> > +                       peer_devices = list_peer_devices(resource->name);
> > +
> > +               link_peer_devices_to_devices(peer_devices, devices);
> > +
> > +               for (device = devices; device; device = device->next) {
> > +                       ++indent;
> > +                       printI("_minor=%d\n", device->minor);
> > +                       printI("_volume=%d\n", device->ctx.ctx_volume);
> > +                       //refer to v84
> > +                       //printI("_known=%s\n", xxx);
> > +                       printI("_role=%s\n", drbd_role_str(resource->info.res_role));
> > +                       printI("_disk=%s\n\n", drbd_disk_str(device->info.dev_disk_state));
> > +
> > +                       for (connection = connections; connection; connection = connection->next) {
> > +                               ++indent;
> > +                               for (peer_device = peer_devices; peer_device; peer_device = peer_device->next) {
> > +                                       if (connection->ctx.ctx_peer_node_id != peer_device->ctx.ctx_peer_node_id
> > +                                               || device->ctx.ctx_volume != peer_device->ctx.ctx_volume)
> > +                                               continue;
> > +                                       printI("_conn_name=%s\n", connection->ctx.ctx_conn_name);
> > +                                       printI("_peer_node_id=%d\n", connection->ctx.ctx_peer_node_id);
> > +                                       printI("_cstate=%s\n", drbd_conn_str(connection->info.conn_connection_state));
> > +                                       if (connection->info.conn_connection_state == C_CONNECTED) {
> > +                                               printI("_peer=%s\n", drbd_role_str(connection->info.conn_role));
> > +                                               printI("_pdsk=%s\n\n", drbd_disk_str(peer_device->info.peer_disk_state));
> > +                                       } else {
> > +                                               printI("_peer=\n");
> > +                                               printI("_pdsk=\n");
> > +                                       }
> > +                                       wrap_printf(0, "_peer_node_process\n\n");
> > +                               }  
> > +                               //Dummy  
> > +                               //printI("_flags_susp==%s\n", xxx);  
> > +                               //...  
> > +                               --indent;  
> > +                       }  
> > +  
> > +                       wrap_printf(0, "_sh_status_process\n\n");  
> > +                       --indent;  
> > +               }  
> > +  
> > +               free_connections(connections);  
> > +               free_devices(devices);  
> > +               free_peer_devices(peer_devices);  
> > +       }  
> > +  
> > +       free(resources_list);  
> > +       objname = old_objname;  
> > +       return 0;  
> > +}  
> > +  
> >  static int cstate_cmd(struct drbd_cmd *cm, int argc, char **argv)  
> >  {  
> >         struct connections_list *connections, *connection;  
> >  
> >  
> >  
> >  
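
The way the two halves of the patch above fit together can be sketched
like this. The new sh-status command prints shell variable assignments
followed by the callback names _peer_node_process (once per connection) and
_sh_status_process (once per volume), and the agent presumably evaluates
that text. The sample text below is hand-written to resemble what the
printI() calls would emit. It is not captured output, the indentation is
omitted, and the third node san3 is hypothetical. The callbacks here are
simplified stand-ins that collect into a string, not the array-based ones
from the patch.

```sh
#!/bin/sh
# Sketch only: simplified stand-ins for the callbacks in drbd.ocf.

peers=""

# Called once per connection: remember that peer's state.
_peer_node_process() {
    peers="$peers ${_conn_name}:${_cstate}:${_peer:-Unknown}:${_pdsk:-DUnknown}"
}

# Called once per volume: report local state plus all collected peers.
_sh_status_process() {
    echo "vol=${_volume} role=${_role} disk=${_disk} peers=${peers# }"
    peers=""
}

# Hand-written sample, modelled on the printI() format strings in the patch.
sample_output='
_res_name=san_data
_node_id=0

_minor=1
_volume=0
_role=Primary
_disk=UpToDate

_conn_name=san2
_peer_node_id=1
_cstate=Connected
_peer=Secondary
_pdsk=UpToDate
_peer_node_process

_conn_name=san3
_peer_node_id=2
_cstate=Connecting
_peer=
_pdsk=
_peer_node_process

_sh_status_process
'

# Evaluating the text sets the variables and runs the callbacks in order.
eval "$sample_output"
```

For the disconnected peer the empty _peer and _pdsk values fall back to
Unknown and DUnknown, the same defaults the patch uses.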
> > On 30 May 2016 at 13:36, Igor Cicimov <icicimov at gmail.com> wrote:
> >  
> >  
> >  
> > On Tue, May 17, 2016 at 8:21 AM, Mats Ramnefors <mats at ramnefors.com>  
> > wrote:  
> >  
> >> I am testing DRBD 9 and 8.4 in simple two-node active-passive clusters
> >> with NFS.
> >>
> >> Copying files from a third server to the NFS share using dd, I typically
> >> see an average of 20% CPU load (with v9) on the primary during transfer of
> >> larger files, testing with 0.5 and 2 GB.
> >>
> >> At the very end of the transfer the DRBD process briefly peaks at 70 - 100%
> >> CPU.
> >>
> >> This causes occasional problems with Corosync believing the node is down.
> >> Increasing the token time for Corosync to 2000 ms fixes the symptom, but I
> >> am wondering about the root cause and any possible fixes.
> >>  
> >> This is the DRBD configuration.  
> >>  
> >> resource san_data {  
> >>   protocol C;  
> >>   meta-disk internal;  
> >>   device /dev/drbd1;  
> >>   disk   /dev/nfs/share;  
> >>   net {  
> >>     verify-alg sha1;  
> >>     cram-hmac-alg sha1;  
> >>     shared-secret "****************";
> >>     after-sb-0pri discard-zero-changes;  
> >>     after-sb-1pri discard-secondary;  
> >>     after-sb-2pri disconnect;  
> >>   }  
> >>   on san1 {  
> >>     address  192.168.1.86:7789;  
> >>   }  
> >>   on san2 {  
> >>     address  192.168.1.87:7789;  
> >>   }  
> >> }  
> >>  
> >> The nodes are two VMs on different ESXi hosts (Dell T620). The hosts are
> >> very lightly loaded. The network is 1 Gb at the moment, through a Catalyst
> >> switch, and does not appear to be saturated.
> >>
> >> BTW when can we expect a DRBD resource agent for v9? It took me a while
> >> to figure out why DRBD9 was not working with Pacemaker and then to find a
> >> patch to the agent :)
> >>  
> >> Cheers Mats  
> >> _______________________________________________  
> >> drbd-user mailing list  
> >> drbd-user at lists.linbit.com  
> >> http://lists.linbit.com/mailman/listinfo/drbd-user  
> >  
> >  
> > Hi Mats,  
> >  
> > Can you please share the patch if you don't mind?  
> >  
> > Thanks,  
> > Igor  
> >  
> >  
> >  
>   
 


