[DRBD-user] DRBD 9 Peak CPU load

Sat Jun 4 02:27:46 CEST 2016

Thanks, Mats. This was the reason I gave up on DRBD 9 when I was
testing it about 2-3 months ago.

On Mon, May 30, 2016 at 9:50 PM, Mats Ramnefors <mats at ramnefors.com> wrote:

> Patch below.
>
> Applying the patch "as is" did not work for me; I had to edit the
> changes in manually.
>
> I do not understand why, but I am no expert on patching, so it may work
> for you.
>
> With the changes applied, the RA seems to work OK.
>
> /Mats
>
> ++++++ support-drbd9-ra.patch ++++++
> diff --git a/scripts/drbd.ocf b/scripts/drbd.ocf
> index 632e16e..91990fc 100755
> --- a/scripts/drbd.ocf
> +++ b/scripts/drbd.ocf
> @@ -328,6 +328,23 @@ remove_master_score() {
>         do_cmd ${HA_SBIN_DIR}/crm_master -l reboot -D
>  }
>
> +_peer_node_process() {
> +       # since DRBD 9 supports multiple connections
> +       : ${_peer_node_id:=0}
> +       DRBD_PER_NAME[$_peer_node_id]=$_conn_name
> +       DRBD_PER_ID[$_peer_node_id]=$_peer_node_id
> +       DRBD_PER_CSTATE[$_peer_node_id]=$_cstate
> +       DRBD_PER_ROLE_REMOTE[$_peer_node_id]=${_peer:-Unknown}
> +       DRBD_PER_DSTATE_REMOTE[$_peer_node_id]=${_pdsk:-DUnknown}
> +
> +       : == DEBUG == _peer_node_id                         == ${_peer_node_id} ==
> +       : == DEBUG == DRBD_PER_NAME[_peer_node_id]          == ${DRBD_PER_NAME[${_peer_node_id}]} ==
> +       : == DEBUG == DRBD_PER_ID[_peer_node_id]            == ${DRBD_PER_ID[${_peer_node_id}]} ==
> +       : == DEBUG == DRBD_PER_CSTATE[_peer_node_id]        == ${DRBD_PER_CSTATE[${_peer_node_id}]} ==
> +       : == DEBUG == DRBD_PER_ROLE_REMOTE[_peer_node_id]   == ${DRBD_PER_ROLE_REMOTE[${_peer_node_id}]} ==
> +       : == DEBUG == DRBD_PER_DSTATE_REMOTE[_peer_node_id] == ${DRBD_PER_DSTATE_REMOTE[${_peer_node_id}]} ==
> +}
> +
>  _sh_status_process() {
>         # _volume not present should not happen,
>         # but may help make this agent work even if it talks to drbd 8.3.
> @@ -335,11 +352,36 @@ _sh_status_process() {
>         # not-yet-created volumes are reported as -1
>         (( _volume >= 0 )) || _volume=$[1 << 16]
>         DRBD_ROLE_LOCAL[$_volume]=${_role:-Unconfigured}
> -       DRBD_ROLE_REMOTE[$_volume]=${_peer:-Unknown}
> -       DRBD_CSTATE[$_volume]=$_cstate
>         DRBD_DSTATE_LOCAL[$_volume]=${_disk:-Unconfigured}
> -       DRBD_DSTATE_REMOTE[$_volume]=${_pdsk:-DUnknown}
> +
> +       if $DRBD_VERSION_9 ; then
> +               #Get from _peer_node_process
> +               DRBD_NAME[$_volume]=${DRBD_PER_NAME[@]}
> +               DRBD_ID[$_volume]=${DRBD_PER_ID[@]}
> +               DRBD_VOLUME[$_volume]=${_volume}
> +               DRBD_CSTATE[$_volume]=${DRBD_PER_CSTATE[@]}
> +               DRBD_ROLE_REMOTE[$_volume]=${DRBD_PER_ROLE_REMOTE[@]}
> +               DRBD_DSTATE_REMOTE[$_volume]=${DRBD_PER_DSTATE_REMOTE[@]}
> +
> +               DRBD_PER_NAME=()
> +               DRBD_PER_ID=()
> +               DRBD_PER_CSTATE=()
> +               DRBD_PER_ROLE_REMOTE=()
> +               DRBD_PER_DSTATE_REMOTE=()
> +
> +               : == DEBUG == _volume            == ${_volume} ==
> +               : == DEBUG == DRBD_ROLE_LOCAL    == ${DRBD_ROLE_LOCAL[${_volume}]} ==
> +               : == DEBUG == DRBD_DSTATE_LOCAL  == ${DRBD_DSTATE_LOCAL[${_volume}]} ==
> +               : == DEBUG == DRBD_CSTATE        == ${DRBD_CSTATE[${_volume}]} ==
> +               : == DEBUG == DRBD_ROLE_REMOTE   == ${DRBD_ROLE_REMOTE[${_volume}]} ==
> +               : == DEBUG == DRBD_DSTATE_REMOTE == ${DRBD_DSTATE_REMOTE[${_volume}]} ==
> +       else
> +               DRBD_CSTATE[$_volume]=$_cstate
> +               DRBD_ROLE_REMOTE[$_volume]=${_peer:-Unknown}
> +               DRBD_DSTATE_REMOTE[$_volume]=${_pdsk:-DUnknown}
> +       fi
>  }
> +
>  drbd_set_status_variables() {
>         # drbdsetup sh-status prints these values to stdout,
>         # and then prints _sh_status_process.
> @@ -352,6 +394,15 @@ drbd_set_status_variables() {
>         local _resynced_percent
>         local out
>
> +       if $DRBD_VERSION_9 ; then
> +               local _peer_node_id _conn_name
> +               DRBD_PER_NAME=()
> +               DRBD_PER_ID=()
> +               DRBD_PER_CSTATE=()
> +               DRBD_PER_ROLE_REMOTE=()
> +               DRBD_PER_DSTATE_REMOTE=()
> +       fi
> +
>         DRBD_ROLE_LOCAL=()
>         DRBD_ROLE_REMOTE=()
>         DRBD_CSTATE=()
> @@ -369,16 +420,20 @@ drbd_set_status_variables() {
>         # if there was no output at all, or a weird output
>         # make sure the status arrays won't be empty.
>         [[ ${#DRBD_ROLE_LOCAL[@]}    != 0 ]] || DRBD_ROLE_LOCAL=(Unconfigured)
> -       [[ ${#DRBD_ROLE_REMOTE[@]}   != 0 ]] || DRBD_ROLE_REMOTE=(Unknown)
> -       [[ ${#DRBD_CSTATE[@]}        != 0 ]] || DRBD_CSTATE=(Unconfigured)
>         [[ ${#DRBD_DSTATE_LOCAL[@]}  != 0 ]] || DRBD_DSTATE_LOCAL=(Unconfigured)
> +       [[ ${#DRBD_CSTATE[@]}        != 0 ]] || DRBD_CSTATE=(Unconfigured)
> +       [[ ${#DRBD_ROLE_REMOTE[@]}   != 0 ]] || DRBD_ROLE_REMOTE=(Unknown)
>         [[ ${#DRBD_DSTATE_REMOTE[@]} != 0 ]] || DRBD_DSTATE_REMOTE=(DUnknown)
>
> -
> +       if $DRBD_VERSION_9 ; then
> +               : == DEBUG == DRBD_NAME    == ${DRBD_NAME[@]} ==
> +               : == DEBUG == DRBD_ID    == ${DRBD_ID[@]} ==
> +               : == DEBUG == DRBD_VOLUME    == ${DRBD_VOLUME[@]} ==
> +       fi
>         : == DEBUG == DRBD_ROLE_LOCAL    == ${DRBD_ROLE_LOCAL[@]} ==
> -       : == DEBUG == DRBD_ROLE_REMOTE   == ${DRBD_ROLE_REMOTE[@]} ==
> -       : == DEBUG == DRBD_CSTATE        == ${DRBD_CSTATE[@]} ==
>         : == DEBUG == DRBD_DSTATE_LOCAL  == ${DRBD_DSTATE_LOCAL[@]} ==
> +       : == DEBUG == DRBD_CSTATE        == ${DRBD_CSTATE[@]} ==
> +       : == DEBUG == DRBD_ROLE_REMOTE   == ${DRBD_ROLE_REMOTE[@]} ==
>         : == DEBUG == DRBD_DSTATE_REMOTE == ${DRBD_DSTATE_REMOTE[@]} ==
>  }
>
> @@ -414,6 +469,9 @@ maybe_outdate_self()
>         ocf_is_true $OCF_RESKEY_stop_outdates_secondary || return 1
>
>         local host stop_uname
> +       if $DRBD_VERSION_9 ; then
> +               local master i temp_name temp_dstate temp_number outdate_self
> +       fi
>         # We ignore $OCF_RESKEY_CRM_meta_notify_promote_uname here
>         # because: if demote and promote for a _stacked_ resource
>         # (or a "floating" one, where DRBD sits on top of some SAN)
> @@ -437,6 +495,29 @@ maybe_outdate_self()
>                 return 1
>         done
>
> +       if $DRBD_VERSION_9 ; then
> +               temp_name=(${DRBD_NAME[@]})
> +               temp_dstate=(${DRBD_DSTATE_REMOTE[@]})
> +               temp_number=${#temp_name[@]}
> +               outdate_self=false
> +
> +               for master in $OCF_RESKEY_CRM_meta_notify_master_uname; do
> +                       for i in `seq 0 $((temp_number-1))`; do
> +                               if [[ ${temp_name[$i]} == "$master" ]] &&
> +                                 [[ ${temp_dstate[$i]} == "DUnknown" ]]; then
> +                                       outdate_self=true
> +                                       break
> +                               fi
> +                       done
> +                       temp_number=${#temp_name[@]}
> +               done
> +
> +               if ! $outdate_self; then
> +                       #The disconnecting node is not in Primary
> +                       return 1
> +               fi
> +       fi
> +
>         # e.g. post/promote of some other peer.
>         # Should not happen, fencing constraints should take care of that.
>         # But in case it does, scream out loud.
> @@ -993,6 +1074,7 @@ drbd_validate_all () {
>         DRBDADM="drbdadm"
>         DRBDSETUP="drbdsetup"
>         DRBD_HAS_MULTI_VOLUME=false
> +       DRBD_VERSION_9=false
>
>         # these will _exit_ if they don't find the binaries
>         check_binary $DRBDADM
> @@ -1015,18 +1097,23 @@ drbd_validate_all () {
>                         modinfo -F version drbd |
>                         sed -ne 's/^\([0-9]\+\)\.\([0-9]\+\)\.\([0-9]\+\).*$/\1 \2 \3/p'))
>         fi
> -       if (( $DRBD_KERNEL_VERSION_CODE >= 0x080400 )); then
> +       if (( $DRBD_KERNEL_VERSION_CODE >= 0x090000 )); then
> +               DRBD_HAS_MULTI_VOLUME=true
> +               DRBD_VERSION_9=true
> +               ocf_log warn "RA support for DRBD 9 is experimental; do not use multiple primaries with DRBD 9.0"
> +       elif (( $DRBD_KERNEL_VERSION_CODE >= 0x080400 )); then
>                 DRBD_HAS_MULTI_VOLUME=true
> -       elif (( $DRBD_KERNEL_VERSION_CODE >= 0x090000 )) ; then
> -               ocf_log err "This resource agent does (still) only support DRBD version 8.x"
> -               exit $OCF_ERR_INSTALLED
>         fi
>         check_crm_feature_set
>
>         # Check clone and M/S options.
> -       meta_expect clone-max -le 2
> +       # DRBD 9 supports more than two nodes
> +       if ! $DRBD_VERSION_9 ; then
> +               meta_expect clone-max -le 2
> +       fi
>         meta_expect clone-node-max = 1
>         meta_expect master-node-max = 1
> +       # With the current DRBD 9.0 version, more than two primaries at the same time are not supported.
>         meta_expect master-max -le 2
>
>         # Rather than returning $OCF_ERR_CONFIGURED, we sometimes return
> @@ -1080,7 +1167,8 @@ drbd_validate_all () {
>         # DRBD_DEVICES will be a shell array!
>         # FIXME we should double check that we explicitly restrict the set of
>         # valid characters in device names...
> -       if DRBD_DEVICES=($($DRBDADM --stacked sh-dev $DRBD_RESOURCE 2>/dev/null)); then
> +       # In DRBD 9, "$DRBDADM --stacked sh-dev $DRBD_RESOURCE" succeeds whether or not the resource is stacked
> +       if ! $DRBD_VERSION_9 && DRBD_DEVICES=($($DRBDADM --stacked sh-dev $DRBD_RESOURCE 2>/dev/null)); then
>                 # apparently a "stacked" resource. Remember for future DRBDADM calls.
>                 DRBDADM="$DRBDADM -S"
>         elif DRBD_DEVICES=($($DRBDADM sh-dev $DRBD_RESOURCE 2>/dev/null)); then
> diff --git a/user/v9/drbdsetup.c b/user/v9/drbdsetup.c
> index 053b9d3..fba72e1 100644
> --- a/user/v9/drbdsetup.c
> +++ b/user/v9/drbdsetup.c
> @@ -251,6 +251,7 @@ static int del_resource_cmd(struct drbd_cmd *cm, int argc, char **argv);
>  static int show_cmd(struct drbd_cmd *cm, int argc, char **argv);
>  static int status_cmd(struct drbd_cmd *cm, int argc, char **argv);
>  static int role_cmd(struct drbd_cmd *cm, int argc, char **argv);
> +static int sh_status_9compat_cmd(struct drbd_cmd *cm, int argc, char **argv);
>  static int cstate_cmd(struct drbd_cmd *cm, int argc, char **argv);
>  static int dstate_cmd(struct drbd_cmd *cm, int argc, char **argv);
>  static int check_resize_cmd(struct drbd_cmd *cm, int argc, char **argv);
> @@ -478,6 +479,9 @@ struct drbd_cmd commands[] = {
>         {"role", CTX_RESOURCE, 0, NO_PAYLOAD, role_cmd,
>          .lockless = true,
>          .summary = "Show the current role of a resource." },
> +       {"sh-status", CTX_RESOURCE | CTX_ALL, 0, 0, sh_status_9compat_cmd,
> +        .lockless = true,
> +        .summary = "Show the full status of a resource." },
>         {"cstate", CTX_PEER_NODE, 0, NO_PAYLOAD, cstate_cmd,
>          .lockless = true,
>          .summary = "Show the current state of a connection." },
> @@ -2576,6 +2580,87 @@ static int role_cmd(struct drbd_cmd *cm, int argc, char **argv)
>         return 0;
>  }
>
> +
> +static int sh_status_9compat_cmd(struct drbd_cmd *cm, int argc, char **argv)
> +{
> +
> +       struct resources_list *resources_list, *resource;
> +       char *old_objname = objname;
> +
> +       resources_list = sort_resources(list_resources());
> +
> +       for (resource = resources_list; resource; resource = resource->next) {
> +               struct devices_list *devices, *device;
> +               struct connections_list *connections, *connection;
> +               struct peer_devices_list *peer_devices = NULL, *peer_device;
> +               struct nlattr *nla;
> +
> +               if (strcmp(old_objname, "all") && strcmp(old_objname, resource->name))
> +                       continue;
> +
> +               objname = resource->name;
> +               printI("_res_name=%s\n", objname);
> +
> +               nla = nla_find_nested(resource->res_opts, __nla_type(T_node_id));
> +               if (nla)
> +                       printI("_node_id=%d\n\n", *(uint32_t *)nla_data(nla));
> +               else
> +                       printI("_node_id=\n\n");
> +
> +               devices = list_devices(resource->name);
> +               connections = sort_connections(list_connections(resource->name));
> +               if (devices && connections)
> +                       peer_devices = list_peer_devices(resource->name);
> +
> +               link_peer_devices_to_devices(peer_devices, devices);
> +
> +               for (device = devices; device; device = device->next) {
> +                       ++indent;
> +                       printI("_minor=%d\n", device->minor);
> +                       printI("_volume=%d\n", device->ctx.ctx_volume);
> +                       //refer to v84
> +                       //printI("_known=%s\n", xxx);
> +                       printI("_role=%s\n", drbd_role_str(resource->info.res_role));
> +                       printI("_disk=%s\n\n", drbd_disk_str(device->info.dev_disk_state));
> +
> +                       for (connection = connections; connection; connection = connection->next) {
> +                               ++indent;
> +                               for (peer_device = peer_devices; peer_device; peer_device = peer_device->next) {
> +                                       if (connection->ctx.ctx_peer_node_id != peer_device->ctx.ctx_peer_node_id
> +                                               || device->ctx.ctx_volume != peer_device->ctx.ctx_volume)
> +                                               continue;
> +                                       printI("_conn_name=%s\n", connection->ctx.ctx_conn_name);
> +                                       printI("_peer_node_id=%d\n", connection->ctx.ctx_peer_node_id);
> +                                       printI("_cstate=%s\n", drbd_conn_str(connection->info.conn_connection_state));
> +                                       if (connection->info.conn_connection_state == C_CONNECTED) {
> +                                               printI("_peer=%s\n", drbd_role_str(connection->info.conn_role));
> +                                               printI("_pdsk=%s\n\n", drbd_disk_str(peer_device->info.peer_disk_state));
> +                                       } else {
> +                                               printI("_peer=\n");
> +                                               printI("_pdsk=\n");
> +                                       }
> +                                       wrap_printf(0, "_peer_node_process\n\n");
> +                               }
> +                               //Dummy
> +                               //printI("_flags_susp==%s\n", xxx);
> +                               //...
> +                               --indent;
> +                       }
> +
> +                       wrap_printf(0, "_sh_status_process\n\n");
> +                       --indent;
> +               }
> +
> +               free_connections(connections);
> +               free_devices(devices);
> +               free_peer_devices(peer_devices);
> +       }
> +
> +       free(resources_list);
> +       objname = old_objname;
> +       return 0;
> +}
> +
>  static int cstate_cmd(struct drbd_cmd *cm, int argc, char **argv)
>  {
>         struct connections_list *connections, *connection;
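How the two halves of the patch fit together: the new `drbdsetup sh-status` command in the C hunk prints shell code (`_name=value` assignments plus bare `_peer_node_process` and `_sh_status_process` lines), and the RA evaluates that text, so each bare line calls the matching RA function. The bash sketch below imitates that flow under a few assumptions: the sample output is hand-written from the `printI()` format strings in `sh_status_9compat_cmd()` rather than captured from a real node, `san3` is a hypothetical third peer, the RA is assumed to use a plain `eval` (that line is not part of the quoted hunks), and the two functions are trimmed copies of the patched ones without the debug lines.

```shell
#!/usr/bin/env bash
# Sketch: fold DRBD 9 per-peer status into per-volume arrays, as the patched RA does.
# SAMPLE_OUTPUT is hand-written after the printI() calls in sh_status_9compat_cmd();
# it is not real drbdsetup output, and "san3" is a hypothetical third peer.
SAMPLE_OUTPUT='
_res_name=san_data
_node_id=0

_minor=1
_volume=0
_role=Primary
_disk=UpToDate

_conn_name=san2
_peer_node_id=1
_cstate=Connected
_peer=Secondary
_pdsk=UpToDate

_peer_node_process

_conn_name=san3
_peer_node_id=2
_cstate=Connecting
_peer=
_pdsk=
_peer_node_process

_sh_status_process
'

DRBD_PER_NAME=(); DRBD_PER_CSTATE=(); DRBD_PER_ROLE_REMOTE=(); DRBD_PER_DSTATE_REMOTE=()
DRBD_ROLE_LOCAL=(); DRBD_DSTATE_LOCAL=(); DRBD_NAME=()
DRBD_CSTATE=(); DRBD_ROLE_REMOTE=(); DRBD_DSTATE_REMOTE=()

# Called once per peer connection: store this peer's values, indexed by node id.
_peer_node_process() {
    : "${_peer_node_id:=0}"
    DRBD_PER_NAME[$_peer_node_id]=$_conn_name
    DRBD_PER_CSTATE[$_peer_node_id]=$_cstate
    DRBD_PER_ROLE_REMOTE[$_peer_node_id]=${_peer:-Unknown}
    DRBD_PER_DSTATE_REMOTE[$_peer_node_id]=${_pdsk:-DUnknown}
}

# Called once per volume: join the per-peer values into one space-separated
# string per volume, then reset the per-peer arrays for the next volume.
_sh_status_process() {
    DRBD_ROLE_LOCAL[$_volume]=${_role:-Unconfigured}
    DRBD_DSTATE_LOCAL[$_volume]=${_disk:-Unconfigured}
    DRBD_NAME[$_volume]="${DRBD_PER_NAME[*]}"
    DRBD_CSTATE[$_volume]="${DRBD_PER_CSTATE[*]}"
    DRBD_ROLE_REMOTE[$_volume]="${DRBD_PER_ROLE_REMOTE[*]}"
    DRBD_DSTATE_REMOTE[$_volume]="${DRBD_PER_DSTATE_REMOTE[*]}"
    DRBD_PER_NAME=(); DRBD_PER_CSTATE=(); DRBD_PER_ROLE_REMOTE=(); DRBD_PER_DSTATE_REMOTE=()
}

# Assignment lines set the _variables; the bare words call the functions above.
eval "$SAMPLE_OUTPUT"

printf 'volume %s: role=%s peers=[%s] cstate=[%s] pdsk=[%s]\n' \
    "$_volume" "${DRBD_ROLE_LOCAL[0]}" "${DRBD_NAME[0]}" \
    "${DRBD_CSTATE[0]}" "${DRBD_DSTATE_REMOTE[0]}"
```

Run as-is, it should print `volume 0: role=Primary peers=[san2 san3] cstate=[Connected Connecting] pdsk=[UpToDate DUnknown]`; the disconnected peer falls back to `Unknown`/`DUnknown`. Because the per-peer values end up joined into one space-separated string per volume, consumers such as `maybe_outdate_self()` have to split them again, which only works while connection names contain no whitespace.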
>
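In the `maybe_outdate_self()` hunk, the re-split of the joined strings needs braces. In bash, `$DRBD_NAME[@]` is not an array expansion: it is element 0 followed by the literal text `[@]`, so the last name always carries a `[@]` suffix (in a two-node cluster the only name becomes `san2[@]`), the last disk state becomes `DUnknown[@]`, and the DUnknown check can never match. The bash sketch below contrasts `($DRBD_NAME[@])` with `(${DRBD_NAME[@]})` on one volume's joined strings. The sample values and `san3` are hypothetical, `master_is_unreachable` is a hypothetical stand-in for the nested loop in the hunk, and globbing is switched off only to keep the demo independent of files in the current directory.

```shell
#!/usr/bin/env bash
# Sketch: brace-less vs braced array expansion in the maybe_outdate_self() check.
# Sample values are hypothetical, in the space-joined per-volume format.
set -f   # no globbing, so "san3[@]" cannot match a file in the current directory

DRBD_NAME=("san2 san3")
DRBD_DSTATE_REMOTE=("UpToDate DUnknown")

# Hypothetical stand-in for the nested loop in the hunk: succeed if any
# master in $1 is a peer whose disk state we see as DUnknown.
master_is_unreachable() {
    local master i
    for master in $1; do
        for (( i = 0; i < ${#names[@]}; i++ )); do
            if [[ ${names[i]} == "$master" && ${dstates[i]} == DUnknown ]]; then
                return 0
            fi
        done
    done
    return 1
}

# Brace-less: $DRBD_NAME is element 0 and "[@]" stays literal text.
names=($DRBD_NAME[@]); dstates=($DRBD_DSTATE_REMOTE[@])
braceless_names="${names[*]}"
if master_is_unreachable "san3"; then braceless=outdate; else braceless=no-match; fi

# Braced: every element, split on whitespace into one word per peer.
names=(${DRBD_NAME[@]}); dstates=(${DRBD_DSTATE_REMOTE[@]})
if master_is_unreachable "san3"; then braced=outdate; else braced=no-match; fi

echo "brace-less names: $braceless_names -> $braceless"
echo "braced names:     ${names[*]} -> $braced"
```

The brace-less split yields `san2 san3[@]` and no match, while the braced split yields `san2 san3` and finds `san3` in `DUnknown`.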
>
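The reordering in the version-check hunk is more than cosmetic. In the removed lines, the `>= 0x090000` test sat in an `elif` after `>= 0x080400`, and any 9.x code also passes the 8.4 test, so as quoted the "only support DRBD version 8.x" error could never be reached. The bash sketch below assumes the version is packed as `0xMMmmpp`, which is what the `0x080400` and `0x090000` constants suggest; `version_code` and `drbd_generation` are hypothetical helpers, not functions from drbd.ocf.

```shell
#!/usr/bin/env bash
# Sketch: pack "major.minor.patch" into 0xMMmmpp and pick an RA code path.
# version_code and drbd_generation are hypothetical helper names.

version_code() {
    local re='^([0-9]+)\.([0-9]+)\.([0-9]+)'
    # Anything after the third number (e.g. "-1") is ignored.
    [[ $1 =~ $re ]] || return 1
    printf '0x%02x%02x%02x\n' \
        "$((10#${BASH_REMATCH[1]}))" "$((10#${BASH_REMATCH[2]}))" "$((10#${BASH_REMATCH[3]}))"
}

drbd_generation() {
    local code=$1
    # Test the higher threshold first: every 9.x code is also >= 0x080400.
    if (( code >= 0x090000 )); then
        echo drbd9
    elif (( code >= 0x080400 )); then
        echo drbd84
    else
        echo older
    fi
}

for v in 9.0.2 8.4.7-1 8.3.16; do
    c=$(version_code "$v")
    printf '%-8s %s %s\n' "$v" "$c" "$(drbd_generation "$c")"
done
```

The loop should classify 9.0.2 (0x090002) as drbd9, 8.4.7-1 (0x080407) as drbd84 and 8.3.16 (0x080310) as older. Swapping the two `if` tests would make 9.0.2 come out as drbd84, which mirrors the dead branch in the removed lines.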
>
>
> On 30 May 2016, at 13:36, Igor Cicimov <icicimov at gmail.com> wrote:
>
>
>
> On Tue, May 17, 2016 at 8:21 AM, Mats Ramnefors <mats at ramnefors.com>
> wrote:
>
>> I am testing DRBD 9 and 8.4 in simple two-node active-passive clusters
>> with NFS.
>>
>> Copying files from a third server to the NFS share using dd, I typically
>> see an average of 20% CPU load (with v9) on the primary during transfer of
>> larger files, testing with 0.5 and 2 GB.
>>
>> At the very end of the transfer, the DRBD process briefly peaks at
>> 70-100% CPU.
>>
>> This occasionally makes Corosync believe the node is down. Increasing
>> the Corosync token timeout to 2000 ms fixes the symptom, but I am
>> wondering about the root cause and any possible fixes.
>>
>> This is the DRBD configuration.
>>
>> resource san_data {
>>   protocol C;
>>   meta-disk internal;
>>   device /dev/drbd1;
>>   disk   /dev/nfs/share;
>>   net {
>>     verify-alg sha1;
>>     cram-hmac-alg sha1;
>>     shared-secret "****************";
>>     after-sb-0pri discard-zero-changes;
>>     after-sb-1pri discard-secondary;
>>     after-sb-2pri disconnect;
>>   }
>>   on san1 {
>>     address  192.168.1.86:7789;
>>   }
>>   on san2 {
>>     address  192.168.1.87:7789;
>>   }
>> }
>>
>> The nodes are two VMs on different ESXi hosts (Dell T620). The hosts are
>> very lightly loaded. The network is currently 1 Gb through a Catalyst
>> switch and does not appear to be saturated.
>>
>> BTW, when can we expect a DRBD resource agent for v9? It took me a while
>> to figure out why DRBD 9 was not working with Pacemaker and then to find a
>> patch for the agent :)
>>
>> Cheers Mats
>> _______________________________________________
>> drbd-user mailing list
>> drbd-user at lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>
> Hi Mats,
>
> Can you please share the patch if you don't mind?
>
> Thanks,
> Igor
>
>
>