[DRBD-user] DRBD errors

Thu Jun 8 01:40:08 CEST 2017

On 6 Jun 2017 7:23 pm, "Andrea del Monaco" <
andrea.delmonaco at clustervision.com> wrote:

Hello everybody,

I am currently facing some issues with DRBD synchronization.
Here is the config file:
global {
        usage-count no;
}

common {
        startup {
                wfc-timeout 15;
                degr-wfc-timeout 15;
                outdated-wfc-timeout 15;
        }
        disk {
                resync-rate 80M;
                disk-flushes no;
                disk-barrier no;
                al-extents 3389;
                c-fill-target 0;
                c-plan-ahead 18;
                c-max-rate 200M;
        }
        net {
                protocol C;
                max-buffers 8000;
                max-epoch-size 8000;
                sndbuf-size 1024k;
        }
}

resource cmshareddrbdres {
        net {
                cram-hmac-alg sha1;
                shared-secret xxxxxxx;
                after-sb-0pri discard-younger-primary;
                after-sb-1pri discard-secondary;
                csums-alg md5;
        }
        on master1 {
                device     /dev/drbd1;
                disk       /dev/sdb;
                address    10.149.255.254:7789;
                meta-disk  internal;
        }
        on master2 {
                device     /dev/drbd1;
                disk       /dev/sdb;
                address    10.149.255.253:7789;
                meta-disk  internal;
        }
}

The network 10.149.0.0/16 is using IPoIB.

The messages that I see are (first master): https://pastebin.com/0xCLceeD

Suspicious messages:
[Sun Jun  4 03:50:17 2017] block drbd1: logical block size of local backend
does not match (drbd:512, backend:4096); was this a late attach?
[Sun Jun  4 03:51:01 2017] drbd cmshareddrbdres: [drbd_w_cmshared/3640]
sock_sendmsg time expired, ko = 6
[Sun Jun  4 03:34:12 2017] block drbd1: We did not send a P_BARRIER for
84203ms > ko-count (7) * timeout (60 * 0.1s); drbd kernel thread blocked?
(I see so many of these)

To me this looks like a network issue, but I am not sure, because in that
case I would expect DRBD to be able to send the messages and then time out
waiting for the other side.

I have tried to stress it and I couldn't reproduce it, so it doesn't seem
to be load-related.

[root@master1 ~]# uname -r
3.10.0-327.el7.x86_64
[root@master1 ~]# rpm -qa | grep drbd
kmod-drbd84-8.4.7-1_1.el7.elrepo.x86_64
drbd84-utils-8.9.5-1.el7.elrepo.x86_64

Any ideas?

Regards,
-- 

Andrea Del Monaco
Internal Engineer

Mob: +31 64 166 4003
Skype: delmonaco.andrea
andrea.delmonaco at clustervision.com

ClusterVision BV
Gyroscoopweg 56
1042 AC Amsterdam
The Netherlands
Tel: +31 20 407 7550
Fax: +31 84 759 8389
www.clustervision.com

_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

The ko-count messages in the log mean that the secondary is failing to
complete the writes within the expected time frame. To me that points to a
problem with the backing device (storage, driver, or OS) rather than with
DRBD itself. If I were you, I would first check that the backing storage
works properly.
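
To put numbers on that P_BARRIER line: the threshold it prints is ko-count
times timeout, with timeout given in tenths of a second. Below is a rough
Python sketch that pulls those values out of the message and compares them.
The regular expression and the barrier_stall() helper are my own
illustration, not part of DRBD or its tools, and they assume the message
format is exactly as quoted above.

```python
import re

# Pattern for the kernel message quoted in this thread. The regex is an
# illustrative assumption, not something shipped with DRBD.
BARRIER_RE = re.compile(
    r"did not send a P_BARRIER for (\d+)ms > "
    r"ko-count \((\d+)\) \* timeout \((\d+) \* 0\.1s\)"
)


def barrier_stall(line):
    """Return (stalled_ms, threshold_ms) for a P_BARRIER warning, else None."""
    m = BARRIER_RE.search(line)
    if m is None:
        return None
    stalled_ms, ko_count, timeout_ds = (int(g) for g in m.groups())
    # timeout is printed in tenths of a second, so one unit is 100 ms
    return stalled_ms, ko_count * timeout_ds * 100


msg = ("block drbd1: We did not send a P_BARRIER for 84203ms > "
       "ko-count (7) * timeout (60 * 0.1s); drbd kernel thread blocked?")
print(barrier_stall(msg))  # (84203, 42000)
```

So in the quoted message the stall was about 84 s against a 42 s threshold,
roughly twice the allowed window.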