On 6 Jun 2017 7:23 pm, "Andrea del Monaco" <andrea.delmonaco at clustervision.com> wrote:

Hello everybody,

I am currently facing some issues with DRBD synchronization. Here is the config file:

    global {
        usage-count no;
    }
    common {
        startup {
            wfc-timeout 15;
            degr-wfc-timeout 15;
            outdated-wfc-timeout 15;
        }
        disk {
            resync-rate 80M;
            disk-flushes no;
            disk-barrier no;
            al-extents 3389;
            c-fill-target 0;
            c-plan-ahead 18;
            c-max-rate 200M;
        }
        net {
            protocol C;
            max-buffers 8000;
            max-epoch-size 8000;
            sndbuf-size 1024k;
        }
    }
    resource cmshareddrbdres {
        net {
            cram-hmac-alg sha1;
            shared-secret xxxxxxx;
            after-sb-0pri discard-younger-primary;
            after-sb-1pri discard-secondary;
            csums-alg md5;
        }
        on master1 {
            device /dev/drbd1;
            disk /dev/sdb;
            address 10.149.255.254:7789;
            meta-disk internal;
        }
        on master2 {
            device /dev/drbd1;
            disk /dev/sdb;
            address 10.149.255.253:7789;
            meta-disk internal;
        }
    }

The network 10.149.0.0/16 is using IPoIB.

The messages that I see on the first master are here: https://pastebin.com/0xCLceeD

Suspect messages:

    [Sun Jun 4 03:50:17 2017] block drbd1: logical block size of local backend does not match (drbd:512, backend:4096); was this a late attach?
    [Sun Jun 4 03:51:01 2017] drbd cmshareddrbdres: [drbd_w_cmshared/3640] sock_sendmsg time expired, ko = 6
    [Sun Jun 4 03:34:12 2017] block drbd1: We did not send a P_BARRIER for 84203ms > ko-count (7) * timeout (60 * 0.1s); drbd kernel thread blocked?

(I see many of the last kind.)

My guess is that there is some issue with the network, but I am not sure: in that case I would expect DRBD to be able to send the messages and then time out waiting for the other side. I have tried to stress it and could not reproduce the problem, so it does not seem to be load-related.

    [root@master1 ~]# uname -r
    3.10.0-327.el7.x86_64
    [root@master1 ~]# rpm -qa | grep drbd
    kmod-drbd84-8.4.7-1_1.el7.elrepo.x86_64
    drbd84-utils-8.9.5-1.el7.elrepo.x86_64

Any ideas?

Regards,
--
Andrea Del Monaco
Internal Engineer
Mob: +31 64 166 4003
Skype: delmonaco.andrea
andrea.delmonaco at clustervision.com

ClusterVision BV
Gyroscoopweg 56
1042 AC Amsterdam
The Netherlands
Tel: +31 20 407 7550
Fax: +31 84 759 8389
www.clustervision.com


The ko-count messages in the log mean that the secondary fails to commit the writes within the expected time frame. To me that looks like a problem with the backing device (storage, driver, or OS) rather than with DRBD. If I were you, I would first check that the backing storage on the secondary works properly.
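
The threshold in the P_BARRIER warning quoted above can be checked by hand: ko-count (7) times timeout (60 units of 0.1 s) is 42 s, and the reported 84203 ms is roughly twice that. A minimal Python sketch of the same arithmetic follows. The function names and the regular expression are illustrative assumptions written for this one log line, not part of DRBD or its tooling.

```python
import re

# Log line copied from the report above.
LOG_LINE = (
    "block drbd1: We did not send a P_BARRIER for 84203ms > "
    "ko-count (7) * timeout (60 * 0.1s); drbd kernel thread blocked?"
)

# Hypothetical pattern that matches only this message format.
PATTERN = re.compile(
    r"for (?P<elapsed>\d+)ms > ko-count \((?P<ko>\d+)\) \* "
    r"timeout \((?P<timeout>\d+) \* 0\.1s\)"
)


def barrier_threshold_ms(ko_count, timeout_tenths):
    """Return ko-count * timeout in milliseconds (timeout is in 0.1 s units)."""
    return ko_count * timeout_tenths * 100


def parse_barrier_warning(line):
    """Return (elapsed_ms, threshold_ms), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    elapsed = int(match.group("elapsed"))
    threshold = barrier_threshold_ms(
        int(match.group("ko")), int(match.group("timeout"))
    )
    return elapsed, threshold


elapsed_ms, threshold_ms = parse_barrier_warning(LOG_LINE)
print(
    f"elapsed {elapsed_ms} ms, threshold {threshold_ms} ms, "
    f"over by {elapsed_ms - threshold_ms} ms"
)
# prints: elapsed 84203 ms, threshold 42000 ms, over by 42203 ms
```

Running the same parser over the full kernel log would show how far past the threshold each warning is, which may help tell a brief stall from a long one.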