[DRBD-user] Primary fully unavailable with "time expired" errors

AZ 9901 az9901 at gmail.com
Sun Mar 10 16:58:05 CET 2013



Thanks.
However, DRBD seems to have been stuck for 2 days with these "time expired" messages until I cut the link between the nodes (it then resumed flawlessly).
It seems it would have stayed in this state indefinitely, without working.

I already encountered this issue a few months ago; an online verification was also running then.

Is there anything I can do?
Some parameter tuning?
A "retry patch" so that DRBD "stops and retries" when it encounters this issue?
...

Thank you very much!

Best regards,

Ben

On 10 March 2013, at 16:33, David Coulson wrote:

> Sorry - I picked out the wrong line(s).
> 
> Feb 17 20:31:11 srv2-1 kernel: block drbd1: [drbd1_worker/3083] sock_sendmsg time expired, ko = 4294967295
> Feb 17 20:31:17 srv2-1 kernel: block drbd1: [drbd1_worker/3083] sock_sendmsg time expired, ko = 4294967294
> 
> That means your network is unreliable. Not much DRBD can do about it - I would investigate the cause of that problem.
> 
> David
> 
> On 3/10/13 11:21 AM, AZ 9901 wrote:
>> David,
>> 
>> Thank you for your answer!
>> 
>> This log entry appeared just after I cut network communication between srv2-1 and srv2-2, and is certainly a result of that:
>> I connected to the secondary server and used iptables to block traffic between the two servers.
>> Just after that, the primary server was reachable again!
>> But according to the logs, the issue had started 2 days earlier.
>> 
>> However, to answer your question, the network between the 2 servers is the private dedicated network OVH uses between its 2 data centres, RBX and SBG:
>> http://www.ovh.co.uk/dedicated_servers/data_centre_selection.xml
>> I have a 100 Mbps connection between the 2 servers.
>> 
>> Best regards,
>> 
>> Ben
>> 
>> On 10 March 2013, at 16:01, David Coulson wrote:
>> 
>>> What is your network between the two systems?
>>> 
>>> Feb 19 19:20:56 srv2-2 kernel: block drbd1: PingAck did not arrive in time.
>>> 
>>> That means DRBD couldn't communicate between the nodes.
>>> 
>>> David
>>> 
>>> On 3/10/13 10:59 AM, AZ 9901 wrote:
>>>> On 5 March 2013, at 07:21, AZ 9901 wrote:
>>>> 
>>>>> // I made some errors in my previous mail, here they are corrected
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I faced a big issue with DRBD.
>>>>> 
>>>>> OS : Linux Debian 6
>>>>> Kernel : 2.6.32-46
>>>>> DRBD : 8.3.14
>>>>> 
>>>>> My primary server (srv2-2) was totally unreachable; it only replied to ping.
>>>>> Apache, SSH, etc. were no longer responding.
>>>>> 
>>>>> So I connected to my secondary server (srv2-1) and cut network communication between the two.
>>>>> This made srv2-2 available again!
>>>>> However, I decided to promote srv2-1 from Secondary to Primary and to reboot srv2-2.
>>>>> 
>>>>> Following are logs from srv2-2 and srv2-1, with some comments.
>>>>> srv2-2 : http://pastebin.com/raw.php?i=zkHV5Tr9
>>>>> srv2-1 : http://pastebin.com/raw.php?i=WX4vNR6d
>>>>> 
>>>>> On srv2-2, sar shows that some of my CPU cores were 100% busy (100% iowait) for the whole period in which I had "time expired" errors.
>>>>> 
>>>>> Could you help me, please?
>>>>> 
>>>>> Thank you very much,
>>>>> 
>>>>> Ben
>>>>> 
>>>> 
>>>> Hello,
>>>> 
>>>> Can anyone help with this problem?
>>>> 
>>>> To help further, here is my configuration: http://pastebin.com/raw.php?i=UJ7npfBD
>>>> 
>>>> Thank you very much,
>>>> 
>>>> Best regards,
>>>> 
>>>> Ben
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> drbd-user mailing list
>>>> drbd-user at lists.linbit.com
>>>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>> 
>> 
> 
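
The two "ko" values in the quoted "sock_sendmsg time expired" log lines sit just below 2^32, which is what a 32-bit unsigned counter shows once it has been decremented past zero. The Python sketch below only performs that arithmetic on the two lines quoted in this thread. Reading "ko" as a countdown that started from 0 is an assumption, not something confirmed in the thread, and the helper names ko_values and as_signed32 are hypothetical, chosen for illustration.

```python
import re

# The two log lines quoted in the thread (srv2-1, drbd1 worker).
LOG = """\
Feb 17 20:31:11 srv2-1 kernel: block drbd1: [drbd1_worker/3083] sock_sendmsg time expired, ko = 4294967295
Feb 17 20:31:17 srv2-1 kernel: block drbd1: [drbd1_worker/3083] sock_sendmsg time expired, ko = 4294967294
"""

# Matches the message text and captures the decimal value printed after "ko =".
KO_RE = re.compile(r"sock_sendmsg time expired, ko = (\d+)")


def ko_values(text):
    """Return every 'ko' value found in the log text, in order (hypothetical helper)."""
    return [int(m.group(1)) for m in KO_RE.finditer(text)]


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return value - (1 << 32) if value >= (1 << 31) else value


values = ko_values(LOG)
signed = [as_signed32(v) for v in values]

print(values)  # the raw values as logged
print(signed)  # the same values read as signed 32-bit integers
```

Read as signed numbers, the two values are -1 and -2, so the counter dropped by one between the two messages, six seconds apart. If the assumption above holds, that matches a counter that was already at 0 when the first send timed out.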
