[DRBD-user] Primary fully unavailable with "time expired" errors

David Coulson david at davidcoulson.net
Sun Mar 10 16:33:08 CET 2013


Sorry - I picked out the wrong line(s).

Feb 17 20:31:11 srv2-1 kernel: block drbd1: [drbd1_worker/3083] sock_sendmsg time expired, ko = 4294967295
Feb 17 20:31:17 srv2-1 kernel: block drbd1: [drbd1_worker/3083] sock_sendmsg time expired, ko = 4294967294

That means your network is unreliable. Not much DRBD can do about it - I would investigate the cause of that problem.
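
To see how long this has been going on, something like the following could pull the two kinds of messages quoted in this thread out of a kernel log. It is only a rough sketch: the names `scan`, `Event` and `as_signed32` are made up for illustration, and reading `ko` as a signed 32-bit number is an assumption (the values sit just below 2^32, which is what an unsigned counter looks like after being decremented past zero), not something checked against the DRBD source.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Syslog-style prefix as seen in the log lines quoted in this thread.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"block\s+(?P<dev>drbd\d+):\s+(?P<msg>.*)$"
)
KO_RE = re.compile(r"sock_sendmsg time expired, ko = (?P<ko>\d+)")
PINGACK_TEXT = "PingAck did not arrive in time"


class Event(NamedTuple):
    stamp: str          # timestamp exactly as logged (no year in syslog format)
    host: str
    dev: str
    kind: str           # "send_timeout" or "pingack_timeout"
    ko: Optional[int]   # ko value reinterpreted as signed 32-bit (assumption)


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as signed (assumed meaning of ko)."""
    return value - 2**32 if value >= 2**31 else value


def scan(lines: Iterable[str]) -> List[Event]:
    """Collect DRBD send-timeout and PingAck-timeout messages, in log order."""
    events: List[Event] = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        msg = m.group("msg")
        ko = KO_RE.search(msg)
        if ko:
            events.append(Event(m.group("stamp"), m.group("host"), m.group("dev"),
                                "send_timeout", as_signed32(int(ko.group("ko")))))
        elif PINGACK_TEXT in msg:
            events.append(Event(m.group("stamp"), m.group("host"), m.group("dev"),
                                "pingack_timeout", None))
    return events


# The three log lines quoted in this thread, used as sample input.
SAMPLE = [
    "Feb 17 20:31:11 srv2-1 kernel: block drbd1: [drbd1_worker/3083] "
    "sock_sendmsg time expired, ko = 4294967295",
    "Feb 17 20:31:17 srv2-1 kernel: block drbd1: [drbd1_worker/3083] "
    "sock_sendmsg time expired, ko = 4294967294",
    "Feb 19 19:20:56 srv2-2 kernel: block drbd1: PingAck did not arrive in time.",
]
```

Run over the full kernel log from each node (for example `scan(open(path))`, with `path` being wherever your syslog writes kernel messages), the first and last timestamps should show whether the timeouts really began well before the outage.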

David


On 3/10/13 11:21 AM, AZ 9901 wrote:
> David,
>
> Thank you for your answer!
>
> This log entry appeared just after I cut network communication between
> srv2-1 and srv2-2, and is certainly a result of that:
> I connected to the secondary server and used iptables to block
> traffic between the two servers.
> Right after that, the primary server was reachable again!
> But according to the logs, the issue started 2 days earlier.
>
> However, to answer your question, the network between the 2 servers is 
> the private dedicated network OVH uses between its 2 data-centers RBX 
> & SGB :
> http://www.ovh.co.uk/dedicated_servers/data_centre_selection.xml
> I have a 100Mbps connection between the 2 servers.
>
> Best regards,
>
> Ben
>
> On 10 March 2013 at 16:01, David Coulson wrote:
>
>> What is your network between the two systems?
>>
>> Feb 19 19:20:56 srv2-2 kernel: block drbd1: PingAck did not arrive in time.
>>
>> That means DRBD couldn't communicate between the nodes.
>>
>> David
>>
>> On 3/10/13 10:59 AM, AZ 9901 wrote:
>>> On 5 March 2013 at 07:21, AZ 9901 wrote:
>>>
>>>> // I made some errors in my previous mail; here is the corrected version
>>>>
>>>> Hello,
>>>>
>>>> I ran into a serious issue with DRBD.
>>>>
>>>> OS : Linux Debian 6
>>>> Kernel : 2.6.32-46
>>>> DRBD : 8.3.14
>>>>
>>>> My primary server (srv2-2) was totally unreachable; it only replied
>>>> to ping.
>>>> Apache, SSH, etc. were no longer responding.
>>>>
>>>> So I connected to my secondary server (srv2-1) and cut network
>>>> communication between the two.
>>>> This made srv2-2 available again!
>>>> However, I decided to switch srv2-1 from Secondary to Primary and to
>>>> reboot srv2-2.
>>>>
>>>> Following are logs from srv2-2 and srv2-1, with some comments.
>>>> srv2-2 : http://pastebin.com/raw.php?i=zkHV5Tr9
>>>> srv2-1 : http://pastebin.com/raw.php?i=WX4vNR6d
>>>>
>>>> On srv2-2, sar shows that some of my CPU cores were 100% busy
>>>> (100% iowait) for the entire period in which I had "time
>>>> expired" errors.
>>>>
>>>> Could you help me, please?
>>>>
>>>> Thank you very much,
>>>>
>>>> Ben
>>>>
>>>
>>> Hello,
>>>
>>> Any help with this problem?
>>>
>>> To help further, here is my configuration:
>>> http://pastebin.com/raw.php?i=UJ7npfBD
>>>
>>> Thank you very much,
>>>
>>> Best regards,
>>>
>>> Ben
>>>
>>
>
