Thanks.

However, DRBD seems to have been stuck for 2 days with these "time expired" messages until I split the nodes (it then started again flawlessly).
It seems it would have stayed in this state indefinitely, without working.

I already encountered this issue a few months ago; an online verification was also running then.

Is there anything to do? Some parameter tuning? A "retry patch" so that DRBD would "stop and retry" when it encounters this issue? ...

Thank you very much!

Best regards,

Ben

On 10 March 2013 at 16:33, David Coulson wrote:

> Sorry - I picked out the wrong line(s).
>
> Feb 17 20:31:11 srv2-1 kernel: block drbd1: [drbd1_worker/3083] sock_sendmsg time expired, ko = 4294967295
> Feb 17 20:31:17 srv2-1 kernel: block drbd1: [drbd1_worker/3083] sock_sendmsg time expired, ko = 4294967294
>
> That means your network is unreliable. Not much DRBD can do about it - I would investigate the cause of that problem.
>
> David
>
> On 3/10/13 11:21 AM, AZ 9901 wrote:
>> David,
>>
>> Thank you for your answer!
>>
>> This log entry appeared just after (and is certainly due to the fact that) I closed network communication between srv2-1 and srv2-2:
>> I connected to the secondary server and used iptables to stop communication between the two servers.
>> Just after that, the primary server was reachable again!
>> But according to the logs, the issue started 2 days before.
>>
>> However, to answer your question, the network between the 2 servers is the private dedicated network OVH uses between its 2 data centers, RBX & SGB:
>> http://www.ovh.co.uk/dedicated_servers/data_centre_selection.xml
>> I have a 100Mbps connection between the 2 servers.
>>
>> Best regards,
>>
>> Ben
>>
>> On 10 March 2013 at 16:01, David Coulson wrote:
>>
>>> What is your network between the two systems?
>>>
>>> Feb 19 19:20:56 srv2-2 kernel: block drbd1: PingAck did not arrive in time.
>>>
>>> That means DRBD couldn't communicate between the nodes.
>>>
>>> David
>>>
>>> On 3/10/13 10:59 AM, AZ 9901 wrote:
>>>> On 5 March 2013 at 07:21, AZ 9901 wrote:
>>>>
>>>>> // I made some errors in my previous mail, here they are corrected
>>>>>
>>>>> Hello,
>>>>>
>>>>> I faced a big issue with DRBD.
>>>>>
>>>>> OS : Linux Debian 6
>>>>> Kernel : 2.6.32-46
>>>>> DRBD : 8.3.14
>>>>>
>>>>> My primary server (srv2-2) was totally unreachable; it only replied to ping.
>>>>> Apache, SSH etc. were not replying anymore.
>>>>>
>>>>> So I connected to my secondary server (srv2-1) and closed network communication between the two.
>>>>> This made srv2-2 available again!
>>>>> However, I decided to change srv2-1 from Secondary to Primary and to reboot srv2-2.
>>>>>
>>>>> Following are logs from srv2-2 and srv2-1, with some comments.
>>>>> srv2-2 : http://pastebin.com/raw.php?i=zkHV5Tr9
>>>>> srv2-1 : http://pastebin.com/raw.php?i=WX4vNR6d
>>>>>
>>>>> On srv2-2, sar tells me that some of my CPU cores were 100% used (100% iowait) during the whole time frame in which I had "time expired" errors.
>>>>>
>>>>> Could you help me please?
>>>>>
>>>>> Thank you very much,
>>>>>
>>>>> Ben
>>>>>
>>>>
>>>> Hello,
>>>>
>>>> Any help on this problem?
>>>>
>>>> To help further, here is my configuration: http://pastebin.com/raw.php?i=UJ7npfBD
>>>>
>>>> Thank you very much,
>>>>
>>>> Best regards,
>>>>
>>>> Ben
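
As an aside on the "ko" values in the log lines quoted in this thread: the minimal Python sketch below extracts the timestamp, host, device and "ko" value from the two "sock_sendmsg time expired" lines and reinterprets "ko" as a signed 32-bit number. The names LOG_LINES, PATTERN, as_signed32 and parse_expired are hypothetical helpers, not part of DRBD, and treating "ko" as a 32-bit counter that wrapped around after being decremented from 0 is an assumption that the thread itself does not confirm.

```python
import re

# The two kernel log lines quoted in the thread (srv2-1).
LOG_LINES = [
    "Feb 17 20:31:11 srv2-1 kernel: block drbd1: [drbd1_worker/3083] "
    "sock_sendmsg time expired, ko = 4294967295",
    "Feb 17 20:31:17 srv2-1 kernel: block drbd1: [drbd1_worker/3083] "
    "sock_sendmsg time expired, ko = 4294967294",
]

# Matches syslog-style lines of the form shown above.
PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel:\s"
    r"block\s(?P<dev>drbd\d+):.*sock_sendmsg time expired, ko = (?P<ko>\d+)$"
)


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as signed.

    Assumption: "ko" is printed from a 32-bit unsigned counter.
    """
    return value - 2**32 if value >= 2**31 else value


def parse_expired(lines):
    """Return (stamp, host, device, ko, ko_signed) for each matching line."""
    events = []
    for line in lines:
        match = PATTERN.match(line)
        if match:
            ko = int(match.group("ko"))
            events.append(
                (match.group("stamp"), match.group("host"),
                 match.group("dev"), ko, as_signed32(ko))
            )
    return events


EVENTS = parse_expired(LOG_LINES)
for stamp, host, dev, ko, ko_signed in EVENTS:
    print(f"{stamp} {host} {dev}: ko={ko} (as signed 32-bit: {ko_signed})")
```

Under that assumption, the two values read as -1 and -2, i.e. a counter counting down past zero rather than towards it. That would be consistent with the "ko-count" option of the DRBD "net" section being left at 0 (disabled), in which case DRBD keeps retrying instead of dropping the connection; this should be checked against the drbd.conf man page for 8.3 and the configuration linked in the thread before changing anything.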