Yes, this sounds like the cascading failure I had. (I reconfigured the
network card on a secondary and the primary crashed; that box (the
crashed primary) had been acting as secondary for another machine, which
promptly crashed too!) [My Cisco switch has a habit of "isolating"
interfaces when they change config, which may have been a contributing
factor, turning a 1/4 second reconfig into a 30 second outage.]

AFAIK, it is only a problem when you export /dev/drbd* devices to xen
guests. The simplest fix was to switch to protocol A.

The technical details were a bit complicated and I've forgotten some of
them, but basically, the data being sent was being freed by something
related to the xen blkback driver (?). Changing to protocol A means that
drbd forgets about it sooner and doesn't try to access it after the
free().

Lars was suggesting/expecting that the 8.3.2 release will have
"nosendpage" support, which is effectively what is needed to avoid the
problem.

Since finding the problem, I've just avoided it by not exporting drbd
devices directly to xen. If you really want protocol C, the code change
is pretty simple: there already is a "_drbd_no_send_page()" function, and
you just want to change the logic so it always goes down that path
(currently it's a special-case exception).

That said, AFAIK, 0.7 should be fine. I have only encountered the issue
with the 8.x versions...

-Tom

On Sat, 9 May 2009, Lars Ellenberg wrote:

> On Fri, May 08, 2009 at 05:38:06PM -0400, Victor Hugo dos Santos wrote:
>> On Fri, Feb 20, 2009 at 2:37 PM, Victor Hugo dos Santos
>> <listas.vhs at gmail.com> wrote:
>>> Hello,
>>>
>>> I have a problem with drbd-0.7.25 and drbd-0.8.2.6... my situation is:
>>>
>>> two servers Supermicro in company A connected with crossover cable and
>>> CentOS 5.2 (all updates installed)
>>> two servers Poweredge in company B connected with network fiber in
>>> separate sites and Citrix XenServer 4 installed.
>>>
>>> the problem is that from time to time, both servers restart without
>>> apparent reason.. the logs only show messages about network failure
>>> and after this, the server restarts.
>>> in company A... this problem occurred 2 or 3 times and the last
>>> incident was 4 months ago..
>>> and I had forgotten this problem.. because I thought it could be
>>> the electrical power line in this company.
>>> but now, in company B.. I have the same problem for the first time
>>> (after various months working fine) and these servers are connected
>>> to a UPS line.
>>>
>>> the two server groups are running a Virtualization Server.. but from
>>> different vendors and configurations..
>>> Memory, disks and network work fine in all four servers, and the DRBD
>>> resource contains only data from VMs, no files/data from the host
>>> server.
>>>
>>> and I don't understand why the servers restart when they receive an
>>> error from the network !!???
>>> and in case of a problem.. I think that a restart of the VMs is
>>> plausible, but not of the complete server.
>>>
>>> Above, logs and config file of two servers in company B...