Hi all,
following the example in Florian's post
(http://fghaas.wordpress.com/2007/10/01/an-underrated-cluster-admins-companion-dopd/),
I'm testing the outdate-peer plugin.
My scenario: two Debian machines (OV-HA1 primary, OV-HA2 secondary),
heartbeat + DRBD, one Ethernet link plus one serial cable (the Ethernet
link is used both for DRBD replication and for exposing services).
I know that a dedicated Ethernet connection between the two nodes
is recommended for DRBD data synchronization, but for testing purposes
this is the scenario :).
Heartbeat is configured with ipfail, so when the Ethernet connection
goes down, heartbeat migrates the services to the other node.
Obviously, in this configuration the trouble appears when I unplug the
OV-HA1 (primary) link. I'm testing the outdate-peer daemon as described
in your post because without this plugin the secondary becomes primary
(and this is OK), but when I reconnect the Ethernet cable the two nodes
are "standalone" and do not resynchronize their DRBD partitions (this is
the DRBD "split brain" case).
Now, with the configuration from your post:
* in OV-HA2's ha-log I see this warning:
  WARN: check_drbd_peer: drbd peer OV-HA1 was not found
* however the plugin seems to work, because my OV-HA2 is now outdated;
* after the log message above, I see in OV-HA2's ha-log:
  ResourceManager[6217]: 2007/10/15_14:54:47 ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
  ResourceManager[6217]: 2007/10/15_14:54:47 CRIT: Giving up resources due to failure of drbddisk::ovHA
* investigating the syslog, I see that OV-HA2 fails to become
  primary:
  Oct 15 14:54:47 localhost kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk
  Oct 15 14:54:47 localhost kernel: drbd0: state = { cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r--- }
  Oct 15 14:54:47 localhost kernel: drbd0: wanted = { cs:WFConnection st:Primary/Unknown ds:Outdated/DUnknown r--- }
  Oct 15 14:54:47 localhost kernel: ttyS0: 1 input overrun(s)
  Oct 15 14:54:47 localhost ResourceManager[6217]: debug: /etc/ha.d/resource.d/drbddisk ovHA start done. RC=20
  Oct 15 14:54:47 localhost ResourceManager[6217]: ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
  Oct 15 14:54:47 localhost ResourceManager[6217]: CRIT: Giving up resources due to failure of drbddisk::ovHA
If I understand correctly, this is what happens in my scenario:
* the plugin outdates the secondary when the Ethernet link fails;
* the secondary then fails to become primary because it is now marked
  as "outdated" :)
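To make the second point concrete, here is a minimal Python sketch that reads the logged drbd0 state line and applies only the rule quoted in the kernel message ("Refusing to be Primary without at least one UpToDate disk"). It is a simplified model under that assumption, not DRBD's actual state-validation code, which checks more conditions; `DrbdState`, `parse_state` and `may_become_primary` are hypothetical helpers, not part of DRBD or heartbeat.

```python
import re
from dataclasses import dataclass

# Matches the cs/st/ds fields of a logged drbd "state = { ... }" line.
STATE_RE = re.compile(
    r"cs:(?P<cs>\S+)\s+st:(?P<role>\w+)/(?P<peer_role>\w+)\s+"
    r"ds:(?P<disk>\w+)/(?P<peer_disk>\w+)"
)


@dataclass
class DrbdState:
    connection: str  # cs: field, e.g. WFConnection
    role: str        # local role, e.g. Secondary
    peer_role: str   # peer role, Unknown while disconnected
    disk: str        # local disk state, e.g. Outdated
    peer_disk: str   # peer disk state, DUnknown while disconnected


def parse_state(line: str) -> DrbdState:
    """Extract the cs/st/ds fields from a logged drbd state line."""
    m = STATE_RE.search(line)
    if m is None:
        raise ValueError(f"not a drbd state line: {line!r}")
    return DrbdState(
        connection=m["cs"],
        role=m["role"],
        peer_role=m["peer_role"],
        disk=m["disk"],
        peer_disk=m["peer_disk"],
    )


def may_become_primary(state: DrbdState) -> bool:
    """Simplified rule (assumption): promotion needs at least one
    UpToDate disk, either the local one or the peer's."""
    return "UpToDate" in (state.disk, state.peer_disk)


if __name__ == "__main__":
    # OV-HA2's state as logged after dopd outdated it.
    logged = ("drbd0: state = { cs:WFConnection st:Secondary/Unknown "
              "ds:Outdated/DUnknown r--- }")
    s = parse_state(logged)
    print(s.disk, s.peer_disk, may_become_primary(s))
```

With the logged state, the local disk is Outdated and the peer disk is DUnknown (the node is disconnected), so neither is UpToDate and the sketch refuses promotion, which is consistent with the failed `drbddisk ovHA start` above.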
Is there a solution?
Best regards,
Matteo.