Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi all,

following the example in Florian's post (http://fghaas.wordpress.com/2007/10/01/an-underrated-cluster-admins-companion-dopd/) I'm testing the outdate-peer plugin.

My scenario: two Debian machines (OV-HA1 primary, OV-HA2 secondary), heartbeat + drbd, 1 ethernet + 1 serial cable (the ethernet is used both for drbd replication and to expose services). I know that a dedicated ethernet connection between the two nodes is recommended for drbd data synchronization, but for testing this is the scenario :).

Heartbeat is configured with ipfail, so when the ethernet connection goes down, heartbeat migrates the services to the other node. Obviously, in this configuration the trouble appears when I unplug the OV-HA1 (primary) link.

I'm testing the outdate-peer daemon described in your post because without this plugin the secondary becomes primary (and this is OK), but when I reconnect the ethernet both nodes are "StandAlone" and do not re-synchronize their drbd partitions (this is the "drbd split brain" case).
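For readers following along, a dopd setup of the kind that post describes usually amounts to two small additions, sketched below. This is a hedged sketch based on the DRBD 8 documentation, not a copy of the actual files on these nodes: the /usr/lib/heartbeat paths are an assumption and may differ between distributions, and the resource name ovHA is taken from the drbddisk::ovHA entry that appears in the logs.

```
# /etc/ha.d/ha.cf (assumed location): let heartbeat start dopd
respawn hacluster /usr/lib/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster

# /etc/drbd.conf: ask dopd to outdate the peer when replication is lost
resource ovHA {
  handlers {
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater";
  }
  disk {
    fencing resource-only;
  }
  # ... remaining resource settings unchanged ...
}
```

With "fencing resource-only", the node that is Primary when the replication link drops calls the outdate-peer handler, which asks dopd on the other node (over the remaining heartbeat path, here the serial cable) to mark its disk as Outdated.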
Now, with your post's configuration:

* in OV-HA2's ha-log I see this warning:

  WARN: check_drbd_peer: drbd peer OV-HA1 was not found

* however, the plugin seems to work, because my OV-HA2 is now outdated;

* after the log message above, I see in OV-HA2's ha-log:

  ResourceManager[6217]: 2007/10/15_14:54:47 ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
  ResourceManager[6217]: 2007/10/15_14:54:47 CRIT: Giving up resources due to failure of drbddisk::ovHA

* investigating the syslog, I see that OV-HA2 fails to become primary:

  Oct 15 14:54:47 localhost kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk
  Oct 15 14:54:47 localhost kernel: drbd0: state = { cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r--- }
  Oct 15 14:54:47 localhost kernel: drbd0: wanted = { cs:WFConnection st:Primary/Unknown ds:Outdated/DUnknown r--- }
  Oct 15 14:54:47 localhost kernel: ttyS0: 1 input overrun(s)
  Oct 15 14:54:47 localhost ResourceManager[6217]: debug: /etc/ha.d/resource.d/drbddisk ovHA start done. RC=20
  Oct 15 14:54:47 localhost ResourceManager[6217]: ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
  Oct 15 14:54:47 localhost ResourceManager[6217]: CRIT: Giving up resources due to failure of drbddisk::ovHA

Am I right that in my scenario:

* the plugin outdates the secondary when the ethernet fails;
* the secondary then fails to become primary because it is now marked as "outdated" :)

Is there a solution?

Best regards,
Matteo.