Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 25 Feb 2017 3:32 am, "Dr. Volker Jaenisch" <volker.jaenisch at inqbus.de> wrote:

Servus!

On 24.02.2017 at 15:53, Lars Ellenberg wrote:
> On Fri, Feb 24, 2017 at 03:08:04PM +0100, Dr. Volker Jaenisch wrote:
>> If both 10Gbit links fail, then bond0 (aka the worker connection)
>> fails and DRBD goes - as expected - into split brain. But that is
>> not the problem.
>
> DRBD will be *disconnected*, yes.

Sorry, I was not precise in my wording. But I assumed that after going
into the disconnected state, the cluster manager would be informed and
would reflect this somehow.

I have now noticed that a CIB rule is set telling the former primary to
stay primary (please have a look at the cluster state at the end of
this email), but I still wonder why this is not reflected in the crm
status. I was misled by this missing status information and wrongly
concluded that the ocf:linbit:drbd agent does not inform the CRM/CIB.
Sorry for blaming DRBD.

But I am still confused about Pacemaker not reflecting the DRBD state
change in the crm status. Maybe this question should go to the
Pacemaker list.

> But no reason for it to be "split brain"ed yet.
> And with proper fencing configured, it won't.
>
>> This is our DRBD config. This is all quite basic:
>>
>> resource r0 {
>>     disk {
>>         fencing resource-only;
>>     }
>
> This needs to be:
>
>     fencing resource-and-stonith;
>
> if you wish drbd to tell crm to take any action.
>
>>     handlers {
>>         fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
>>         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>>     }
>>     on mail1 {
>>         device    /dev/drbd1;
>>         disk      /dev/sda1;
>>         address   172.27.250.8:7789;
>>         meta-disk internal;
>>     }
>>     on mail2 {
>>         device    /dev/drbd1;
>>         disk      /dev/sda1;
>>         address   172.27.250.9:7789;
>>         meta-disk internal;
>>     }
>> }
>>
>> *What did we miss?*

We have no STONITH configured yet. And IMHO a missing STONITH
configuration should not interfere with the DRBD state change. Or am I
wrong with this assumption?

State after bond0 goes down:

root at mail1:/home/volker# crm status
Stack: corosync
Current DC: mail2 (version 1.1.15-e174ec8) - partition with quorum
Last updated: Fri Feb 24 16:56:44 2017
Last change: Fri Feb 24 16:45:19 2017 by root via cibadmin on mail2

2 nodes and 7 resources configured

Online: [ mail1 mail2 ]

Full list of resources:

 Master/Slave Set: ms_drbd_mail [drbd_mail]
     Masters: [ mail2 ]
     Slaves: [ mail1 ]
 Resource Group: FS_IP
     fs_mail            (ocf::heartbeat:Filesystem):    Started mail2
     vip_193.239.30.23  (ocf::heartbeat:IPaddr2):       Started mail2
     vip_172.27.250.7   (ocf::heartbeat:IPaddr2):       Started mail2
 Resource Group: Services
     postgres_pg2       (ocf::heartbeat:pgsql):         Started mail2
     Dovecot            (lsb:dovecot):                  Started mail2

Failed Actions:
* vip_172.27.250.7_monitor_30000 on mail2 'not running' (7): call=55,
  status=complete, exitreason='none',
  last-rc-change='Fri Feb 24 16:47:07 2017', queued=0ms, exec=0ms

root at mail2:/home/volker# drbd-overview
 1:r0/0  StandAlone    Primary/Unknown    UpToDate/Outdated  /shared/data ext4 916G 12G 858G 2%

root at mail1:/home/volker# drbd-overview
 1:r0/0  WFConnection  Secondary/Unknown  UpToDate/DUnknown

And after bringing bond0 up again, the same state on both machines.
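For reference, this is roughly what the disk/handlers part of r0 would look
like with the change Lars suggests applied. This is only a sketch: everything
except the fencing policy is taken from the config quoted above, and the
handler paths are the stock Debian locations.

    resource r0 {
        disk {
            # with resource-and-stonith, DRBD freezes I/O on the Primary and
            # calls the fence-peer handler when the replication link is lost;
            # I/O resumes once the handler reports the peer as fenced/outdated
            fencing resource-and-stonith;
        }
        handlers {
            # crm-fence-peer.sh adds the constraint that shows up further
            # below as drbd-fence-by-handler-r0-ms_drbd_mail;
            # crm-unfence-peer.sh removes it again after the resync
            fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
        # the "on mail1 { ... }" and "on mail2 { ... }" sections stay unchanged
    }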
After cleanup of the failed VIP resource, still the same state:

root at mail2:/home/volker# crm status
Stack: corosync
Current DC: mail2 (version 1.1.15-e174ec8) - partition with quorum
Last updated: Fri Feb 24 17:01:05 2017
Last change: Fri Feb 24 16:59:32 2017 by hacluster via crmd on mail2

2 nodes and 7 resources configured

Online: [ mail1 mail2 ]

Full list of resources:

 Master/Slave Set: ms_drbd_mail [drbd_mail]
     Masters: [ mail2 ]
     Slaves: [ mail1 ]
 Resource Group: FS_IP
     fs_mail            (ocf::heartbeat:Filesystem):    Started mail2
     vip_193.239.30.23  (ocf::heartbeat:IPaddr2):       Started mail2
     vip_172.27.250.7   (ocf::heartbeat:IPaddr2):       Started mail2
 Resource Group: Services
     postgres_pg2       (ocf::heartbeat:pgsql):         Started mail2
     Dovecot            (lsb:dovecot):                  Started mail2

root at mail2:/home/volker# drbd-overview
 1:r0/0  StandAlone    Primary/Unknown    UpToDate/Outdated  /shared/data ext4 916G 12G 858G 2%

root at mail1:/home/volker# drbd-overview
 1:r0/0  WFConnection  Secondary/Unknown  UpToDate/DUnknown

After issuing

    mail2# drbdadm connect all

the nodes resync and everything is back in order (the "sticky" rule is
cleared as well).

Cheers,

Volker

General setup: stock Debian Jessie without any modifications; DRBD,
Pacemaker etc. are all Debian packages.

Here is our crm config:

node 740030984: mail1 \
        attributes standby=off
node 740030985: mail2 \
        attributes standby=off
primitive Dovecot lsb:dovecot \
        op monitor interval=20s timeout=15s \
        meta target-role=Started
primitive drbd_mail ocf:linbit:drbd \
        params drbd_resource=r0 \
        op monitor interval=15s role=Master \
        op monitor interval=16s role=Slave \
        op start interval=0 timeout=240s \
        op stop interval=0 timeout=100s
...
ms ms_drbd_mail drbd_mail \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true is-managed=true target-role=Started
order FS_IP_after_drbd inf: ms_drbd_mail:promote FS_IP:start
order dovecot_after_FS_IP inf: FS_IP:start Services:start
location drbd-fence-by-handler-r0-ms_drbd_mail ms_drbd_mail \
        *rule $role=Master -inf: #uname ne mail2*
colocation mail_fs_on_drbd inf: FS_IP Services ms_drbd_mail:Master
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.15-e174ec8 \
        cluster-infrastructure=corosync \
        cluster-name=mail \
        stonith-enabled=false \
        last-lrm-refresh=1487951972 \
        no-quorum-policy=ignore

-- 
=========================================================
   inqbus Scientific Computing    Dr. Volker Jaenisch
   Richard-Strauss-Straße 1       +49(08861) 690 474 0
   86956 Schongau-West            http://www.inqbus.de
=========================================================
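Since the config above has stonith-enabled=false and no STONITH devices,
"fencing resource-and-stonith" alone cannot do much: the cluster has nothing
to fence with. As a rough sketch only, assuming IPMI-capable BMCs and the
fence-agents package (the fence_ipmilan agent, addresses and credentials
below are placeholders, not part of this setup), STONITH could be added with
the crm shell along these lines:

    # one fencing resource per node; IP addresses and credentials are placeholders
    crm configure primitive fence-mail1 stonith:fence_ipmilan \
            params pcmk_host_list=mail1 ipaddr=192.0.2.11 login=admin passwd=secret lanplus=1 \
            op monitor interval=60s
    crm configure primitive fence-mail2 stonith:fence_ipmilan \
            params pcmk_host_list=mail2 ipaddr=192.0.2.12 login=admin passwd=secret lanplus=1 \
            op monitor interval=60s

    # never run a node's fencing device on the node it is supposed to fence
    crm configure location l-fence-mail1 fence-mail1 -inf: mail1
    crm configure location l-fence-mail2 fence-mail2 -inf: mail2

    # only enable fencing cluster-wide once the devices have been tested
    crm configure property stonith-enabled=true

That matches Lars's point above: with resource-and-stonith plus a working
STONITH device, a disconnected Primary blocks I/O until the peer is known to
be fenced or outdated, which is what keeps the peer from being promoted with
diverging data.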