Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Servus!

On 24.02.2017 at 15:53, Lars Ellenberg wrote:
> On Fri, Feb 24, 2017 at 03:08:04PM +0100, Dr. Volker Jaenisch wrote:
>> If both 10Gbit links fail then the bond0 aka the worker connection fails
>> and DRBD goes - as expected - into split brain. But that is not the problem.
>
> DRBD will be *disconnected*, yes.

Sorry, I was not precise in my wording. But I assumed that after going into the disconnected state, the cluster manager is informed and reflects this somehow. I have now noticed that a CIB rule is set on the former primary to stay primary (please have a look at the cluster state at the end of this email), but I still wonder why this is not reflected in the crm status. (A way to inspect that constraint directly is sketched after the status output below.)

I was misled by this missing status information and wrongly concluded that the ocf:linbit:drbd agent does not inform the CRM/CIB. Sorry for blaming DRBD. But I am still confused about Pacemaker not reflecting the DRBD state change in the crm status. Maybe this question should go to the Pacemaker list.

> But no reason for it to be "split brain"ed yet.
> and with proper fencing configured, it won't.

This is our DRBD config. This is all quite basic:

resource r0 {
        disk {
                fencing resource-only;
        }
        handlers {
                fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
        on mail1 {
                device    /dev/drbd1;
                disk      /dev/sda1;
                address   172.27.250.8:7789;
                meta-disk internal;
        }
        on mail2 {
                device    /dev/drbd1;
                disk      /dev/sda1;
                address   172.27.250.9:7789;
                meta-disk internal;
        }
}

*What did we miss?*

We have no STONITH configured yet. And IMHO a missing STONITH configuration should not interfere with the DRBD state change. Or am I wrong with this assumption? (A sketch of a possible STONITH setup is appended at the end of this mail.)

State after bond0 goes down:

root@mail1:/home/volker# crm status
Stack: corosync
Current DC: mail2 (version 1.1.15-e174ec8) - partition with quorum
Last updated: Fri Feb 24 16:56:44 2017
Last change: Fri Feb 24 16:45:19 2017 by root via cibadmin on mail2

2 nodes and 7 resources configured

Online: [ mail1 mail2 ]

Full list of resources:

 Master/Slave Set: ms_drbd_mail [drbd_mail]
     Masters: [ mail2 ]
     Slaves: [ mail1 ]
 Resource Group: FS_IP
     fs_mail            (ocf::heartbeat:Filesystem):    Started mail2
     vip_193.239.30.23  (ocf::heartbeat:IPaddr2):       Started mail2
     vip_172.27.250.7   (ocf::heartbeat:IPaddr2):       Started mail2
 Resource Group: Services
     postgres_pg2       (ocf::heartbeat:pgsql):         Started mail2
     Dovecot            (lsb:dovecot):                  Started mail2

Failed Actions:
* vip_172.27.250.7_monitor_30000 on mail2 'not running' (7): call=55, status=complete,
    exitreason='none', last-rc-change='Fri Feb 24 16:47:07 2017', queued=0ms, exec=0ms

root@mail2:/home/volker# drbd-overview
 1:r0/0  StandAlone Primary/Unknown UpToDate/Outdated  /shared/data ext4 916G 12G 858G 2%

root@mail1:/home/volker# drbd-overview
 1:r0/0  WFConnection Secondary/Unknown UpToDate/DUnknown

And after bringing bond0 up again, the same state on both machines.
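As an aside, the "stay primary" rule mentioned above is a location constraint that crm-fence-peer.sh writes into the CIB. Because it lives in the configuration section of the CIB rather than in the resource status, plain crm status will not show it, but it can be inspected directly. A rough sketch, assuming the resource names from this setup (the constraint ID will differ in other configurations):

    # show the location constraint left by the fence-peer handler, if any
    crm configure show | grep -A1 drbd-fence-by-handler

    # DRBD's own view of connection and disk state
    drbdadm cstate r0
    drbdadm dstate r0
    cat /proc/drbd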
After cleanup of the failed VIP interface, still the same state:

root@mail2:/home/volker# crm status
Stack: corosync
Current DC: mail2 (version 1.1.15-e174ec8) - partition with quorum
Last updated: Fri Feb 24 17:01:05 2017
Last change: Fri Feb 24 16:59:32 2017 by hacluster via crmd on mail2

2 nodes and 7 resources configured

Online: [ mail1 mail2 ]

Full list of resources:

 Master/Slave Set: ms_drbd_mail [drbd_mail]
     Masters: [ mail2 ]
     Slaves: [ mail1 ]
 Resource Group: FS_IP
     fs_mail            (ocf::heartbeat:Filesystem):    Started mail2
     vip_193.239.30.23  (ocf::heartbeat:IPaddr2):       Started mail2
     vip_172.27.250.7   (ocf::heartbeat:IPaddr2):       Started mail2
 Resource Group: Services
     postgres_pg2       (ocf::heartbeat:pgsql):         Started mail2
     Dovecot            (lsb:dovecot):                  Started mail2

root@mail2:/home/volker# drbd-overview
 1:r0/0  StandAlone Primary/Unknown UpToDate/Outdated  /shared/data ext4 916G 12G 858G 2%

root@mail1:/home/volker# drbd-overview
 1:r0/0  WFConnection Secondary/Unknown UpToDate/DUnknown

After issuing

mail2# drbdadm connect all

the nodes resync and everything is back in order (the "sticky" rule is cleared as well).

Cheers,

Volker

General setup: stock Debian Jessie without any modifications. DRBD, Pacemaker etc. are all from Debian.

Here is our crm config:

node 740030984: mail1 \
        attributes standby=off
node 740030985: mail2 \
        attributes standby=off
primitive Dovecot lsb:dovecot \
        op monitor interval=20s timeout=15s \
        meta target-role=Started
primitive drbd_mail ocf:linbit:drbd \
        params drbd_resource=r0 \
        op monitor interval=15s role=Master \
        op monitor interval=16s role=Slave \
        op start interval=0 timeout=240s \
        op stop interval=0 timeout=100s
...
ms ms_drbd_mail drbd_mail \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true is-managed=true target-role=Started
order FS_IP_after_drbd inf: ms_drbd_mail:promote FS_IP:start
order dovecot_after_FS_IP inf: FS_IP:start Services:start
location drbd-fence-by-handler-r0-ms_drbd_mail ms_drbd_mail \
        *rule $role=Master -inf: #uname ne mail2*
colocation mail_fs_on_drbd inf: FS_IP Services ms_drbd_mail:Master
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.15-e174ec8 \
        cluster-infrastructure=corosync \
        cluster-name=mail \
        stonith-enabled=false \
        last-lrm-refresh=1487951972 \
        no-quorum-policy=ignore

--
=========================================================
   inqbus Scientific Computing    Dr. Volker Jaenisch
   Richard-Strauss-Straße 1       +49(08861) 690 474 0
   86956 Schongau-West            http://www.inqbus.de
=========================================================
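On the STONITH question above: with stonith-enabled=false as in this config, Pacemaker cannot fence a node at all. A minimal sketch of what enabling it could look like, assuming IPMI-reachable hosts and the external/ipmi agent; the addresses and credentials below are placeholders, not taken from this setup:

    # one fencing device per node (placeholder IPMI addresses/credentials)
    crm configure primitive stonith-mail1 stonith:external/ipmi \
            params hostname=mail1 ipaddr=192.168.100.1 userid=admin passwd=secret interface=lanplus \
            op monitor interval=60s
    crm configure primitive stonith-mail2 stonith:external/ipmi \
            params hostname=mail2 ipaddr=192.168.100.2 userid=admin passwd=secret interface=lanplus \
            op monitor interval=60s

    # keep each fencing device off the node it is meant to fence
    crm configure location l-stonith-mail1 stonith-mail1 -inf: mail1
    crm configure location l-stonith-mail2 stonith-mail2 -inf: mail2

    crm configure property stonith-enabled=true

With real fencing in place, DRBD's fencing policy could also be raised from "resource-only" to "resource-and-stonith", but that is a separate decision.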