Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Wed, Aug 30, 2017 at 01:22:43PM +0800, 杨成伟 wrote: > Hi All, > > I'm following drbd 9.x user guide and has a 3 node(2 control, 1 > satellite) setup. > > However, I'm not sure what I've did on one node and that node can not > restart by `drbdmanage restart`. > > It errors like below > > > ``` > Attempting to startup the server through D-Bus activation... > ERROR:dbus.proxies:Introspect error on :1.96:/interface: dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. > D-Bus connection successful, server is running and reachable > ``` > > > and from the dbus log(journalctl -u dbus -f), I found that drbdmanage failed like below. > > > ``` > Aug 30 13:10:42 mesos-test-dev002-whdx.qiyi.virtual dbus[815]: [system] Activating service name='org.drbd.drbdmanaged' (using servicehelper) > Aug 30 13:10:42 mesos-test-dev002-whdx.qiyi.virtual dbus-daemon[815]: dbus[815]: [system] Activating service name='org.drbd.drbdmanaged' (using servicehelper) > Aug 30 13:10:42 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO DRBDmanage server, version 0.99.9 -- initializing on node 'mesos-test-dev002-whdx.qiyi.virtual' > Aug 30 13:10:42 mesos-test-dev002-whdx.qiyi.virtual dbus[815]: [system] Successfully activated service 'org.drbd.drbdmanaged' > Aug 30 13:10:42 mesos-test-dev002-whdx.qiyi.virtual dbus-daemon[815]: dbus[815]: [system] Successfully activated service 'org.drbd.drbdmanaged' > Aug 30 13:10:42 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO DrbdAdm: Running external command: drbdadm -vvv adjust .drbdctrl > Aug 30 13:10:42 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO DrbdAdm: External command 'drbdadm': Exit code 0 > Aug 30 13:10:42 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO DRBDManage starting as potential leader node > Aug 30 13:10:59 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: ERROR drbdsetup events tracing has failed, restarting > Aug 30 13:10:59 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO drbdsetup events tracing reestablished > Aug 30 13:11:17 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO Reading initial DRBD control volume status > Aug 30 13:11:17 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO Finished reading initial DRBD control volume status > Aug 30 13:11:17 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO Leader election by wait for connections > Aug 30 13:11:17 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO DrbdAdm: Running external command: drbdsetup wait-connect-resource --wait-after-sb=yes --wfc-timeout=2 .drbd > ctrl > Aug 30 13:11:17 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO DrbdAdm: External command 'drbdsetup': Exit code 0 > Aug 30 13:11:17 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO DrbdAdm: Running external command: drbdsetup primary .drbdctrl > Aug 30 13:11:18 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: ERROR DrbdAdm: External command 'drbdsetup': Exit code 11 > Aug 30 13:11:18 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO Leader election: lost election > Aug 30 13:11:24 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: ERROR Signal creation failed, unhandled exception encountered > Aug 30 13:11:24 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: ERROR Signal creation failed, unhandled exception encountered > Aug 30 13:11:24 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: ERROR Signal creation failed, unhandled exception encountered > Aug 30 13:11:24 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO starting resource 'r0' > Aug 30 13:11:24 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO DrbdAdm: Running external command: drbdadm --config-to-test /var/lib/drbd.d/drbdmanage_r0.res.tmp --config-t > o-exclude /var/lib/drbd.d/drbdmanage_r0.res sh-nop > Aug 30 13:11:24 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO DrbdAdm: External command 'drbdadm': Exit code 0 > Aug 30 13:11:24 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO DrbdAdm: Running external command: drbdadm -c /var/lib/drbd.d/drbdmanage_r0.res.tmp -d up r0 > Aug 30 13:11:24 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO DrbdAdm: External command 'drbdadm': Exit code 0 > Aug 30 13:11:24 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO DrbdAdm: Running external command: drbdadm -vvv adjust -c /var/lib/drbd.d/drbdmanage_r0.res r0 > Aug 30 13:11:24 mesos-test-dev002-whdx.qiyi.virtual drbdmanaged[28329]: INFO DrbdAdm: External command 'drbdadm': Exit code 0 > ``` > > > It seems `drbdmanage restart` fails because `drbdsetup primary .drbdctl` fails, > so I just run this command on the node. > > > ``` > [root at mesos-test-dev002-whdx ~]# drbdsetup primary .drbdctrl > .drbdctrl: State change failed: (-1) Multiple primaries not allowed by config > ``` > > > Apparently, it fails and said *Multiple primaries not allowed by config*, > I am not sure why it will run this command because drbdmanage role says this node is not the leader That is okay. On startup it sees that it has a control volume and tries to become primary (it does not check if there is already another primary, there is no reason), it obviously fails, which is the indicator that it will be in the role of a potential leader. > > ``` > [root at mesos-test-dev002-whdx ~]# drbdmanage role > Operation completed successfully > potential leader (CL: mesos-test-dev001-whdx.qiyi.virtual) > ``` Looks good. Especially on startup/restart drbdmanage can take too long until it reacts to commands and then you see these dbus timeouts. When drbdmanage reached some initial state where it can react to dbus commands, it would send "hey, I'm currently not ready"-commands to the CLI and the CLI would then retry and you would not see dbus-timeouts. Getting DM really quick in an initial state where it can answer dbus-commands is not that trivial, so just ignore dbus-timeouts on startup. If the system is not ultra-busy starting hundreds of resources, this is usually one such dbus-timeout. Regards, rck