<div dir="ltr"><div>Hi, <br><br>Let me re-post my old mail to this newgroup because similar errors are produced.<br><br><br>1. Old mail</div><div><br><div>subject: Can dradm-down terminate with 600 seconds timeout?</div><div><br>Dear Lars,<br>Problem:<br> - On Linux, sometimes drbdadm-down-command be hanged as follows;<br> "Command 'drbdsetup down r0' did not terminate within 600 seconds"<br><br>Env:<br> - Linux version: CentOS-7 Linux 3.10.0-229.7.2.el7.x86_64<br> - DRBD Engine version: 9.0.0 (api:2/proto:86-110) GIT-hash: e0e3bc833053a0ed6b4b593b32e40223a9373fb7 Transports (api:8): tcp (1.0.0)<br><br>Test Cluster:<br> - 4 node Linux VMs, mesh network on Vmware<br><br>Test scenario:<br> 1) drbdadm-up 4 nodes all<br> 2) Promote one node to primary<br> 3) Pause all VMs<br> ***Notes: Please use "Pause menu" when stops the node on Vmware. "Suspend " menu some differs from "Pause".<br> 4) Resume all VMs<br> 5) Somtimes, more than one node output 2PC(2 Phase commit) messages periodically.(See attached information)<br> 6) At this point, If we run drbdadm-down-command on these nodes, they will be hanged occasionally with error message as follows;<br> "Command 'drbdsetup down r0' did not terminate within 600 seconds"<br><br>Attached:<br><br>node1:<br><br>Nov 12 12:58:14 drbd9-01 kernel: drbd r0: Preparing cluster-wide state change 3110839547 (0->2 499/145)<br>Nov 12 12:58:14 drbd9-01 kernel: drbd r0: Aborting cluster-wide state change 3110839547 (30000ms) rv = -23<br>Nov 12 12:58:14 drbd9-01 kernel: drbd r0 drbd9-02: Rejecting concurrent remote state change 1556788097 because of state change 1200995267<br>Nov 12 12:58:14 drbd9-01 kernel: drbd r0 drbd9-02: Ignoring P_TWOPC_ABORT packet 1556788097.<br>Nov 12 12:58:44 drbd9-01 kernel: drbd r0: Preparing cluster-wide state change 1200995267 (0->3 499/145)<br>Nov 12 12:58:44 drbd9-01 kernel: drbd r0: Aborting cluster-wide state change 1200995267 (30001ms) rv = -23<br>Nov 12 12:58:44 drbd9-01 kernel: drbd r0 drbd9-02: Rejecting concurrent remote state change 407232822 because of state change 2876480921<br>Nov 12 12:58:44 drbd9-01 kernel: drbd r0 drbd9-02: Ignoring P_TWOPC_ABORT packet 407232822.<br>Nov 12 12:59:14 drbd9-01 kernel: drbd r0: Preparing cluster-wide state change 2876480921 (0->2 499/145)<br>Nov 12 12:59:14 drbd9-01 kernel: drbd r0: Aborting cluster-wide state change 2876480921 (30001ms) rv = -23<br>Nov 12 12:59:44 drbd9-01 kernel: drbd r0: Preparing cluster-wide state change 3839850480 (0->3 499/145)<br>Nov 12 12:59:44 drbd9-01 kernel: drbd r0: Aborting cluster-wide state change 3839850480 (30000ms) rv = -23<br>Nov 12 13:00:14 drbd9-01 kernel: drbd r0: Preparing cluster-wide state change 2477663420 (0->2 499/145)<br>Nov 12 13:00:14 drbd9-01 kernel: drbd r0: Aborting cluster-wide state change 2477663420 (30000ms) rv = -23<br>Nov 12 13:00:14 drbd9-01 kernel: drbd r0 drbd9-02: Rejecting concurrent remote state change 849320515 because of state change 2698322727<br>Nov 12 13:00:14 drbd9-01 kernel: drbd r0 drbd9-02: Ignoring P_TWOPC_ABORT packet 849320515.<br>Nov 12 13:00:44 drbd9-01 kernel: drbd r0: Preparing cluster-wide state change 2698322727 (0->3 499/145)<br>Nov 12 13:00:44 drbd9-01 kernel: drbd r0: Aborting cluster-wide state change 2698322727 (30000ms) rv = -23<br>Nov 12 13:00:44 drbd9-01 kernel: drbd r0 drbd9-02: Rejecting concurrent remote state change 743585927 because of state change 2750548712<br>Nov 12 13:00:44 drbd9-01 kernel: drbd r0 drbd9-02: Ignoring P_TWOPC_ABORT packet 743585927.<br>Nov 12 13:01:14 drbd9-01 kernel: drbd r0: Preparing cluster-wide state change 2750548712 (0->2 499/145)<br>Nov 12 13:01:14 drbd9-01 kernel: drbd r0: Aborting cluster-wide state change 2750548712 (30000ms) rv = -23<br>Nov 12 13:01:14 drbd9-01 kernel: drbd r0 drbd9-02: Rejecting concurrent remote state change 1367793350 because of state change 2283776505<br>Nov 12 13:01:14 drbd9-01 kernel: drbd r0 drbd9-02: Ignoring P_TWOPC_ABORT packet 1367793350.<br>Nov 12 13:01:44 drbd9-01 kernel: drbd r0: Preparing cluster-wide state change 2283776505 (0->3 499/145)<br>Nov 12 13:01:44 drbd9-01 kernel: drbd r0: Aborting cluster-wide state change 2283776505 (30000ms) rv = -23<br>Nov 12 13:01:44 drbd9-01 kernel: drbd r0 drbd9-02: Rejecting concurrent remote state change 3401865695 because of state change 2283776505<br>Nov 12 13:01:44 drbd9-01 kernel: drbd r0 drbd9-02: Ignoring P_TWOPC_ABORT packet 3401865695.<br>Nov 12 13:02:14 drbd9-01 kernel: drbd r0: Preparing cluster-wide state change 52162850 (0->2 499/145)<br>Nov 12 13:02:14 drbd9-01 kernel: drbd r0: Aborting cluster-wide state change 52162850 (30001ms) rv = -23<br>Nov 12 13:02:44 drbd9-01 kernel: drbd r0: Preparing cluster-wide state change 452962539 (0->3 499/145)<br>Nov 12 13:02:44 drbd9-01 kernel: drbd r0: Aborting cluster-wide state change 452962539 (30000ms) rv = -23<br><br><br>node3:<br><br>Nov 12 13:00:14 drbd9-03 kernel: drbd r0 drbd9-01: Ignoring P_TWOPC_ABORT packet 2477663420.<br>Nov 12 13:00:14 drbd9-03 kernel: drbd r0 drbd9-04: Ignoring P_TWOPC_ABORT packet 849320515.<br>Nov 12 13:00:22 drbd9-03 kernel: drbd r0 tcp:drbd9-05: Closing unexpected connection from 100.100.10.21 to port 7000<br>Nov 12 13:00:33 drbd9-03 kernel: drbd r0 tcp:drbd9-05: Closing unexpected connection from 100.100.10.21 to port 7000<br>Nov 12 13:00:40 drbd9-03 kernel: drbd r0 tcp:drbd9-05: Closing unexpected connection from 100.100.10.21 to port 7000<br>Nov 12 13:00:44 drbd9-03 kernel: drbd r0 drbd9-04: Ignoring P_TWOPC_ABORT packet 2698322727.<br>Nov 12 13:00:44 drbd9-03 kernel: drbd r0 drbd9-04: Ignoring P_TWOPC_ABORT packet 743585927.<br>Nov 12 13:00:48 drbd9-03 kernel: drbd_r_r0 D ffff88003d613680 0 36203 2 0x00000084<br>Nov 12 13:00:52 drbd9-03 kernel: drbd r0 tcp:drbd9-05: Closing unexpected connection from 100.100.10.21 to port 7000<br>Nov 12 13:01:03 drbd9-03 kernel: drbd r0 tcp:drbd9-05: Closing unexpected connection from 100.100.10.21 to port 7000<br>Nov 12 13:01:14 drbd9-03 kernel: drbd r0 drbd9-01: Ignoring P_TWOPC_ABORT packet 2750548712.<br>Nov 12 13:01:14 drbd9-03 kernel: drbd r0 drbd9-04: Ignoring P_TWOPC_ABORT packet 1367793350.<br>Nov 12 13:01:15 drbd9-03 kernel: drbd r0 tcp:drbd9-05: Closing unexpected connection from 100.100.10.21 to port 7000<br>Nov 12 13:01:26 drbd9-03 kernel: drbd r0 tcp:drbd9-05: Closing unexpected connection from 100.100.10.21 to port 7000<br>Nov 12 13:01:36 drbd9-03 kernel: drbd r0 tcp:drbd9-05: Closing unexpected connection from 100.100.10.21 to port 7000<br>Nov 12 13:01:44 drbd9-03 kernel: drbd r0 drbd9-04: Ignoring P_TWOPC_ABORT packet 2283776505.<br>Nov 12 13:01:44 drbd9-03 kernel: drbd r0 drbd9-04: Ignoring P_TWOPC_ABORT packet 3401865695.<br>Nov 12 13:01:47 drbd9-03 kernel: drbd r0 tcp:drbd9-05: Closing unexpected connection from 100.100.10.21 to port 7000<br>Nov 12 13:01:56 drbd9-03 kernel: drbd r0 tcp:drbd9-05: Closing unexpected connection from 100.100.10.21 to port 7000<br>Nov 12 13:02:08 drbd9-03 kernel: drbd r0 tcp:drbd9-05: Closing unexpected connection from 100.100.10.21 to port 7000<br>Nov 12 13:02:14 drbd9-03 kernel: drbd r0 drbd9-01: Ignoring P_TWOPC_ABORT packet 52162850.<br>Nov 12 13:02:19 drbd9-03 kernel: drbd r0 tcp:drbd9-05: Closing unexpected connection from 100.100.10.21 to port 7000<br>Nov 12 13:02:29 drbd9-03 kernel: drbd r0 tcp:drbd9-05: Closing unexpected connection from 100.100.10.21 to port 7000<br>Nov 12 13:02:40 drbd9-03 kernel: drbd r0 tcp:drbd9-05: Closing unexpected connection from 100.100.10.21 to port 7000<br>Nov 12 13:02:44 drbd9-03 kernel: drbd r0 drbd9-04: Ignoring P_TWOPC_ABORT packet 452962539.<br>Nov 12 13:02:48 drbd9-03 kernel: drbd_r_r0 D ffff88003d613680 0 36203 2 0x00000084<br>Nov 12 13:02:52 drbd9-03 kernel: drbd r0 tcp:drbd9-05: Closing unexpected connection from 100.100.10.21 to port 7000<br>Nov 12 13:03:03 drbd9-03 kernel: drbd r0 tcp:drbd9-05: Closing unexpected connection from 100.100.10.21 to port 7000<br>Nov 12 13:03:12 drbd9-03 kernel: drbd r0 tcp:drbd9-05: Closing unexpected connection from 100.100.10.21 to port 7000<br>Nov 12 13:03:14 drbd9-03 kernel: drbd r0 drbd9-01: Ignoring P_TWOPC_ABORT packet 4060565678.<br>Nov 12 13:03:14 drbd9-03 kernel: drbd r0 drbd9-04: Ignoring P_TWOPC_ABORT packet 3441716583.<br><br><br><br>2. New question<br><br>Ver:</div><div> - drbd-9.0.1-1</div><div> - CentOS 7</div><div><br><div>Env:</div><div> - configure 4 nodes but use 3 nodes only</div><div><br></div><div>Test:</div><div> 1) setup node1, 2, 3</div><div> 2) node1: primary </div><div> 3) node1: mount /dev/drbd1 /mnt</div><div> 4) node1 : invalidate-remote </div><div> 5) wait for fullsync for all nodes</div><div> 6) test network disconnect/connect for all nodes by disable/enable VM's network icon</div><div> 7) node2: down -> up</div><div> 8) node3: down is pending until 600 sec timeout!!!</div><div><br></div><div>Notes:</div><div> - There is no file copy test.</div><div><br></div><div>Logs:</div><div> - node3 log has important messages as follows;</div><div><br></div><div><div>Feb 26 18:36:28 drbd9-03 kernel: INFO: task drbd_r_r0:3114 blocked for more than 120 seconds.</div><div>Feb 26 18:36:28 drbd9-03 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.</div><div>Feb 26 18:36:28 drbd9-03 kernel: drbd_r_r0 D ffff88003d613680 0 3114 2 0x00000080</div><div>Feb 26 18:36:28 drbd9-03 kernel: ffff8800369a3b70 0000000000000046 ffff88001c8ab8e0 ffff8800369a3fd8</div><div>Feb 26 18:36:28 drbd9-03 kernel: ffff8800369a3fd8 ffff8800369a3fd8 ffff88001c8ab8e0 ffff880039f5e960</div><div>Feb 26 18:36:28 drbd9-03 kernel: 7fffffffffffffff ffff88001c8ab8e0 0000000000000002 0000000000000000</div><div>Feb 26 18:36:28 drbd9-03 kernel: Call Trace:</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffff816096a9>] schedule+0x29/0x70</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffff816075f9>] schedule_timeout+0x209/0x2d0</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffff810a9512>] ? default_wake_function+0x12/0x20</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffff8109825b>] ? autoremove_wake_function+0x2b/0x40</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffff810b1c7c>] ? update_curr+0xcc/0x150</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffff81608b72>] __down_common+0xd2/0x14a</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffff810b2a6e>] ? dequeue_task_fair+0x40e/0x620</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffff81608c07>] __down+0x1d/0x1f</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffff8109d201>] down+0x41/0x50</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffffa04b23c4>] state_change_lock+0x54/0x60 [drbd]</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffffa04b7700>] ? __change_peer_disk_states+0x70/0x70 [drbd]</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffffa04b5cc6>] change_cluster_wide_state+0x46/0xe90 [drbd]</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffff8108007e>] ? try_to_del_timer_sync+0x5e/0x90</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffff81080102>] ? del_timer_sync+0x52/0x60</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffff8160756d>] ? schedule_timeout+0x17d/0x2d0</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffff8107ed90>] ? __internal_add_timer+0x130/0x130</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffffa04b7336>] change_cstate+0x86/0xc0 [drbd]</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffffa0486de1>] drbd_receiver+0x151/0x630 [drbd]</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffffa0492521>] drbd_thread_setup+0x61/0x120 [drbd]</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffffa04924c0>] ? w_complete+0x20/0x20 [drbd]</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffff81614158>] ret_from_fork+0x58/0x90</div><div>Feb 26 18:36:28 drbd9-03 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140</div></div><div><br></div><div><br></div></div><div>Let me attach full log files here: <br> - node1: <a href="http://pastebin.com/742jzerd">http://pastebin.com/742jzerd</a></div><div> - node2: <a href="http://pastebin.com/dFaaFkeK">http://pastebin.com/dFaaFkeK</a></div><div> - node3: <a href="http://pastebin.com/irk4ykgN">http://pastebin.com/irk4ykgN</a></div><div><br>Please see my comments interspersed in above log files.<br>I think it's very critical problem.</div><div><br>Thanks.<br><span style="font-size:14px;font-weight:normal"><br></span></div></div>
</div>