Hi,

in the following setup

* node A: Kernel 2.6.24-21-xen, DRBD 8.3.1
* node B: Kernel 22.214.171.124-xen, DRBD 8.3.1

this is my scenario:

The peers are interconnected via WAN and share 7 DRBD resources. Usually, the resources on node B run in StandAlone mode, always Secondary, for disaster recovery purposes. Node A is always in WFConnection.

For a sync operation today, "drbdadm connect" was successfully called for each resource on B. The sync started well enough, but then I wanted to interrupt it and issued "drbdadm disconnect all" on B. This did not succeed. I believe it threw an error about drbdsetup not finishing in time.

After that, no more interaction with DRBD was possible, and one of the resources logged a variation of

  drbd11: [drbd11_worker/26923] sock_sendmsg time expired, ko = 4294966778

once every 6 seconds. That resource was still reported as SyncTarget, but the transfer was stalled. Most of the other resources finished the resync successfully. The following tasks were reported as hung:

  INFO: task drbd15_receiver:27897 blocked for more than 120 seconds.
  INFO: task drbd14_worker:2670 blocked for more than 120 seconds.
  INFO: task cqueue:2586 blocked for more than 120 seconds.
  INFO: task drbd16_worker:2629 blocked for more than 120 seconds.

On A, drbd11 reported no errors, but drbd15 did (similar to the one noted above). Nevertheless, drbd15 finished the resync and stopped complaining. (The errors in the log stop 40 seconds before the resync finished.)

Still, some kernel threads (pdflush?) remained in D state, and Xen on A became unresponsive (the DRBD devices are phy disks for Xen guests). So I had no means to bring the DRBD devices down and finally just rebooted A. It was at this point that an "rmmod -f drbd" on B finished, which had been blocked in D state until then. (Only afterwards did I realize that simply blocking the DRBD traffic might have been a less destructive solution.)

I noticed a fixed deadlock in 8.3.5 and was wondering whether this could be it?

Thanks in advance,
Felix