Sorry if this appears twice - I sent it the first time from the wrong e-mail address.

I have a kernel hang problem with drbd 0.7.13. Unfortunately it is a hard hang, so I have no kernel traces. The problem occurs when I have one system up as the primary under 0.7.13, and I then bring up a second system running 0.7.13 as a secondary and it starts to sync. When the secondary sync starts up the primary hangs. Under 0.7.11 the problem does not happen.

Here are the kernel messages from the primary right before the hang:

Sep 20 07:55:42 dew2 kernel: drbd0: drbd0_receiver [3762]: cstate WFConnection --> WFReportParams
Sep 20 07:55:42 dew2 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Sep 20 07:55:42 dew2 kernel: drbd0: Connection established.
Sep 20 07:55:42 dew2 kernel: drbd0: I am(P): 1:00000003:00000001:00000006:00000002:10
Sep 20 07:55:42 dew2 kernel: drbd1: drbd1_receiver [3770]: cstate WFConnection --> WFReportParams
Sep 20 07:55:42 dew2 kernel: drbd1: Handshake successful: DRBD Network Protocol version 74
Sep 20 07:55:42 dew2 kernel: drbd1: Connection established.
Sep 20 07:55:42 dew2 kernel: drbd1: I am(P): 1:00000003:00000001:00000006:00000002:10
Sep 20 07:55:42 dew2 kernel: drbd1: Peer(S): 1:00000003:00000001:00000006:00000001:11
Sep 20 07:55:42 dew2 kernel: drbd1: drbd1_receiver [3770]: cstate WFReportParams --> WFBitMapS
Sep 20 07:55:42 dew2 kernel: drbd0: Peer(S): 1:00000003:00000001:00000006:00000001:11
Sep 20 07:55:42 dew2 kernel: drbd0: drbd0_receiver [3762]: cstate WFReportParams --> WFBitMapS
Sep 20 07:55:42 dew2 kernel: drbd0: Primary/Unknown --> Primary/Secondary
Sep 20 07:55:42 dew2 kernel: drbd0: drbd0_receiver [3762]: cstate WFBitMapS --> SyncSource
Sep 20 07:55:42 dew2 kernel: drbd0: Resync started as SyncSource (need to sync 1183752 KB [295938 bits set]).
Sep 20 07:55:42 dew2 kernel: drbd1: Primary/Unknown --> Primary/Secondary
Sep 20 07:55:42 dew2 kernel: drbd1: drbd1_receiver [3770]: cstate WFBitMapS --> SyncSource
Sep 20 07:55:42 dew2 kernel: drbd1: Resync started as SyncSource (need to sync 1192960 KB [298240 bits set]).

No other messages show up on the primary - it hangs here. Here is what the secondary shows:

Sep 20 07:55:30 dew1 kernel: drbd: initialised. Version: 0.7.13 (api:77/proto:74)
Sep 20 07:55:30 dew1 kernel: drbd: SVN Revision: 1961 build by root at dew1, 2005-09-18 19:52:12
Sep 20 07:55:38 dew1 kernel: drbd0: resync bitmap: bits=15030177 words=469694
Sep 20 07:55:38 dew1 kernel: drbd0: size = 57 GB (60120708 KB)
Sep 20 07:55:38 dew1 kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
Sep 20 07:55:38 dew1 kernel: drbd0: Found 6 transactions (324 active extents) in activity log.
Sep 20 07:55:38 dew1 kernel: drbd0: Marked additional 128 MB as out-of-sync based on AL.
Sep 20 07:55:40 dew1 kernel: drbd0: drbdsetup [3615]: cstate Unconfigured --> StandAlone
Sep 20 07:55:40 dew1 kernel: drbd1: resync bitmap: bits=43023440 words=1344484
Sep 20 07:55:40 dew1 kernel: drbd1: size = 164 GB (172093760 KB)
Sep 20 07:55:41 dew1 kernel: drbd1: 0 KB marked out-of-sync by on disk bit-map.
Sep 20 07:55:41 dew1 kernel: drbd1: Found 6 transactions (324 active extents) in activity log.
Sep 20 07:55:41 dew1 kernel: drbd1: Marked additional 128 MB as out-of-sync based on AL.
Sep 20 07:55:42 dew1 kernel: drbd1: drbdsetup [3630]: cstate Unconfigured --> StandAlone
Sep 20 07:55:42 dew1 kernel: drbd0: drbdsetup [3648]: cstate StandAlone --> Unconnected
Sep 20 07:55:42 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate Unconnected --> WFConnection
Sep 20 07:55:42 dew1 kernel: drbd1: drbdsetup [3656]: cstate StandAlone --> Unconnected
Sep 20 07:55:42 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate Unconnected --> WFConnection
Sep 20 07:55:42 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate WFConnection --> WFReportParams
Sep 20 07:55:42 dew1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Sep 20 07:55:42 dew1 kernel: drbd0: Connection established.
Sep 20 07:55:42 dew1 kernel: drbd0: I am(S): 1:00000003:00000001:00000006:00000001:11
Sep 20 07:55:42 dew1 kernel: drbd0: Peer(P): 1:00000003:00000001:00000006:00000002:10
Sep 20 07:55:42 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate WFReportParams --> WFBitMapT
Sep 20 07:55:42 dew1 kernel: drbd0: Secondary/Unknown --> Secondary/Primary
Sep 20 07:55:42 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate WFConnection --> WFReportParams
Sep 20 07:55:42 dew1 kernel: drbd1: Handshake successful: DRBD Network Protocol version 74
Sep 20 07:55:42 dew1 kernel: drbd1: Connection established.
Sep 20 07:55:42 dew1 kernel: drbd1: I am(S): 1:00000003:00000001:00000006:00000001:11
Sep 20 07:55:42 dew1 kernel: drbd1: Peer(P): 1:00000003:00000001:00000006:00000002:10
Sep 20 07:55:42 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate WFReportParams --> WFBitMapT
Sep 20 07:55:42 dew1 kernel: drbd1: Secondary/Unknown --> Secondary/Primary
Sep 20 07:55:42 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate WFBitMapT --> SyncTarget
Sep 20 07:55:42 dew1 kernel: drbd0: Resync started as SyncTarget (need to sync 1183752 KB [295938 bits set]).
Sep 20 07:55:42 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate WFBitMapT --> SyncTarget
Sep 20 07:55:42 dew1 kernel: drbd1: Resync started as SyncTarget (need to sync 1192960 KB [298240 bits set]).
Sep 20 07:56:13 dew1 kernel: drbd0: drbd0_asender [3674]: cstate SyncTarget --> NetworkFailure
Sep 20 07:56:13 dew1 kernel: drbd0: asender terminated
Sep 20 07:56:13 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate NetworkFailure --> BrokenPipe
Sep 20 07:56:13 dew1 kernel: drbd0: worker terminated
Sep 20 07:56:13 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate BrokenPipe --> Unconnected
Sep 20 07:56:13 dew1 kernel: drbd0: Connection lost.
Sep 20 07:56:13 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate Unconnected --> WFConnection
Sep 20 07:56:13 dew1 kernel: drbd1: drbd1_asender [3675]: cstate SyncTarget --> NetworkFailure
Sep 20 07:56:14 dew1 kernel: drbd1: asender terminated
Sep 20 07:56:14 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate NetworkFailure --> BrokenPipe
Sep 20 07:56:14 dew1 kernel: drbd1: short read receiving data block: read 3904 expected 4096
Sep 20 07:56:14 dew1 kernel: drbd1: worker terminated
Sep 20 07:56:14 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate BrokenPipe --> Unconnected
Sep 20 07:56:14 dew1 kernel: drbd1: Connection lost.
Sep 20 07:56:14 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate Unconnected --> WFConnection
Sep 20 07:57:55 dew1 kernel: drbd0: drbdsetup [3805]: cstate WFConnection --> Unconnected
Sep 20 07:57:55 dew1 kernel: drbd0: worker terminated
Sep 20 07:57:55 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate Unconnected --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd0: Connection lost.
Sep 20 07:57:55 dew1 kernel: drbd0: Discarding network configuration.
Sep 20 07:57:55 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate StandAlone --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd0: receiver terminated
Sep 20 07:57:55 dew1 kernel: drbd0: drbdsetup [3805]: cstate StandAlone --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd0: drbdsetup [3805]: cstate StandAlone --> Unconfigured
Sep 20 07:57:55 dew1 kernel: drbd0: worker terminated
Sep 20 07:57:55 dew1 kernel: drbd1: drbdsetup [3807]: cstate WFConnection --> Unconnected
Sep 20 07:57:55 dew1 kernel: drbd1: worker terminated
Sep 20 07:57:55 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate Unconnected --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd1: Connection lost.
Sep 20 07:57:55 dew1 kernel: drbd1: Discarding network configuration.
Sep 20 07:57:55 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate StandAlone --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd1: receiver terminated
Sep 20 07:57:55 dew1 kernel: drbd1: drbdsetup [3807]: cstate StandAlone --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd1: drbdsetup [3807]: cstate StandAlone --> Unconfigured
Sep 20 07:57:55 dew1 kernel: drbd1: worker terminated
Sep 20 07:57:55 dew1 kernel: drbd: module cleanup done.

This is not the only pair of systems I have seen with this hang. In every case the systems are dual Xeon boxes with hyperthreading turned on. I have had this problem with kernels from 2.6.11.7 - 2.6.13.2.

Any ideas?