[DRBD-user] kernel hang problem with 0.7.13

Harry Edmon harry at atmos.washington.edu
Tue Sep 20 17:33:06 CEST 2005



Sorry if this appears twice; I sent it the first time from the wrong
e-mail address.

I have a kernel hang problem with DRBD 0.7.13.  Unfortunately it is a
hard hang, so I have no kernel traces.  The problem occurs when one
system is already up as the primary under 0.7.13 and I then bring up a
second system, also running 0.7.13, as the secondary and it starts to
sync.  As soon as the resync starts, the primary hangs.  Under 0.7.11
the problem does not happen.  Here are the kernel messages from the
primary right before the hang:

Sep 20 07:55:42 dew2 kernel: drbd0: drbd0_receiver [3762]: cstate WFConnection --> WFReportParams
Sep 20 07:55:42 dew2 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Sep 20 07:55:42 dew2 kernel: drbd0: Connection established.
Sep 20 07:55:42 dew2 kernel: drbd0: I am(P): 1:00000003:00000001:00000006:00000002:10
Sep 20 07:55:42 dew2 kernel: drbd1: drbd1_receiver [3770]: cstate WFConnection --> WFReportParams
Sep 20 07:55:42 dew2 kernel: drbd1: Handshake successful: DRBD Network Protocol version 74
Sep 20 07:55:42 dew2 kernel: drbd1: Connection established.
Sep 20 07:55:42 dew2 kernel: drbd1: I am(P): 1:00000003:00000001:00000006:00000002:10
Sep 20 07:55:42 dew2 kernel: drbd1: Peer(S): 1:00000003:00000001:00000006:00000001:11
Sep 20 07:55:42 dew2 kernel: drbd1: drbd1_receiver [3770]: cstate WFReportParams --> WFBitMapS
Sep 20 07:55:42 dew2 kernel: drbd0: Peer(S): 1:00000003:00000001:00000006:00000001:11
Sep 20 07:55:42 dew2 kernel: drbd0: drbd0_receiver [3762]: cstate WFReportParams --> WFBitMapS
Sep 20 07:55:42 dew2 kernel: drbd0: Primary/Unknown --> Primary/Secondary
Sep 20 07:55:42 dew2 kernel: drbd0: drbd0_receiver [3762]: cstate WFBitMapS --> SyncSource
Sep 20 07:55:42 dew2 kernel: drbd0: Resync started as SyncSource (need to sync 1183752 KB [295938 bits set]).
Sep 20 07:55:42 dew2 kernel: drbd1: Primary/Unknown --> Primary/Secondary
Sep 20 07:55:42 dew2 kernel: drbd1: drbd1_receiver [3770]: cstate WFBitMapS --> SyncSource
Sep 20 07:55:42 dew2 kernel: drbd1: Resync started as SyncSource (need to sync 1192960 KB [298240 bits set]).
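The resync figures in these messages are consistent with each bitmap bit standing for 4 KB: 295938 x 4 = 1183752 and 298240 x 4 = 1192960. The same ratio fits the secondary's attach messages further down (60120708 KB / 4 = 15030177 bits). A small Python cross-check is sketched here; the 4 KB granularity is inferred from these logged numbers rather than taken from the DRBD sources, and `kb_for_bits` / `bitmap_bits` are hypothetical helper names.

```python
# Cross-check of the bitmap figures printed in the kernel logs.
# ASSUMPTION: one bitmap bit tracks 4 KB; inferred from the logged numbers only.
KB_PER_BIT = 4


def kb_for_bits(bits_set):
    """KB to resync when `bits_set` bitmap bits are marked out of sync."""
    return bits_set * KB_PER_BIT


def bitmap_bits(device_kb):
    """Bitmap bits needed to cover a device of `device_kb` KB (rounded up)."""
    return -(-device_kb // KB_PER_BIT)  # ceiling division on integers


# "need to sync N KB [M bits set]" from the primary's resync messages
for dev, bits_set, kb in [("drbd0", 295938, 1183752), ("drbd1", 298240, 1192960)]:
    print(dev, "sync size matches:", kb_for_bits(bits_set) == kb)

# "size = ... (N KB)" and "bits=M" from the secondary's attach messages
for dev, size_kb, bits in [("drbd0", 60120708, 15030177), ("drbd1", 172093760, 43023440)]:
    print(dev, "bitmap size matches:", bitmap_bits(size_kb) == bits)
```

All four checks print True, so the amount of data the primary tries to push at resync time is about 1.1 GB per device, not the full 57 GB and 164 GB.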

No further messages appear on the primary; it hangs at this point.

Here is what the secondary shows:

Sep 20 07:55:30 dew1 kernel: drbd: initialised. Version: 0.7.13 (api:77/proto:74)
Sep 20 07:55:30 dew1 kernel: drbd: SVN Revision: 1961 build by root at dew1, 2005-09-18 19:52:12
Sep 20 07:55:38 dew1 kernel: drbd0: resync bitmap: bits=15030177 words=469694
Sep 20 07:55:38 dew1 kernel: drbd0: size = 57 GB (60120708 KB)
Sep 20 07:55:38 dew1 kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
Sep 20 07:55:38 dew1 kernel: drbd0: Found 6 transactions (324 active extents) in activity log.
Sep 20 07:55:38 dew1 kernel: drbd0: Marked additional 128 MB as out-of-sync based on AL.
Sep 20 07:55:40 dew1 kernel: drbd0: drbdsetup [3615]: cstate Unconfigured --> StandAlone
Sep 20 07:55:40 dew1 kernel: drbd1: resync bitmap: bits=43023440 words=1344484
Sep 20 07:55:40 dew1 kernel: drbd1: size = 164 GB (172093760 KB)
Sep 20 07:55:41 dew1 kernel: drbd1: 0 KB marked out-of-sync by on disk bit-map.
Sep 20 07:55:41 dew1 kernel: drbd1: Found 6 transactions (324 active extents) in activity log.
Sep 20 07:55:41 dew1 kernel: drbd1: Marked additional 128 MB as out-of-sync based on AL.
Sep 20 07:55:42 dew1 kernel: drbd1: drbdsetup [3630]: cstate Unconfigured --> StandAlone
Sep 20 07:55:42 dew1 kernel: drbd0: drbdsetup [3648]: cstate StandAlone --> Unconnected
Sep 20 07:55:42 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate Unconnected --> WFConnection
Sep 20 07:55:42 dew1 kernel: drbd1: drbdsetup [3656]: cstate StandAlone --> Unconnected
Sep 20 07:55:42 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate Unconnected --> WFConnection
Sep 20 07:55:42 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate WFConnection --> WFReportParams
Sep 20 07:55:42 dew1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Sep 20 07:55:42 dew1 kernel: drbd0: Connection established.
Sep 20 07:55:42 dew1 kernel: drbd0: I am(S): 1:00000003:00000001:00000006:00000001:11
Sep 20 07:55:42 dew1 kernel: drbd0: Peer(P): 1:00000003:00000001:00000006:00000002:10
Sep 20 07:55:42 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate WFReportParams --> WFBitMapT
Sep 20 07:55:42 dew1 kernel: drbd0: Secondary/Unknown --> Secondary/Primary
Sep 20 07:55:42 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate WFConnection --> WFReportParams
Sep 20 07:55:42 dew1 kernel: drbd1: Handshake successful: DRBD Network Protocol version 74
Sep 20 07:55:42 dew1 kernel: drbd1: Connection established.
Sep 20 07:55:42 dew1 kernel: drbd1: I am(S): 1:00000003:00000001:00000006:00000001:11
Sep 20 07:55:42 dew1 kernel: drbd1: Peer(P): 1:00000003:00000001:00000006:00000002:10
Sep 20 07:55:42 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate WFReportParams --> WFBitMapT
Sep 20 07:55:42 dew1 kernel: drbd1: Secondary/Unknown --> Secondary/Primary
Sep 20 07:55:42 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate WFBitMapT --> SyncTarget
Sep 20 07:55:42 dew1 kernel: drbd0: Resync started as SyncTarget (need to sync 1183752 KB [295938 bits set]).
Sep 20 07:55:42 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate WFBitMapT --> SyncTarget
Sep 20 07:55:42 dew1 kernel: drbd1: Resync started as SyncTarget (need to sync 1192960 KB [298240 bits set]).
Sep 20 07:56:13 dew1 kernel: drbd0: drbd0_asender [3674]: cstate SyncTarget --> NetworkFailure
Sep 20 07:56:13 dew1 kernel: drbd0: asender terminated
Sep 20 07:56:13 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate NetworkFailure --> BrokenPipe
Sep 20 07:56:13 dew1 kernel: drbd0: worker terminated
Sep 20 07:56:13 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate BrokenPipe --> Unconnected
Sep 20 07:56:13 dew1 kernel: drbd0: Connection lost.
Sep 20 07:56:13 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate Unconnected --> WFConnection
Sep 20 07:56:13 dew1 kernel: drbd1: drbd1_asender [3675]: cstate SyncTarget --> NetworkFailure
Sep 20 07:56:14 dew1 kernel: drbd1: asender terminated
Sep 20 07:56:14 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate NetworkFailure --> BrokenPipe
Sep 20 07:56:14 dew1 kernel: drbd1: short read receiving data block: read 3904 expected 4096
Sep 20 07:56:14 dew1 kernel: drbd1: worker terminated
Sep 20 07:56:14 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate BrokenPipe --> Unconnected
Sep 20 07:56:14 dew1 kernel: drbd1: Connection lost.
Sep 20 07:56:14 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate Unconnected --> WFConnection
Sep 20 07:57:55 dew1 kernel: drbd0: drbdsetup [3805]: cstate WFConnection --> Unconnected
Sep 20 07:57:55 dew1 kernel: drbd0: worker terminated
Sep 20 07:57:55 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate Unconnected --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd0: Connection lost.
Sep 20 07:57:55 dew1 kernel: drbd0: Discarding network configuration.
Sep 20 07:57:55 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate StandAlone --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd0: receiver terminated
Sep 20 07:57:55 dew1 kernel: drbd0: drbdsetup [3805]: cstate StandAlone --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd0: drbdsetup [3805]: cstate StandAlone --> Unconfigured
Sep 20 07:57:55 dew1 kernel: drbd0: worker terminated
Sep 20 07:57:55 dew1 kernel: drbd1: drbdsetup [3807]: cstate WFConnection --> Unconnected
Sep 20 07:57:55 dew1 kernel: drbd1: worker terminated
Sep 20 07:57:55 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate Unconnected --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd1: Connection lost.
Sep 20 07:57:55 dew1 kernel: drbd1: Discarding network configuration.
Sep 20 07:57:55 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate StandAlone --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd1: receiver terminated
Sep 20 07:57:55 dew1 kernel: drbd1: drbdsetup [3807]: cstate StandAlone --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd1: drbdsetup [3807]: cstate StandAlone --> Unconfigured
Sep 20 07:57:55 dew1 kernel: drbd1: worker terminated
Sep 20 07:57:55 dew1 kernel: drbd: module cleanup done.
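To line up the two nodes' timelines, the `cstate` transitions can be pulled out of the logs mechanically. This is a minimal Python sketch, assuming each log entry sits on a single line with the syslog prefix exactly as shown above; `transitions` and `seconds_in_state` are hypothetical helper names, and the sample lines are copied from the secondary's log. It shows that both devices on dew1 left SyncTarget 31 seconds after entering it. That would fit with the primary having stopped answering shortly after the resync began, though the logs alone do not prove it.

```python
import re
from datetime import datetime

# Entries copied from the secondary (dew1) log, one entry per line.
LOG = """\
Sep 20 07:55:42 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate WFBitMapT --> SyncTarget
Sep 20 07:55:42 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate WFBitMapT --> SyncTarget
Sep 20 07:56:13 dew1 kernel: drbd0: drbd0_asender [3674]: cstate SyncTarget --> NetworkFailure
Sep 20 07:56:13 dew1 kernel: drbd1: drbd1_asender [3675]: cstate SyncTarget --> NetworkFailure
"""

# "<timestamp> <host> kernel: drbdN: ... cstate OLD --> NEW"
CSTATE_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: (drbd\d+): .*cstate (\w+) --> (\w+)"
)


def transitions(text):
    """Yield (time, device, old_state, new_state) for every cstate line."""
    for line in text.splitlines():
        m = CSTATE_RE.match(line)
        if m:
            # syslog omits the year; 2005 is taken from the post date
            ts = datetime.strptime("2005 " + m.group(1), "%Y %b %d %H:%M:%S")
            yield ts, m.group(2), m.group(3), m.group(4)


def seconds_in_state(text, state):
    """Seconds each device spent in `state`, from entering it to leaving it."""
    entered, spent = {}, {}
    for ts, dev, old, new in transitions(text):
        if new == state:
            entered[dev] = ts
        elif old == state and dev in entered:
            spent[dev] = (ts - entered.pop(dev)).total_seconds()
    return spent


print(seconds_in_state(LOG, "SyncTarget"))
```

Run as-is, it prints `{'drbd0': 31.0, 'drbd1': 31.0}`. Feeding it the full primary and secondary logs instead gives a per-device transition list for each node that can be compared side by side.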

This is not the only pair of systems on which I have seen this hang.
In every case the systems are dual Xeon boxes with hyperthreading
turned on.  I have had this problem with kernels from 2.6.11.7 through
2.6.13.2.  Any ideas?
