I had something strange happen in my drbd testing environment. The machine 'machineA' is the primary and 'machineB' is the secondary. 'machineA' is up 24/7 and 'machineB' is on and off throughout the day. However, I can't really explain the problem I encountered. I am not aware of any process that runs at 3:29 that would cause this. The machine did not produce this error on any day prior to today, and it has run with an unchanged configuration for a while.

First, the logs from the primary (with comments intermingled):

Mar 22 03:29:01 machineA kernel: drbd2: Unable to bind sock2 (-98)
Mar 22 03:29:01 machineA kernel: drbd2: drbd2_receiver [17538]: cstate WFConnection --> Unconnected
Mar 22 03:29:01 machineA kernel: drbd2: worker terminated
Mar 22 03:29:01 machineA kernel: drbd2: drbd2_receiver [17538]: cstate Unconnected --> Unconnected
Mar 22 03:29:01 machineA kernel: drbd2: Connection lost.
Mar 22 03:29:01 machineA kernel: drbd2: Discarding network configuration.
Mar 22 03:29:01 machineA kernel: drbd2: drbd2_receiver [17538]: cstate Unconnected --> StandAlone
Mar 22 03:29:01 machineA kernel: drbd2: receiver terminated

Around 9:00 the secondary ('machineB') came up, but nothing happened on the primary (this is unexpected; the primary should have had data to synchronize). At 9:09:35 I restarted drbd on the primary ('machineA'):

Mar 22 09:09:35 machineA kernel: drbd2: Primary/Unknown --> Secondary/Unknown
Mar 22 09:09:35 machineA kernel: drbd2: drbdsetup [14092]: cstate StandAlone --> Unconnected
Mar 22 09:09:35 machineA kernel: drbd2: drbdsetup [14092]: cstate Unconnected --> StandAlone
Mar 22 09:09:35 machineA kernel: drbd2: drbdsetup [14092]: cstate StandAlone --> Unconfigured
Mar 22 09:09:35 machineA kernel: drbd2: worker terminated
Mar 22 09:09:37 machineA kernel: drbd2: resync bitmap: bits=7340032 words=229376
Mar 22 09:09:37 machineA kernel: drbd2: size = 28 GB (29360128 KB)
Mar 22 09:09:37 machineA kernel: drbd2: 243 MB marked out-of-sync by on disk bit-map.
Mar 22 09:09:37 machineA kernel: drbd2: Found 6 transactions (324 active extents) in activity log.
Mar 22 09:09:37 machineA kernel: drbd2: drbdsetup [14136]: cstate Unconfigured --> StandAlone
Mar 22 09:09:37 machineA kernel: drbd2: drbdsetup [14142]: cstate StandAlone --> Unconnected
Mar 22 09:09:37 machineA kernel: drbd2: drbd2_receiver [14143]: cstate Unconnected --> WFConnection
Mar 22 09:09:37 machineA kernel: drbd2: drbd2_receiver [14143]: cstate WFConnection --> WFReportParams
Mar 22 09:09:37 machineA kernel: drbd2: Handshake successful: DRBD Network Protocol version 74
Mar 22 09:09:37 machineA kernel: drbd2: Connection established.
Mar 22 09:09:37 machineA kernel: drbd2: I am(S): 1:00000007:00000001:0000001d:00000004:00
Mar 22 09:09:37 machineA kernel: drbd2: Peer(S): 1:00000007:00000001:0000001b:00000004:00
Mar 22 09:09:37 machineA kernel: drbd2: drbd2_receiver [14143]: cstate WFReportParams --> WFBitMapS
Mar 22 09:09:37 machineA kernel: drbd2: Secondary/Unknown --> Secondary/Secondary
Mar 22 09:09:37 machineA kernel: drbd2: drbd2_receiver [14143]: cstate WFBitMapS --> SyncSource
Mar 22 09:09:37 machineA kernel: drbd2: Resync started as SyncSource (need to sync 249476 KB [62369 bits set]).
Mar 22 09:09:49 machineA kernel: drbd2: Resync done (total 11 sec; paused 0 sec; 22676 K/sec)
Mar 22 09:09:49 machineA kernel: drbd2: drbd2_worker [14137]: cstate SyncSource --> Connected

Then I had to tell it that it was, in fact, the primary again:

Mar 22 09:10:30 machineA kernel: drbd2: Secondary/Secondary --> Primary/Secondary

Here are the logs from the secondary:

Mar 22 09:01:23 machineB kernel: drbd: initialised. Version: 0.7.22 (api:79/proto:74)
Mar 22 09:01:23 machineB kernel: drbd: SVN Revision: 2554 build by lmb at dale, 2006-10-30 22:52:11
Mar 22 09:01:23 machineB kernel: drbd: registered as block device major 147
Mar 22 09:01:24 machineB kernel: drbd0: resync bitmap: bits=7340032 words=229376
Mar 22 09:01:24 machineB kernel: drbd0: size = 28 GB (29360128 KB)
Mar 22 09:01:24 machineB kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
Mar 22 09:01:24 machineB kernel: drbd0: No usable activity log found.
Mar 22 09:01:24 machineB kernel: drbd0: drbdsetup [3564]: cstate Unconfigured --> StandAlone
Mar 22 09:01:24 machineB kernel: drbd0: drbdsetup [3592]: cstate StandAlone --> Unconnected
Mar 22 09:01:24 machineB kernel: drbd0: drbd0_receiver [3593]: cstate Unconnected --> WFConnection

Here I manually restarted drbd on the primary ('machineA').

Mar 22 09:09:37 machineB kernel: drbd0: drbd0_receiver [3593]: cstate WFConnection --> WFReportParams
Mar 22 09:09:37 machineB kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Mar 22 09:09:37 machineB kernel: drbd0: Connection established.
Mar 22 09:09:37 machineB kernel: drbd0: I am(S): 1:00000007:00000001:0000001b:00000004:00
Mar 22 09:09:37 machineB kernel: drbd0: Peer(S): 1:00000007:00000001:0000001d:00000004:00
Mar 22 09:09:37 machineB kernel: drbd0: drbd0_receiver [3593]: cstate WFReportParams --> WFBitMapT
Mar 22 09:09:37 machineB kernel: drbd0: Secondary/Unknown --> Secondary/Secondary
Mar 22 09:09:37 machineB kernel: drbd0: drbd0_receiver [3593]: cstate WFBitMapT --> SyncTarget
Mar 22 09:09:37 machineB kernel: drbd0: Resync started as SyncTarget (need to sync 249476 KB [62369 bits set]).
Mar 22 09:09:49 machineB kernel: drbd0: Resync done (total 11 sec; paused 0 sec; 22676 K/sec)
Mar 22 09:09:49 machineB kernel: drbd0: drbd0_worker [3580]: cstate SyncTarget --> Connected
Mar 22 09:10:30 machineB kernel: drbd0: Secondary/Secondary --> Secondary/Primary

Questions:

0. What caused the transition from 'Primary' to 'StandAlone' on 'machineA'?
1. On 'machineA', if it was not the primary, why did it synchronize?
2. Why didn't it move back from 'StandAlone' to 'Primary' once it had been determined that its peer was definitely already Secondary?

-- 
Jon Nelson <jnelson-drbd at jamponi.net>
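A note for reading the 03:29 log on 'machineA': the `-98` in `Unable to bind sock2 (-98)` appears to be a negated Linux errno, and on Linux errno 98 is `EADDRINUSE` ("Address already in use"), i.e. the bind failed because something already held that local address and port. The sketch below only illustrates that error on a Linux host; the helper name `second_bind_errno` and the loopback address are hypothetical and not taken from the DRBD configuration, and it does not show what held the port on 'machineA'.

```python
import errno
import os
import socket

# On Linux, EADDRINUSE is 98 ("Address already in use").
print(errno.EADDRINUSE, errno.errorcode[errno.EADDRINUSE], os.strerror(errno.EADDRINUSE))


def second_bind_errno(host="127.0.0.1"):
    """Hypothetical helper: bind two TCP sockets to the same address and
    return the errno raised by the second bind (0 if it unexpectedly works)."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))          # let the kernel choose a free port
        port = first.getsockname()[1]
        first.listen(1)                # the port is now held by a listener
        try:
            second.bind((host, port))  # same address and port: should fail
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        first.close()
        second.close()


print(second_bind_errno() == errno.EADDRINUSE)
```

A plain `socket.socket` in Python does not set `SO_REUSEADDR`, so the second bind is rejected by the kernel in the same way a DRBD socket would be if another socket already owned its configured address.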