Hello!
Here is a new problem: _sometimes_ drbdadm adjust fails. When that
happens, the master node shows 'Primary/Unknown' and the slave node shows
'Secondary/Unknown', so the DRBD connection between them seems to be
broken. (The network itself works fine, and drbd1 and drbd2 are working
properly.)
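A hedged sketch for spotting this state automatically: the Python below scans status text for devices whose peer role is 'Unknown', which is the symptom described above. It assumes a DRBD 0.7-style /proc/drbd device line of the form " 0: cs:WFConnection st:Primary/Unknown ld:Consistent"; that layout is not shown in this post, so check it against the real file first. The function name find_disconnected and the sample text are hypothetical.

```python
import re

# ASSUMPTION: DRBD 0.7-style /proc/drbd device lines look like
#   " 0: cs:WFConnection st:Primary/Unknown ld:Consistent"
# Verify this against the real /proc/drbd on your nodes.
STATUS_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+st:(\w+)/(\w+)")


def find_disconnected(proc_text: str) -> list:
    """Return (minor, cstate, local_role) for every device whose peer role is Unknown."""
    result = []
    for line in proc_text.splitlines():
        m = STATUS_RE.match(line)
        if m and m.group(4) == "Unknown":
            result.append((int(m.group(1)), m.group(2), m.group(3)))
    return result


if __name__ == "__main__":
    # Illustrative sample, not captured from a real node.
    sample = (
        " 0: cs:WFConnection st:Primary/Unknown ld:Consistent\n"
        " 1: cs:Connected st:Primary/Secondary ld:Consistent\n"
    )
    print(find_disconnected(sample))  # [(0, 'WFConnection', 'Primary')]
```

This only detects the condition; whether to follow it with 'drbdadm adjust r0' automatically is a separate decision.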
Messages at boot time:
Aug 12 11:20:06 localhost kernel: drbd: initialised. Version: 0.7.1 (api:75/proto:74)
Aug 12 11:20:06 localhost kernel: drbd: SVN Revision: 1481M build by root@castor, 2004-08-03 18:14:36
Aug 12 11:20:06 localhost kernel: drbd: registered as block device major 147
Aug 12 11:20:07 localhost kernel: drbd0: resync bitmap: bits=25602947 words=800094
Aug 12 11:20:07 localhost kernel: drbd0: size = 102411788 KB
Aug 12 11:20:07 localhost kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
Aug 12 11:20:07 localhost kernel: drbd0: Found 6 transactions (324 active extents) in activity log.
Aug 12 11:20:07 localhost kernel: drbd0: Marked additional 131584 KB as out-of-sync based on AL.
Aug 12 11:20:08 localhost kernel: drbd0: drbdsetup [1877]: cstate Unconfigured --> StandAlone
Aug 12 11:20:08 localhost kernel: drbd0: drbdsetup [1879]: cstate StandAlone --> Unconnected
Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate Unconnected --> WFConnection
Aug 12 11:20:08 localhost kernel: drbd1: resync bitmap: bits=25602947 words=800094
Aug 12 11:20:08 localhost kernel: drbd1: size = 102411788 KB
Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate WFConnection --> WFReportParams
Aug 12 11:20:08 localhost kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Aug 12 11:20:08 localhost kernel: drbd0: Connection established.
Aug 12 11:20:08 localhost kernel: drbd0: I am(S): 1:00000005:0000000b:00000075:0000000b:10
Aug 12 11:20:08 localhost kernel: drbd0: Peer(P): 1:00000005:0000000b:00000074:0000000c:10
Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate WFReportParams --> WFBitMapS
Aug 12 11:20:08 localhost kernel: drbd0: sock_sendmsg returned -104
Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate WFBitMapS --> BrokenPipe
Aug 12 11:20:08 localhost kernel: drbd0: short sent ReportBitMap size=4096 sent=4064
Aug 12 11:20:08 localhost kernel: drbd0: Secondary/Unknown --> Secondary/Primary
Aug 12 11:20:08 localhost kernel: drbd0: meta connection shut down by peer.
Aug 12 11:20:08 localhost kernel: drbd0: asender terminated
Aug 12 11:20:08 localhost kernel: drbd0: sock was shut down by peer
Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate BrokenPipe --> BrokenPipe
Aug 12 11:20:08 localhost kernel: drbd0: short read expecting header on sock: r=0
Aug 12 11:20:08 localhost kernel: drbd0: worker terminated
Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate BrokenPipe --> Unconnected
Aug 12 11:20:08 localhost kernel: drbd0: Connection lost.
Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate Unconnected --> WFConnection
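To make the sequence above easier to follow, here is a small Python sketch (hypothetical helper names, not part of DRBD) that pulls the "cstate A --> B" transitions out of a log with one message per line and decodes the sock_sendmsg return code. On the boot log it shows the connection breaking while in WFBitMapS, i.e. during the bitmap exchange, which matches the "short sent ReportBitMap size=4096 sent=4064" line. On Linux, errno 104 is ECONNRESET (connection reset by peer); errno numbering differs between systems, so the sketch looks the name up with Python's errno module instead of hard-coding it.

```python
import errno
import re

# Line formats copied from the kernel log above (one message per line).
CSTATE_RE = re.compile(r"(drbd\d+): .*cstate (\w+) --> (\w+)")
SENDMSG_RE = re.compile(r"sock_sendmsg returned (-\d+)")


def cstate_transitions(log_text):
    """Return (device, old_state, new_state) tuples in log order."""
    out = []
    for line in log_text.splitlines():
        m = CSTATE_RE.search(line)
        if m:
            out.append(m.groups())
    return out


def first_broken_pipe(transitions):
    """Return (device, state_it_failed_in) for the first switch to BrokenPipe, or None."""
    for dev, old, new in transitions:
        if new == "BrokenPipe":
            return dev, old
    return None


def sendmsg_errors(log_text):
    """Translate negative sock_sendmsg return codes into errno names for this platform."""
    return [errno.errorcode.get(-int(code), code) for code in SENDMSG_RE.findall(log_text)]


if __name__ == "__main__":
    excerpt = (
        "Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate WFReportParams --> WFBitMapS\n"
        "Aug 12 11:20:08 localhost kernel: drbd0: sock_sendmsg returned -104\n"
        "Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate WFBitMapS --> BrokenPipe\n"
    )
    print(first_broken_pipe(cstate_transitions(excerpt)))  # ('drbd0', 'WFBitMapS')
    print(sendmsg_errors(excerpt))  # ['ECONNRESET'] on Linux
```

The log has to be unwrapped to one message per line first; mail clients that fold long lines will hide some transitions from the regular expression.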
When I force a resync with 'drbdadm adjust r0', the log shows:
Aug 12 11:22:15 localhost kernel: drbd0: drbd0_receiver [1880]: cstate WFConnection --> WFReportParams
Aug 12 11:22:15 localhost kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Aug 12 11:22:15 localhost kernel: drbd0: Connection established.
Aug 12 11:22:15 localhost kernel: drbd0: I am(P): 1:00000005:0000000c:00000075:0000000b:10
Aug 12 11:22:15 localhost kernel: drbd0: Peer(S): 1:00000005:0000000b:00000075:0000000c:00
Aug 12 11:22:15 localhost kernel: drbd0: drbd0_receiver [1880]: cstate WFReportParams --> WFBitMapS
Aug 12 11:22:15 localhost kernel: drbd0: Primary/Unknown --> Primary/Secondary
Aug 12 11:22:15 localhost kernel: drbd0: drbd0_receiver [1880]: cstate WFBitMapS --> SyncSource
Aug 12 11:22:15 localhost kernel: drbd0: Resync started as SyncSource (need to sync 1052948 KB [263237 bits set])
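The numbers in both logs are consistent with each bitmap bit covering 4 KB: 25602947 bits for the 102411788 KB device at boot, and 263237 bits for the 1052948 KB resync. The check below just redoes that arithmetic with the figures copied from the logs; the constant and function names are made up for illustration.

```python
# Figures copied from the kernel logs above.
BOOT_BITS, BOOT_SIZE_KB = 25602947, 102411788  # drbd0 "resync bitmap" and "size" lines
RESYNC_BITS, RESYNC_KB = 263237, 1052948       # "Resync started as SyncSource" line


def kb_per_bit(size_kb: int, bits: int) -> float:
    """Amount of disk (in KB) that one bitmap bit stands for."""
    return size_kb / bits


if __name__ == "__main__":
    print(kb_per_bit(BOOT_SIZE_KB, BOOT_BITS))  # 4.0
    print(kb_per_bit(RESYNC_KB, RESYNC_BITS))   # 4.0
    print(f"{100 * RESYNC_KB / BOOT_SIZE_KB:.2f} % of the device")  # 1.03 % of the device
```

So the forced resync copies about 1 GB, roughly 1% of the device, not a full resync.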
/etc/drbd.conf:
resource r0 {
  protocol B;
  incon-degr-cmd "halt -f";
  startup {
    degr-wfc-timeout 60;
    wfc-timeout 30;
  }
  disk { on-io-error detach; }
  net { }
  syncer {
    rate 100M;
    group 1;
    al-extents 257;
  }
  on castor {
    device /dev/drbd0;
    disk /dev/hda5;
    address 192.168.5.1:7788;
    meta-disk internal;
  }
  on pollux {
    device /dev/drbd0;
    disk /dev/hda5;
    address 192.168.5.2:7788;
    meta-disk internal;
  }
}
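The syncer "rate 100M" setting caps resync bandwidth, so it gives a lower bound on how long the 1052948 KB resync from the log above can take. A sketch, assuming the rate is read as KB/s by default with M meaning 1024 K (check drbd.conf(5) for your DRBD version); the function and constant names are hypothetical.

```python
def min_resync_seconds(kb_to_sync: int, rate_kb_per_s: int) -> float:
    """Lower bound on resync time; the syncer rate is a ceiling, not a guarantee."""
    return kb_to_sync / rate_kb_per_s


# ASSUMPTION: "rate 100M" in drbd.conf means 100 MiB/s, i.e. 100 * 1024 KB/s.
RATE_KB_PER_S = 100 * 1024

if __name__ == "__main__":
    # 1052948 KB: the "need to sync" figure from the forced resync.
    print(f"partial resync: >= {min_resync_seconds(1052948, RATE_KB_PER_S):.1f} s")  # 10.3 s
    # 102411788 KB: the full device size from the boot log.
    print(f"full resync:    >= {min_resync_seconds(102411788, RATE_KB_PER_S):.0f} s")  # 1000 s
```

Real resyncs can be slower if the disks or the link cannot sustain the configured rate.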
[...]
Network configuration (nothing unusual):
iface eth0 inet static
    address 192.168.5.X
    netmask 255.255.255.0
    mtu 5000
What else can I do, other than forcing 'drbdadm adjust r0'?
Szabolcs Horvath