Hello!

Here is a new problem: _sometimes_ drbdadm adjust fails. Then the master node shows 'primary/unknown', the slave node shows 'secondary/unknown'. It seems their network connection is broken. (The network works fine! drbd1 and drbd2 are working properly)

Messages at boot time:

Aug 12 11:20:06 localhost kernel: drbd: initialised. Version: 0.7.1 (api:75/proto:74)
Aug 12 11:20:06 localhost kernel: drbd: SVN Revision: 1481M build by root at castor, 2004-08-03 18:14:36
Aug 12 11:20:06 localhost kernel: drbd: registered as block device major 147
Aug 12 11:20:07 localhost kernel: drbd0: resync bitmap: bits=25602947 words=800094
Aug 12 11:20:07 localhost kernel: drbd0: size = 102411788 KB
Aug 12 11:20:07 localhost kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
Aug 12 11:20:07 localhost kernel: drbd0: Found 6 transactions (324 active extents) in activity log.
Aug 12 11:20:07 localhost kernel: drbd0: Marked additional 131584 KB as out-of-sync based on AL.
Aug 12 11:20:08 localhost kernel: drbd0: drbdsetup [1877]: cstate Unconfigured --> StandAlone
Aug 12 11:20:08 localhost kernel: drbd0: drbdsetup [1879]: cstate StandAlone --> Unconnected
Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate Unconnected --> WFConnection
Aug 12 11:20:08 localhost kernel: drbd1: resync bitmap: bits=25602947 words=800094
Aug 12 11:20:08 localhost kernel: drbd1: size = 102411788 KB
Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate WFConnection --> WFReportParams
Aug 12 11:20:08 localhost kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Aug 12 11:20:08 localhost kernel: drbd0: Connection established.
Aug 12 11:20:08 localhost kernel: drbd0: I am(S): 1:00000005:0000000b:00000075:0000000b:10
Aug 12 11:20:08 localhost kernel: drbd0: Peer(P): 1:00000005:0000000b:00000074:0000000c:10
Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate WFReportParams --> WFBitMapS
Aug 12 11:20:08 localhost kernel: drbd0: sock_sendmsg returned -104
Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate WFBitMapS --> BrokenPipe
Aug 12 11:20:08 localhost kernel: drbd0: short sent ReportBitMap size=4096 sent=4064
Aug 12 11:20:08 localhost kernel: drbd0: Secondary/Unknown --> Secondary/Primary
Aug 12 11:20:08 localhost kernel: drbd0: meta connection shut down by peer.
Aug 12 11:20:08 localhost kernel: drbd0: asender terminated
Aug 12 11:20:08 localhost kernel: drbd0: sock was shut down by peer
Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate BrokenPipe --> BrokenPipe
Aug 12 11:20:08 localhost kernel: drbd0: short read expecting header on sock: r=0
Aug 12 11:20:08 localhost kernel: drbd0: worker terminated
Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate BrokenPipe --> Unconnected
Aug 12 11:20:08 localhost kernel: drbd0: Connection lost.
Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate Unconnected --> WFConnection

When I force resync with 'drbdadm adjust r0':

Aug 12 11:22:15 localhost kernel: drbd0: drbd0_receiver [1880]: cstate WFConnection --> WFReportParams
Aug 12 11:22:15 localhost kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Aug 12 11:22:15 localhost kernel: drbd0: Connection established.
Aug 12 11:22:15 localhost kernel: drbd0: I am(P): 1:00000005:0000000c:00000075:0000000b:10
Aug 12 11:22:15 localhost kernel: drbd0: Peer(S): 1:00000005:0000000b:00000075:0000000c:00
Aug 12 11:22:15 localhost kernel: drbd0: drbd0_receiver [1880]: cstate WFReportParams --> WFBitMapS
Aug 12 11:22:15 localhost kernel: drbd0: Primary/Unknown --> Primary/Secondary
Aug 12 11:22:15 localhost kernel: drbd0: drbd0_receiver [1880]: cstate WFBitMapS --> SyncSource
Aug 12 11:22:15 localhost kernel: drbd0: Resync started as SyncSource (need to sync 1052948 KB [263237 bits set])

/etc/drbd.conf:

resource r0 {
  protocol B;
  incon-degr-cmd "halt -f";
  startup {
    degr-wfc-timeout 60;
    wfc-timeout 30;
  }
  disk {
    on-io-error detach;
  }
  net {
  }
  syncer {
    rate 100M;
    group 1;
    al-extents 257;
  }
  on castor {
    device /dev/drbd0;
    disk /dev/hda5;
    address 192.168.5.1:7788;
    meta-disk internal;
  }
  on pollux {
    device /dev/drbd0;
    disk /dev/hda5;
    address 192.168.5.2:7788;
    meta-disk internal;
  }
}
[...]

Network configuration: (nothing unusual)

iface eth0 inet static
    address 192.168.5.X
    netmask 255.255.255.0
    mtu 5000

What else can I do, other than forcing 'drbdadm adjust r0'?

Szabolcs Horvath
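
For anyone comparing the failing and the successful handshake in the logs above, here is a minimal Python sketch that pulls the cstate transitions out of syslog lines and flags the ones that drop into BrokenPipe. The function names and the regular expression are my own illustration, not part of DRBD or its tools, and the pattern assumes the log line format shown in this message.

```python
import re

# Matches lines such as (format taken from the kernel log quoted above):
#   "... kernel: drbd0: drbd0_receiver [1880]: cstate WFBitMapS --> BrokenPipe"
CSTATE_RE = re.compile(r"kernel: (drbd\d+): .*?cstate (\w+) --> (\w+)")


def cstate_transitions(lines):
    """Return (device, old_state, new_state) for every cstate line."""
    found = []
    for line in lines:
        match = CSTATE_RE.search(line)
        if match:
            found.append(match.groups())
    return found


def broken_handshakes(lines):
    """Return the transitions that enter BrokenPipe from another state."""
    return [
        t for t in cstate_transitions(lines)
        if t[2] == "BrokenPipe" and t[1] != "BrokenPipe"
    ]


# Sample lines copied from the boot-time log in this message.
sample = [
    "Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate WFReportParams --> WFBitMapS",
    "Aug 12 11:20:08 localhost kernel: drbd0: sock_sendmsg returned -104",
    "Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate WFBitMapS --> BrokenPipe",
    "Aug 12 11:20:08 localhost kernel: drbd0: drbd0_receiver [1880]: cstate BrokenPipe --> BrokenPipe",
]

if __name__ == "__main__":
    for device, old, new in broken_handshakes(sample):
        print(f"{device}: connection dropped while in {old} (now {new})")
```

Feeding it the full syslog from both nodes (for example `open("/var/log/syslog")`, path depending on the distribution) should show whether the drop always happens in the same state, as it does here in WFBitMapS.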