Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello,
I am new to drbd, and I am just getting this set up. I am having a few
problems. I am running 8.3 now on CentOS 5.3 64bit version. All latest
patches applied.
I am getting errors, see way below.. I am just at the initial sync
step.
nas1 is configured with:
6 x 1TB & 2 x 250 drives, 8gb ram, adaptec 5805 raid card RAID 5 on
the 6 drives
nas2 is configured with
4 x 1.5TB & 2 x 250 drives, 8gb ram, adaptec 5805 raid card RAID 5
on the 4 drives
/data partition is /dev/sdb1 which is a total of 4,200,000 MB (about
4TB)
/meta partition is /dev/sdb2
configuration: /etc/drbd.conf
uname -a:
Linux nas2.mydomainhere.com 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30
EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
config:
common {
protocol C;
syncer {
rate 60M;
al-extents 257;
}
}
resource r0 {
handlers {
pri-on-incon-degr "halt -f";
}
disk {
on-io-error detach;
}
startup {
degr-wfc-timeout 120;
}
on nas1.mydomainhere.com {
device /dev/drbd0;
disk /dev/sdb1;
address XXX.XXX.137.40:7789;
meta-disk /dev/sdb2[0];
}
on nas2.mydomainhere.com {
device /dev/drbd0;
disk /dev/sdb1;
address XXX.XXX.137.41:7789;
meta-disk /dev/sdb2[0];
}
}
When first starting up, it starts syncing for about 10-15 minutes...
(you can see it's down to 3,513,200 left to sync)...
# cat /proc/drbd
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by mockbuild at v20z-x86-64.home.local
, 2009-08-29 14:07:55
0: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C
r----
ns:30511104 nr:0 dw:0 dr:30511104 al:0 bm:1861 lo:0 pe:94 ua:0 ap:
0 ep:1 wo:b oos:3567008704
[>....................] sync'ed: 0.9% (3483404/3513200)M
finish: 13:08:27 speed: 75,376 (58,332) K/sec
Then, in /var/log/messages, these errors start appearing:
Oct 13 10:06:17 nas1 avahi-daemon[3793]: Invalid response packet.
Oct 13 10:06:17 nas1 last message repeated 9 times
As soon as that starts, then we get all kinds of errors like this
(sorry for the long post, trying to be complete)...
Oct 13 10:01:18 nas2 kernel: block drbd0: peer
136EDF2D710BB952:E0256B5135655E7D:22CB9163CBBE953B:027F952A21588331
bits:899379200 flags:0
Oct 13 10:01:18 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:01:18 nas2 kernel: block drbd0: Becoming sync target due to
disk states.
Oct 13 10:01:19 nas2 kernel: block drbd0: peer( Unknown -> Secondary )
conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:01:21 nas2 kernel: block drbd0: conn( WFBitMapT ->
WFSyncUUID )
Oct 13 10:01:21 nas2 kernel: block drbd0: helper command: /sbin/
drbdadm before-resync-target minor-0
Oct 13 10:01:21 nas2 kernel: block drbd0: helper command: /sbin/
drbdadm before-resync-target minor-0 exit code 0 (0x0)
Oct 13 10:01:21 nas2 kernel: block drbd0: conn( WFSyncUUID ->
SyncTarget )
Oct 13 10:01:21 nas2 kernel: block drbd0: Began resync as SyncTarget
(will sync 3597516800 KB [899379200 bits set]).
-- start of problems here....
Oct 13 10:06:17 nas2 avahi-daemon[3971]: Invalid response packet.
Oct 13 10:06:17 nas2 last message repeated 9 times
Oct 13 10:10:41 nas2 kernel: block drbd0: PingAck did not arrive in
time.
Oct 13 10:10:41 nas2 kernel: block drbd0: peer( Secondary -> Unknown )
conn( SyncTarget -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:10:41 nas2 kernel: block drbd0: asender terminated
Oct 13 10:10:41 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:10:41 nas2 kernel: block drbd0: short read receiving data:
read 3720 expected 4096
Oct 13 10:10:41 nas2 kernel: block drbd0: error receiving RSDataReply,
l: 32792!
Oct 13 10:10:41 nas2 kernel: block drbd0: Connection closed
Oct 13 10:10:41 nas2 kernel: block drbd0: conn( NetworkFailure ->
Unconnected )
Oct 13 10:10:41 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:10:41 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:10:41 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:10:41 nas2 kernel: block drbd0: conn( Unconnected ->
WFConnection )
Oct 13 10:11:37 nas2 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Oct 13 10:11:37 nas2 kernel: r8169: eth0: link up
Oct 13 10:11:39 nas2 kernel: block drbd0: Handshake successful: Agreed
network protocol version 90
Oct 13 10:11:39 nas2 kernel: block drbd0: conn( WFConnection ->
WFReportParams )
Oct 13 10:11:39 nas2 kernel: block drbd0: Starting asender thread
(from drbd0_receiver [4326])
Oct 13 10:11:39 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:11:39 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:11:39 nas2 kernel: block drbd0: self
286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000
bits:891369192 flags:0
Oct 13 10:11:39 nas2 kernel: block drbd0: peer
136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B
bits:891369192 flags:0
Oct 13 10:11:39 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:11:39 nas2 kernel: block drbd0: Becoming sync target due to
disk states.
Oct 13 10:11:39 nas2 kernel: block drbd0: peer( Unknown -> Secondary )
conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:11:49 nas2 kernel: block drbd0: PingAck did not arrive in
time.
Oct 13 10:11:49 nas2 kernel: block drbd0: peer( Secondary -> Unknown )
conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:11:49 nas2 kernel: block drbd0: asender terminated
Oct 13 10:11:49 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:11:49 nas2 kernel: block drbd0: error receiving
ReportBitMap, l: 4088!
Oct 13 10:11:49 nas2 kernel: block drbd0: Connection closed
Oct 13 10:11:49 nas2 kernel: block drbd0: conn( NetworkFailure ->
Unconnected )
Oct 13 10:11:49 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:11:49 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:11:49 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:11:49 nas2 kernel: block drbd0: conn( Unconnected ->
WFConnection )
Oct 13 10:12:18 nas2 kernel: block drbd0: Handshake successful: Agreed
network protocol version 90
Oct 13 10:12:18 nas2 kernel: block drbd0: conn( WFConnection ->
WFReportParams )
Oct 13 10:12:18 nas2 kernel: block drbd0: Starting asender thread
(from drbd0_receiver [4326])
Oct 13 10:12:18 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:12:18 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:12:18 nas2 kernel: block drbd0: self
286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000
bits:891369192 flags:0
Oct 13 10:12:18 nas2 kernel: block drbd0: peer
136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B
bits:891369192 flags:0
Oct 13 10:12:18 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:12:18 nas2 kernel: block drbd0: Becoming sync target due to
disk states.
Oct 13 10:12:19 nas2 kernel: block drbd0: peer( Unknown -> Secondary )
conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:12:29 nas2 kernel: block drbd0: PingAck did not arrive in
time.
Oct 13 10:12:29 nas2 kernel: block drbd0: peer( Secondary -> Unknown )
conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:12:29 nas2 kernel: block drbd0: asender terminated
Oct 13 10:12:29 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:12:29 nas2 kernel: block drbd0: error receiving
ReportBitMap, l: 4088!
Oct 13 10:12:29 nas2 kernel: block drbd0: Connection closed
Oct 13 10:12:29 nas2 kernel: block drbd0: conn( NetworkFailure ->
Unconnected )
Oct 13 10:12:29 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:12:29 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:12:29 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:12:29 nas2 kernel: block drbd0: conn( Unconnected ->
WFConnection )
Oct 13 10:12:54 nas2 kernel: block drbd0: Handshake successful: Agreed
network protocol version 90
Oct 13 10:12:54 nas2 kernel: block drbd0: conn( WFConnection ->
WFReportParams )
Oct 13 10:12:54 nas2 kernel: block drbd0: Starting asender thread
(from drbd0_receiver [4326])
Oct 13 10:12:54 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:12:54 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:12:54 nas2 kernel: block drbd0: self
286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000
bits:891369192 flags:0
Oct 13 10:12:54 nas2 kernel: block drbd0: peer
136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B
bits:891369192 flags:0
Oct 13 10:12:54 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:12:54 nas2 kernel: block drbd0: Becoming sync target due to
disk states.
Oct 13 10:12:54 nas2 kernel: block drbd0: peer( Unknown -> Secondary )
conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:13:05 nas2 kernel: block drbd0: PingAck did not arrive in
time.
Oct 13 10:13:05 nas2 kernel: block drbd0: peer( Secondary -> Unknown )
conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:13:05 nas2 kernel: block drbd0: asender terminated
Oct 13 10:13:05 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:13:05 nas2 kernel: block drbd0: error receiving
ReportBitMap, l: 4088!
Oct 13 10:13:05 nas2 kernel: block drbd0: Connection closed
Oct 13 10:13:05 nas2 kernel: block drbd0: conn( NetworkFailure ->
Unconnected )
Oct 13 10:13:05 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:13:05 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:13:05 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:13:05 nas2 kernel: block drbd0: conn( Unconnected ->
WFConnection )
Oct 13 10:19:07 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:19:07 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:19:07 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:19:07 nas2 kernel: block drbd0: conn( Unconnected ->
WFConnection )
Oct 13 10:19:37 nas2 kernel: block drbd0: Handshake successful: Agreed
network protocol version 90
Oct 13 10:19:37 nas2 kernel: block drbd0: conn( WFConnection ->
WFReportParams )
Oct 13 10:19:37 nas2 kernel: block drbd0: Starting asender thread
(from drbd0_receiver [4326])
Oct 13 10:19:37 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:19:37 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:19:37 nas2 kernel: block drbd0: self
286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000
bits:891369192 flags:0
Oct 13 10:19:37 nas2 kernel: block drbd0: peer
136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B
bits:891369192 flags:0
Oct 13 10:19:37 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:19:37 nas2 kernel: block drbd0: Becoming sync target due to
disk states.
Oct 13 10:19:37 nas2 kernel: block drbd0: peer( Unknown -> Secondary )
conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:19:47 nas2 kernel: block drbd0: PingAck did not arrive in
time.
Oct 13 10:19:47 nas2 kernel: block drbd0: peer( Secondary -> Unknown )
conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:19:47 nas2 kernel: block drbd0: asender terminated
Oct 13 10:19:47 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:19:47 nas2 kernel: block drbd0: error receiving
ReportBitMap, l: 4088!
Oct 13 10:19:47 nas2 kernel: block drbd0: Connection closed
Oct 13 10:19:47 nas2 kernel: block drbd0: conn( NetworkFailure ->
Unconnected )
Oct 13 10:19:47 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:19:47 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:19:47 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:19:47 nas2 kernel: block drbd0: conn( Unconnected ->
WFConnection )
Oct 13 10:20:17 nas2 kernel: block drbd0: Handshake successful: Agreed
network protocol version 90
Oct 13 10:20:17 nas2 kernel: block drbd0: conn( WFConnection ->
WFReportParams )
Oct 13 10:20:17 nas2 kernel: block drbd0: Starting asender thread
(from drbd0_receiver [4326])
Oct 13 10:20:17 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:20:17 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:20:17 nas2 kernel: block drbd0: self
286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000
bits:891369192 flags:0
Oct 13 10:20:17 nas2 kernel: block drbd0: peer
136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B
bits:891369192 flags:0
Oct 13 10:20:17 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:20:17 nas2 kernel: block drbd0: Becoming sync target due to
disk states.
Oct 13 10:20:17 nas2 kernel: block drbd0: peer( Unknown -> Secondary )
conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:20:27 nas2 kernel: block drbd0: PingAck did not arrive in
time.
Oct 13 10:20:27 nas2 kernel: block drbd0: peer( Secondary -> Unknown )
conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:20:27 nas2 kernel: block drbd0: asender terminated
Oct 13 10:20:27 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:20:27 nas2 kernel: block drbd0: error receiving
ReportBitMap, l: 4088!
Oct 13 10:20:27 nas2 kernel: block drbd0: Connection closed
Oct 13 10:20:27 nas2 kernel: block drbd0: conn( NetworkFailure ->
Unconnected )
Oct 13 10:20:27 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:20:27 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:20:27 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:20:27 nas2 kernel: block drbd0: conn( Unconnected ->
WFConnection )
Oct 13 10:20:48 nas2 kernel: block drbd0: Handshake successful: Agreed
network protocol version 90
Oct 13 10:20:48 nas2 kernel: block drbd0: conn( WFConnection ->
WFReportParams )
Oct 13 10:20:48 nas2 kernel: block drbd0: Starting asender thread
(from drbd0_receiver [4326])
Oct 13 10:20:48 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:20:48 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:20:48 nas2 kernel: block drbd0: self
286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000
bits:891369192 flags:0
Oct 13 10:20:48 nas2 kernel: block drbd0: peer
136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B
bits:891369192 flags:0
Oct 13 10:20:48 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:20:48 nas2 kernel: block drbd0: Becoming sync target due to
disk states.
Oct 13 10:20:48 nas2 kernel: block drbd0: peer( Unknown -> Secondary )
conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:20:59 nas2 kernel: block drbd0: PingAck did not arrive in
time.
Oct 13 10:20:59 nas2 kernel: block drbd0: peer( Secondary -> Unknown )
conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:20:59 nas2 kernel: block drbd0: asender terminated
Oct 13 10:20:59 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:20:59 nas2 kernel: block drbd0: error receiving
ReportBitMap, l: 4088!
Oct 13 10:20:59 nas2 kernel: block drbd0: Connection closed
Oct 13 10:20:59 nas2 kernel: block drbd0: conn( NetworkFailure ->
Unconnected )
Oct 13 10:20:59 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:20:59 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:20:59 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:20:59 nas2 kernel: block drbd0: conn( Unconnected ->
WFConnection )
Oct 13 10:21:17 nas2 avahi-daemon[3971]: Invalid response packet.
Oct 13 10:21:17 nas2 last message repeated 4 times
At this point, ssh is extremely sluggish on nas2.. to the point it
takes 30-60 seconds to type anything...
and you'll see it's no longer syncing
[root at nas1 ~]# cat /proc/drbd
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by mockbuild at v20z-x86-64.home.local
, 2009-08-29 14:07:55
0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/Inconsistent C
r----
ns:32040144 nr:0 dw:0 dr:32048320 al:0 bm:1955 lo:0 pe:0 ua:0 ap:
0 ep:1 wo:b oos:3565476768
My only thought is that the switch between the 2 machines is bad, but
why would that lockup the machine...?
thanks
Marc