Hello, I am new to DRBD and am just getting it set up, and I am having a few problems. I am running DRBD 8.3 (8.3.2) on CentOS 5.3, 64-bit, with all the latest patches applied. I am only at the initial sync step, and I am getting the errors shown below.

nas1 is configured with 6 x 1 TB and 2 x 250 GB drives, 8 GB RAM, and an Adaptec 5805 RAID card, with RAID 5 on the six 1 TB drives.
nas2 is configured with 4 x 1.5 TB and 2 x 250 GB drives, 8 GB RAM, and an Adaptec 5805 RAID card, with RAID 5 on the four 1.5 TB drives.

The /data partition is /dev/sdb1, which is 4,200,000 MB in total (about 4 TB).
The /meta partition is /dev/sdb2.

uname -a:
Linux nas2.mydomainhere.com 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

Configuration (/etc/drbd.conf):

common {
  protocol C;
  syncer {
    rate 60M;
    al-extents 257;
  }
}
resource r0 {
  handlers {
    pri-on-incon-degr "halt -f";
  }
  disk {
    on-io-error detach;
  }
  startup {
    degr-wfc-timeout 120;
  }
  on nas1.mydomainhere.com {
    device /dev/drbd0;
    disk /dev/sdb1;
    address XXX.XXX.137.40:7789;
    meta-disk /dev/sdb2[0];
  }
  on nas2.mydomainhere.com {
    device /dev/drbd0;
    disk /dev/sdb1;
    address XXX.XXX.137.41:7789;
    meta-disk /dev/sdb2[0];
  }
}

When first starting up, it syncs for about 10-15 minutes (you can see it is down to 3,483,404 of 3,513,200 MB left to sync):

# cat /proc/drbd
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by mockbuild@v20z-x86-64.home.local, 2009-08-29 14:07:55
 0: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r----
    ns:30511104 nr:0 dw:0 dr:30511104 al:0 bm:1861 lo:0 pe:94 ua:0 ap: 0 ep:1 wo:b oos:3567008704
    [>....................] sync'ed: 0.9% (3483404/3513200)M
    finish: 13:08:27 speed: 75,376 (58,332) K/sec

Then these errors start appearing in /var/log/messages:

Oct 13 10:06:17 nas1 avahi-daemon[3793]: Invalid response packet.
Oct 13 10:06:17 nas1 last message repeated 9 times

As soon as that starts, we get all kinds of errors like the following (sorry for the long post; I am trying to be complete):
Oct 13 10:01:18 nas2 kernel: block drbd0: peer 136EDF2D710BB952:E0256B5135655E7D:22CB9163CBBE953B:027F952A21588331 bits:899379200 flags:0
Oct 13 10:01:18 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:01:18 nas2 kernel: block drbd0: Becoming sync target due to disk states.
Oct 13 10:01:19 nas2 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:01:21 nas2 kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
Oct 13 10:01:21 nas2 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Oct 13 10:01:21 nas2 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Oct 13 10:01:21 nas2 kernel: block drbd0: conn( WFSyncUUID -> SyncTarget )
Oct 13 10:01:21 nas2 kernel: block drbd0: Began resync as SyncTarget (will sync 3597516800 KB [899379200 bits set]).
-- start of problems here....
Oct 13 10:06:17 nas2 avahi-daemon[3971]: Invalid response packet.
Oct 13 10:06:17 nas2 last message repeated 9 times
Oct 13 10:10:41 nas2 kernel: block drbd0: PingAck did not arrive in time.
Oct 13 10:10:41 nas2 kernel: block drbd0: peer( Secondary -> Unknown ) conn( SyncTarget -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:10:41 nas2 kernel: block drbd0: asender terminated
Oct 13 10:10:41 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:10:41 nas2 kernel: block drbd0: short read receiving data: read 3720 expected 4096
Oct 13 10:10:41 nas2 kernel: block drbd0: error receiving RSDataReply, l: 32792!
Oct 13 10:10:41 nas2 kernel: block drbd0: Connection closed
Oct 13 10:10:41 nas2 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Oct 13 10:10:41 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:10:41 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:10:41 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:10:41 nas2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 13 10:11:37 nas2 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Oct 13 10:11:37 nas2 kernel: r8169: eth0: link up
Oct 13 10:11:39 nas2 kernel: block drbd0: Handshake successful: Agreed network protocol version 90
Oct 13 10:11:39 nas2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Oct 13 10:11:39 nas2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [4326])
Oct 13 10:11:39 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:11:39 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:11:39 nas2 kernel: block drbd0: self 286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000 bits:891369192 flags:0
Oct 13 10:11:39 nas2 kernel: block drbd0: peer 136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B bits:891369192 flags:0
Oct 13 10:11:39 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:11:39 nas2 kernel: block drbd0: Becoming sync target due to disk states.
Oct 13 10:11:39 nas2 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:11:49 nas2 kernel: block drbd0: PingAck did not arrive in time.
Oct 13 10:11:49 nas2 kernel: block drbd0: peer( Secondary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:11:49 nas2 kernel: block drbd0: asender terminated
Oct 13 10:11:49 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:11:49 nas2 kernel: block drbd0: error receiving ReportBitMap, l: 4088!
Oct 13 10:11:49 nas2 kernel: block drbd0: Connection closed
Oct 13 10:11:49 nas2 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Oct 13 10:11:49 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:11:49 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:11:49 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:11:49 nas2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 13 10:12:18 nas2 kernel: block drbd0: Handshake successful: Agreed network protocol version 90
Oct 13 10:12:18 nas2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Oct 13 10:12:18 nas2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [4326])
Oct 13 10:12:18 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:12:18 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:12:18 nas2 kernel: block drbd0: self 286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000 bits:891369192 flags:0
Oct 13 10:12:18 nas2 kernel: block drbd0: peer 136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B bits:891369192 flags:0
Oct 13 10:12:18 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:12:18 nas2 kernel: block drbd0: Becoming sync target due to disk states.
Oct 13 10:12:19 nas2 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:12:29 nas2 kernel: block drbd0: PingAck did not arrive in time.
Oct 13 10:12:29 nas2 kernel: block drbd0: peer( Secondary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:12:29 nas2 kernel: block drbd0: asender terminated
Oct 13 10:12:29 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:12:29 nas2 kernel: block drbd0: error receiving ReportBitMap, l: 4088!
Oct 13 10:12:29 nas2 kernel: block drbd0: Connection closed
Oct 13 10:12:29 nas2 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Oct 13 10:12:29 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:12:29 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:12:29 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:12:29 nas2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 13 10:12:54 nas2 kernel: block drbd0: Handshake successful: Agreed network protocol version 90
Oct 13 10:12:54 nas2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Oct 13 10:12:54 nas2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [4326])
Oct 13 10:12:54 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:12:54 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:12:54 nas2 kernel: block drbd0: self 286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000 bits:891369192 flags:0
Oct 13 10:12:54 nas2 kernel: block drbd0: peer 136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B bits:891369192 flags:0
Oct 13 10:12:54 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:12:54 nas2 kernel: block drbd0: Becoming sync target due to disk states.
Oct 13 10:12:54 nas2 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:13:05 nas2 kernel: block drbd0: PingAck did not arrive in time.
Oct 13 10:13:05 nas2 kernel: block drbd0: peer( Secondary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:13:05 nas2 kernel: block drbd0: asender terminated
Oct 13 10:13:05 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:13:05 nas2 kernel: block drbd0: error receiving ReportBitMap, l: 4088!
Oct 13 10:13:05 nas2 kernel: block drbd0: Connection closed
Oct 13 10:13:05 nas2 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Oct 13 10:13:05 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:13:05 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:13:05 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:13:05 nas2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 13 10:19:07 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:19:07 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:19:07 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:19:07 nas2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 13 10:19:37 nas2 kernel: block drbd0: Handshake successful: Agreed network protocol version 90
Oct 13 10:19:37 nas2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Oct 13 10:19:37 nas2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [4326])
Oct 13 10:19:37 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:19:37 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:19:37 nas2 kernel: block drbd0: self 286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000 bits:891369192 flags:0
Oct 13 10:19:37 nas2 kernel: block drbd0: peer 136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B bits:891369192 flags:0
Oct 13 10:19:37 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:19:37 nas2 kernel: block drbd0: Becoming sync target due to disk states.
Oct 13 10:19:37 nas2 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:19:47 nas2 kernel: block drbd0: PingAck did not arrive in time.
Oct 13 10:19:47 nas2 kernel: block drbd0: peer( Secondary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:19:47 nas2 kernel: block drbd0: asender terminated
Oct 13 10:19:47 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:19:47 nas2 kernel: block drbd0: error receiving ReportBitMap, l: 4088!
Oct 13 10:19:47 nas2 kernel: block drbd0: Connection closed
Oct 13 10:19:47 nas2 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Oct 13 10:19:47 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:19:47 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:19:47 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:19:47 nas2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 13 10:20:17 nas2 kernel: block drbd0: Handshake successful: Agreed network protocol version 90
Oct 13 10:20:17 nas2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Oct 13 10:20:17 nas2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [4326])
Oct 13 10:20:17 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:20:17 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:20:17 nas2 kernel: block drbd0: self 286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000 bits:891369192 flags:0
Oct 13 10:20:17 nas2 kernel: block drbd0: peer 136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B bits:891369192 flags:0
Oct 13 10:20:17 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:20:17 nas2 kernel: block drbd0: Becoming sync target due to disk states.
Oct 13 10:20:17 nas2 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:20:27 nas2 kernel: block drbd0: PingAck did not arrive in time.
Oct 13 10:20:27 nas2 kernel: block drbd0: peer( Secondary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:20:27 nas2 kernel: block drbd0: asender terminated
Oct 13 10:20:27 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:20:27 nas2 kernel: block drbd0: error receiving ReportBitMap, l: 4088!
Oct 13 10:20:27 nas2 kernel: block drbd0: Connection closed
Oct 13 10:20:27 nas2 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Oct 13 10:20:27 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:20:27 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:20:27 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:20:27 nas2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 13 10:20:48 nas2 kernel: block drbd0: Handshake successful: Agreed network protocol version 90
Oct 13 10:20:48 nas2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Oct 13 10:20:48 nas2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [4326])
Oct 13 10:20:48 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:20:48 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:20:48 nas2 kernel: block drbd0: self 286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000 bits:891369192 flags:0
Oct 13 10:20:48 nas2 kernel: block drbd0: peer 136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B bits:891369192 flags:0
Oct 13 10:20:48 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:20:48 nas2 kernel: block drbd0: Becoming sync target due to disk states.
Oct 13 10:20:48 nas2 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:20:59 nas2 kernel: block drbd0: PingAck did not arrive in time.
Oct 13 10:20:59 nas2 kernel: block drbd0: peer( Secondary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:20:59 nas2 kernel: block drbd0: asender terminated
Oct 13 10:20:59 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:20:59 nas2 kernel: block drbd0: error receiving ReportBitMap, l: 4088!
Oct 13 10:20:59 nas2 kernel: block drbd0: Connection closed
Oct 13 10:20:59 nas2 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Oct 13 10:20:59 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:20:59 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:20:59 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:20:59 nas2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 13 10:21:17 nas2 avahi-daemon[3971]: Invalid response packet.
Oct 13 10:21:17 nas2 last message repeated 4 times

At this point, ssh is extremely sluggish on nas2, to the point that it takes 30-60 seconds to type anything, and you can see it is no longer syncing:

[root@nas1 ~]# cat /proc/drbd
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by mockbuild@v20z-x86-64.home.local, 2009-08-29 14:07:55
 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/Inconsistent C r----
    ns:32040144 nr:0 dw:0 dr:32048320 al:0 bm:1955 lo:0 pe:0 ua:0 ap: 0 ep:1 wo:b oos:3565476768

My only thought is that the switch between the two machines is bad, but why would that lock up the machine?

Thanks,
Marc
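A quick consistency check on the sync figures in the post: if each DRBD bitmap bit stands for one 4 KiB block, the "bits" counts in the kernel log should equal the KiB counts in the resync message and in the oos: field of /proc/drbd, and the remaining MiB divided by the current speed should roughly reproduce the finish: estimate. The Python below is a minimal sketch under stated assumptions: 4 KiB per bitmap bit, 1024-based M/K units in /proc/drbd, and the first speed: figure being the recent rate (the parenthesized one being the average). The helper names are hypothetical and are not part of DRBD.

```python
import re

# Assumption: DRBD 8.3 tracks out-of-sync blocks in a bitmap where one bit
# covers 4 KiB of the backing device, so KiB to sync = bits set * 4.
KIB_PER_BIT = 4


def bits_to_kib(bits: int) -> int:
    """Convert a count of set bitmap bits into KiB of out-of-sync data."""
    return bits * KIB_PER_BIT


def parse_oos_kib(proc_drbd: str) -> int:
    """Return the oos: (out-of-sync) field, in KiB, from /proc/drbd text."""
    match = re.search(r"\boos:(\d+)", proc_drbd)
    if match is None:
        raise ValueError("no oos: field in /proc/drbd text")
    return int(match.group(1))


def eta_seconds(remaining_mib: int, speed_kib_per_s: int) -> float:
    """Time left if the given speed were sustained (assumes 1 MiB = 1024 KiB)."""
    return remaining_mib * 1024 / speed_kib_per_s


def format_hms(seconds: float) -> str:
    """Format seconds as H:MM:SS, the layout of the finish: field."""
    total = int(seconds)
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"


if __name__ == "__main__":
    # "Began resync as SyncTarget (will sync 3597516800 KB [899379200 bits set])"
    print(bits_to_kib(899379200))  # 3597516800

    # Second /proc/drbd snapshot vs. the later handshakes' "bits:891369192"
    status = "ap: 0 ep:1 wo:b oos:3565476768"
    print(parse_oos_kib(status) == bits_to_kib(891369192))  # True

    # First snapshot: 3483404 MiB left at the current 75,376 KiB/s
    print(format_hms(eta_seconds(3483404, 75376)))  # 13:08:42
```

With the post's figures, the sketch reproduces the 3597516800 KB from the resync log line, matches the second snapshot's oos: value to the bits:891369192 reported in the later handshakes, and estimates about 13:08:42 remaining, close to the 13:08:27 that /proc/drbd itself printed.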