On Tue, Nov 05, 2013 at 01:51:25PM -0500, R Johnson wrote:
> I am using drbd 0.7.22, 2 nodes with heartbeat on 2 SLES10 SP4 XEN virtual
[...]
> Please let me know if any other information is required

No, I think that is sufficient.

You realize that 0.7.22 reached end of life years ago?
So whatever bugs are in there, they remain.

We fixed a few bugs over the past seven years ...
some of those fixes should be relevant for your issue.

We are currently at 8.4.4 (or 8.3.16, if you insist on being conservative),
with all new and improved bugs now ;-)

    Lars

> servers; here are the configs:
>
> 3255:/etc # drbdadm dump all
> resource r0 {
>     protocol C;
>     incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
>     3255 {
>         device /dev/drbd0;
>         disk /dev/DATA/RHAPSODY;
>         address 172.xx.xx.22:7788;
>         meta-disk /dev/DATA/DRBD_METADATA [0];
>     }
>     on 3256 {
>         device /dev/drbd0;
>         disk /dev/DATA/RHAPSODY;
>         address 172.xx.xx.33:7788;
>         meta-disk /dev/DATA/DRBD_METADATA [0];
>     }
>     disk {
>         on-io-error detach;
>     }
>     syncer {
>         rate 50M;
>         group 1;
>         al-extents 257;
>     }
> }
>
> and
>
> 3256:/ # drbdadm dump all
> resource r0 {
>     protocol C;
>     incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
>     on 3256 {
>         device /dev/drbd0;
>         disk /dev/DATA/RHAPSODY;
>         address 172.xx.xx.33:7788;
>         meta-disk /dev/DATA/DRBD_METADATA [0];
>     }
>     on 3255 {
>         device /dev/drbd0;
>         disk /dev/_DATA/RHAPSODY;
>         address 172.xx.xx.22:7788;
>         meta-disk /dev/DATA/DRBD_METADATA [0];
>     }
>     disk {
>         on-io-error detach;
>     }
>     syncer {
>         rate 50M;
>         group 1;
>         al-extents 257;
>     }
> }
>
> Everything works fine for a very short while and then the sync's between
> the nodes fail with the following:
>
> 3255:/etc # cat /proc/drbd
> version: 0.7.22 (api:79/proto:74)
> SVN Revision: 2572 build by lmb at dale, 2006-10-25 18:17:21
>  0: cs:SyncSource st:Primary/Secondary ld:Consistent
>     ns:63181996 nr:0 dw:62680576 dr:2644442 al:446 bm:585 lo:0 pe:0 ua:0 ap:0
>         [>...................] sync'ed: 0.1% (18100/18100)M
>         stalled
>
> 3256:/ # cat /proc/drbd
> version: 0.7.22 (api:79/proto:74)
> SVN Revision: 2572 build by lmb at dale, 2006-10-25 18:17:21
>  0: cs:SyncTarget st:Secondary/Primary ld:Inconsistent
>     ns:0 nr:78284 dw:78284 dr:0 al:0 bm:0 lo:0 pe:1280 ua:0 ap:0
>         [>...................] sync'ed: 0.1% (18100/18100)M
>         stalled
>
> I see the following errors in /var/log/messages:
>
> Nov 5 11:50:05 w583s3255 kernel: drbd0: drbd0_asender [5880]: cstate SyncSource --> NetworkFailure
> Nov 5 11:50:05 w583s3255 kernel: drbd0: drbd0_receiver [2909]: cstate NetworkFailure --> Unconnected
> Nov 5 11:50:06 w583s3255 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
> Nov 5 11:55:08 w583s3255 kernel: drbd0: drbd0_asender [4856]: cstate SyncSource --> NetworkFailure
> Nov 5 11:55:08 w583s3255 kernel: drbd0: drbd0_receiver [2909]: cstate NetworkFailure --> Unconnected
>
>
> However tcpdump between the servers is active and indicates a healthy
> connection state.
>
> Please let me know if any other information is required
>
>
> RJ

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
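When comparing /proc/drbd snapshots like the two quoted in this thread side by side, it can help to pull the key:value fields apart per device. The following is a hypothetical Python sketch, not part of DRBD, drbdadm or anything discussed on the list. It assumes the 0.7-style layout shown in the thread (a numbered device line, indented key:value counters, an optional progress bar and a bare "stalled" line). Later DRBD releases changed that layout, so treat it as illustrative only.

```python
import re
from typing import Dict, Optional

# "key:value" tokens such as "cs:SyncSource", "st:Primary/Secondary" or "pe:1280".
TOKEN = re.compile(r"(\w+):(\S+)")
# A device line starts with its minor number, e.g. " 0: cs:SyncTarget ...".
DEVICE = re.compile(r"^\s*(\d+):\s+(.*)$")


def parse_proc_drbd(text: str) -> Dict[int, Dict[str, object]]:
    """Collect key:value fields per device minor from 0.7-style /proc/drbd text.

    Hypothetical helper for illustration only; it assumes the layout quoted
    in this thread and is not part of DRBD or its userland tools.
    """
    devices: Dict[int, Dict[str, object]] = {}
    current: Optional[Dict[str, object]] = None
    for line in text.splitlines():
        match = DEVICE.match(line)
        if match:
            # Start a new record; "stalled" stays False unless that line appears.
            current = {"stalled": False}
            devices[int(match.group(1))] = current
            rest = match.group(2)
        elif current is None:
            # Header lines ("version:", "SVN Revision:") come before any device.
            continue
        else:
            rest = line
        if rest.strip() == "stalled":
            current["stalled"] = True
        for key, value in TOKEN.findall(rest):
            # Numeric counters become ints; states like "SyncTarget" stay strings.
            current[key] = int(value) if value.isdigit() else value
    return devices
```

On a live 0.7 node one would pass it the contents of /proc/drbd. Fed the SyncTarget snapshot from 3256, it simply restates cs=SyncTarget, pe=1280 and stalled=True. It does not diagnose why the resync stalls.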