[DRBD-user] [0.7.23] reconnect problem after link loss
Lukasz Engel
lukasz.engel at softax.com.pl
Tue Apr 24 12:10:37 CEST 2007
I have two machines running drbd 0.7.23 (self-compiled), each with five
configured drbdX resources (and heartbeat running on top of them).
drbd uses a direct crossover cable for synchronization. The kernel is
2.6.19.2 (Trustix 3 vendor kernel), UP.
Today I disconnected and reconnected the direct cable, and afterwards 2 of
the 5 drbds failed to reconnect:
drbd0,2,4 connected successfully
drbd1 on the secondary got stuck in NetworkFailure state (WFConnection on
the primary)
drbd3 kept retrying to reconnect but never succeeded (it always went to
BrokenPipe after WFReportParams)
Running drbdadm down/up on both failed devices fixed it.
Full scenario:
start: all drbdN are Connected, Primary/Secondary (or Secondary/Primary)
11:20:18 link disconnected
11:21:51 link reconnected -> drbd0,2,4 reconnected, drbd1,3 did not
11:24 heartbeat shut down on gauss1 (secondary for drbd1,3) (I wasn't
sure whether I had to shut down all of drbd on the node)
11:28 drbdadm down/up www (drbd1) on gauss1 -> after that, drbd1
connected
11:29 drbdadm down/up dbdata (drbd3) on gauss2 -> after that, drbd3
connected
(I have observed a similar problem before, but it is not reliably
reproducible; I could not reproduce it a second time today.)
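The manual workaround from the timeline (drbdadm down followed by drbdadm up on the affected resource) can be scripted. Below is a minimal Python sketch, assuming the resource names from the attached config. The helper names are hypothetical and not part of any DRBD tooling. The script defaults to a dry run that only prints the commands, and it does not decide which node to run on: in the report, www was cycled on gauss1 and dbdata on gauss2.

```python
import subprocess
from typing import Dict, List

# Assumed minor -> resource name mapping, taken from the attached drbd.conf.
RESOURCES: Dict[int, str] = {0: "news", 1: "www", 2: "cvs", 3: "dbdata", 4: "ldap"}


def recovery_commands(stuck_minors: List[int]) -> List[List[str]]:
    """Build a 'drbdadm down' / 'drbdadm up' pair for each stuck minor."""
    cmds: List[List[str]] = []
    for minor in stuck_minors:
        name = RESOURCES[minor]
        cmds.append(["drbdadm", "down", name])
        cmds.append(["drbdadm", "up", name])
    return cmds


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # drbd1 (www) and drbd3 (dbdata) were the devices that did not reconnect.
    run(recovery_commands([1, 3]))
```

Keeping command construction separate from execution makes it easy to review what would run before touching a device that may still be Primary.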
/proc/drbd from both machines (taken before the heartbeat shutdown on gauss1):
root at gauss1 ~# cat /proc/drbd
version: 0.7.23 (api:79/proto:74)
SVN Revision: 2686 build by root at gauss1.softax.local, 2007-02-01 00:22:23
0: cs:Connected st:Primary/Secondary ld:Consistent
ns:231268 nr:8 dw:231280 dr:3255883 al:1 bm:265 lo:0 pe:0 ua:0 ap:0
1: cs:NetworkFailure st:Secondary/Primary ld:Consistent
ns:876 nr:1863628 dw:1864504 dr:1329 al:5 bm:645 lo:0 pe:0 ua:0 ap:0
2: cs:Connected st:Primary/Secondary ld:Consistent
ns:86870212 nr:199645112 dw:286551572 dr:1036641651 al:1186444 bm:2615 lo:0 pe:0 ua:0 ap:0
3: cs:BrokenPipe st:Secondary/Unknown ld:Consistent
ns:16260 nr:33465888 dw:33482256 dr:80785 al:61 bm:1014 lo:0 pe:0 ua:0 ap:0
4: cs:Connected st:Secondary/Primary ld:Consistent
ns:37 nr:1430 dw:1454 dr:1257 al:0 bm:93 lo:0 pe:0 ua:0 ap:0
-------------------
root at gauss2 ~# cat /proc/drbd
version: 0.7.23 (api:79/proto:74)
SVN Revision: 2686 build by root at gauss2.softax.local, 2007-01-31 17:12:23
0: cs:Connected st:Secondary/Primary ld:Consistent
ns:0 nr:9820 dw:9820 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
1: cs:WFConnection st:Primary/Unknown ld:Consistent
ns:32900 nr:844 dw:33744 dr:259281 al:0 bm:5 lo:0 pe:0 ua:0 ap:0
2: cs:Connected st:Secondary/Primary ld:Consistent
ns:0 nr:18973748 dw:18973748 dr:0 al:0 bm:94 lo:0 pe:0 ua:0 ap:0
3: cs:WFConnection st:Primary/Unknown ld:Consistent
ns:3721668 nr:1880 dw:3725040 dr:390077 al:126 bm:0 lo:0 pe:0 ua:0 ap:0
4: cs:Connected st:Primary/Secondary ld:Consistent
ns:6 nr:3 dw:9 dr:751 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
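Spotting devices that did not come back after a link flap amounts to reading the cs: field of each device line. Below is a minimal Python sketch, assuming the 0.7-style line layout shown above ("N: cs:... st:local/peer ld:..."). Other DRBD versions format /proc/drbd differently, and the function names are hypothetical.

```python
import re
from typing import Dict, NamedTuple


class DevState(NamedTuple):
    cs: str     # connection state, e.g. "Connected", "WFConnection"
    local: str  # local role, e.g. "Primary"
    peer: str   # peer role, e.g. "Secondary" or "Unknown"
    ld: str     # local disk state, e.g. "Consistent"


# Matches only the per-device status lines; counter lines (ns:, nr:, ...)
# and the version/SVN header lines do not start with "<digits>:".
LINE_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+st:([^/\s]+)/(\S+)\s+ld:(\S+)")


def parse_proc_drbd(text: str) -> Dict[int, DevState]:
    """Map each device minor to its parsed state."""
    states: Dict[int, DevState] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            minor, cs, local, peer, ld = m.groups()
            states[int(minor)] = DevState(cs, local, peer, ld)
    return states


def not_connected(states: Dict[int, DevState]) -> Dict[int, str]:
    """Return minor -> connection state for every device not Connected."""
    return {minor: s.cs for minor, s in states.items() if s.cs != "Connected"}


if __name__ == "__main__":
    # Device lines copied from the gauss1 output above.
    sample = """\
0: cs:Connected st:Primary/Secondary ld:Consistent
1: cs:NetworkFailure st:Secondary/Primary ld:Consistent
2: cs:Connected st:Primary/Secondary ld:Consistent
3: cs:BrokenPipe st:Secondary/Unknown ld:Consistent
4: cs:Connected st:Secondary/Primary ld:Consistent
"""
    print(not_connected(parse_proc_drbd(sample)))
    # prints {1: 'NetworkFailure', 3: 'BrokenPipe'}
```

On a live node, the same functions can be fed the contents of open("/proc/drbd").read().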
---------------------
Config and logs from both machines are attached
--
Lukasz Engel
-------------- next part --------------
#
# Comment lines.
#
# global {
# this is for people who set up a drbd device via the
# loopback network interface or between two VMs on the same
# box, for testing/simulating/presentation
# otherwise it could trigger a run_tasq_queue deadlock.
# disable_io_hints
# }
#
# this need not be drbd#, you may use phony resource names,
# like "resource web" or "resource mail", too
#
resource news {
protocol C;
incon-degr-cmd "halt -f";
startup {
wfc-timeout 1800;
degr-wfc-timeout 120;
}
net {
timeout 60;
connect-int 10;
ping-int 10;
}
syncer {
group 0;
rate 40960k;
}
# disk {
# on-io-error
# }
on gauss1.softax.local {
device /dev/drbd0;
disk /dev/evms/lvm/vgmirror/news;
address 192.168.5.2:7780;
meta-disk /dev/evms/lvm/vgmirror/drbd_news_meta[0];
}
on gauss2.softax.local {
device /dev/drbd0;
disk /dev/evms/lvm/vgmirror/news;
address 192.168.5.3:7780;
meta-disk /dev/evms/lvm/vgmirror/drbd_news_meta[0];
}
}
resource www {
protocol C;
incon-degr-cmd "halt -f";
startup {
wfc-timeout 1800;
degr-wfc-timeout 120;
}
net {
timeout 60;
connect-int 10;
ping-int 10;
}
syncer {
group 1;
rate 40960k;
}
# disk {
# on-io-error
# }
on gauss1.softax.local {
device /dev/drbd1;
disk /dev/evms/lvm/vgmirror/www;
address 192.168.5.2:7781;
meta-disk /dev/evms/lvm/vgmirror/drbd_www_meta[0];
}
on gauss2.softax.local {
device /dev/drbd1;
disk /dev/evms/lvm/vgmirror/www;
address 192.168.5.3:7781;
meta-disk /dev/evms/lvm/vgmirror/drbd_www_meta[0];
}
}
resource cvs {
protocol C;
incon-degr-cmd "halt -f";
startup {
wfc-timeout 1800;
degr-wfc-timeout 120;
}
net {
timeout 60;
connect-int 10;
ping-int 10;
}
syncer {
group 2;
rate 40960k;
}
# disk {
# on-io-error
# }
on gauss1.softax.local {
device /dev/drbd2;
disk /dev/evms/lvm/vgmirror/cvs;
address 192.168.5.2:7782;
meta-disk /dev/evms/lvm/vgmirror/drbd_cvs_meta[0];
}
on gauss2.softax.local {
device /dev/drbd2;
disk /dev/evms/lvm/vgmirror/cvs;
address 192.168.5.3:7782;
meta-disk /dev/evms/lvm/vgmirror/drbd_cvs_meta[0];
}
}
resource dbdata {
protocol C;
incon-degr-cmd "halt -f";
startup {
wfc-timeout 1800;
degr-wfc-timeout 120;
}
net {
timeout 60;
connect-int 10;
ping-int 10;
}
syncer {
group 3;
rate 40960k;
}
# disk {
# on-io-error
# }
on gauss1.softax.local {
device /dev/drbd3;
disk /dev/evms/lvm/vgmirror/dbdata;
address 192.168.5.2:7783;
meta-disk /dev/evms/lvm/vgmirror/drbd_dbdata_meta[0];
}
on gauss2.softax.local {
device /dev/drbd3;
disk /dev/evms/lvm/vgmirror/dbdata;
address 192.168.5.3:7783;
meta-disk /dev/evms/lvm/vgmirror/drbd_dbdata_meta[0];
}
}
resource ldap {
protocol C;
incon-degr-cmd "halt -f";
startup {
wfc-timeout 1800;
degr-wfc-timeout 120;
}
net {
timeout 60;
connect-int 10;
ping-int 10;
}
syncer {
group 4;
rate 40960k;
}
# disk {
# on-io-error
# }
on gauss1.softax.local {
device /dev/drbd4;
disk /dev/evms/lvm/vgmirror/ldap;
address 192.168.5.2:7784;
meta-disk /dev/evms/lvm/vgmirror/drbd_ldap_meta[0];
}
on gauss2.softax.local {
device /dev/drbd4;
disk /dev/evms/lvm/vgmirror/ldap;
address 192.168.5.3:7784;
meta-disk /dev/evms/lvm/vgmirror/drbd_ldap_meta[0];
}
}
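The five resources in the attached config are identical except for name, minor, port, and syncer group: each uses port 7780 + minor and syncer group = minor. The following Python sketch generates such a block from that pattern. It is a hypothetical helper, not DRBD tooling. It reproduces only the active settings shown above and omits the commented-out global and disk sections.

```python
from typing import List, Tuple

# (resource name, minor) pairs as in the attached drbd.conf.
RESOURCES: List[Tuple[str, int]] = [
    ("news", 0), ("www", 1), ("cvs", 2), ("dbdata", 3), ("ldap", 4),
]
HOSTS: List[Tuple[str, str]] = [
    ("gauss1.softax.local", "192.168.5.2"),
    ("gauss2.softax.local", "192.168.5.3"),
]
BASE_PORT = 7780
VG = "/dev/evms/lvm/vgmirror"


def render_resource(name: str, minor: int) -> str:
    """Render one resource block using the same settings as the file above."""
    lines = [
        f"resource {name} {{",
        "protocol C;",
        'incon-degr-cmd "halt -f";',
        "startup {", "wfc-timeout 1800;", "degr-wfc-timeout 120;", "}",
        "net {", "timeout 60;", "connect-int 10;", "ping-int 10;", "}",
        "syncer {", f"group {minor};", "rate 40960k;", "}",
    ]
    for host, ip in HOSTS:
        lines += [
            f"on {host} {{",
            f"device /dev/drbd{minor};",
            f"disk {VG}/{name};",
            f"address {ip}:{BASE_PORT + minor};",
            f"meta-disk {VG}/drbd_{name}_meta[0];",
            "}",
        ]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print("\n".join(render_resource(n, m) for n, m in RESOURCES))
```

Generating the blocks from one table keeps device numbers, ports, and syncer groups from drifting apart when a resource is added.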
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gauss1.log.gz
Type: application/gzip
Size: 3027 bytes
Desc: not available
Url : http://lists.linbit.com/pipermail/drbd-user/attachments/20070424/2942b630/gauss1.log.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gauss2.log.gz
Type: application/gzip
Size: 2645 bytes
Desc: not available
Url : http://lists.linbit.com/pipermail/drbd-user/attachments/20070424/2942b630/gauss2.log.bin