[DRBD-user] [0.7.23] reconnect problem after link loss

Lukasz Engel lukasz.engel at softax.com.pl
Tue Apr 24 12:10:37 CEST 2007


I have 2 machines running drbd 0.7.23 (self-compiled) with 5 drbdX
resources configured (and heartbeat running on top);
drbd uses a direct cross-over cable for synchronization. Kernel 2.6.19.2
(vendor kernel - Trustix 3), UP.

Today I disconnected and reconnected the direct cable, and after that 2 of
the 5 drbds failed to reconnect:
drbd0,2,4 connected successfully
drbd1 on the secondary got stuck in NetworkFailure state (WFConnection on
the primary)
drbd3 kept retrying to reconnect, but never succeeded (it always went to
BrokenPipe after WFReportParams)

Running drbdadm down/up for both failed devices helped.

full scenario:

start    all drbdN are Connected Primary/Secondary (or Secondary/Primary)
11:20:18 link disconnected
11:21:51 link connected -> drbd0,2,4 reconnected, drbd1,3 didn't
11:24    heartbeat shut down on gauss1 (secondary for drbd1,3) (I wasn't 
sure whether I had to shut down the whole drbd on the node)
11:28    drbdadm down/up www (drbd1) on gauss1 -> after that drbd1 
connected
11:29    drbdadm down/up dbdata (drbd3) on gauss2 -> after that drbd3 
connected
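
For reference, the recovery steps at 11:28 and 11:29 can be written down as a small script. This is only a sketch of the workaround described above, not a fix for the underlying problem. The helper names bounce_commands and bounce are made up for illustration; only the drbdadm down/up invocations and the resource names www and dbdata come from this report. By default it just prints the commands instead of running them.

```python
import subprocess


def bounce_commands(resource):
    """Return the two drbdadm calls used as the workaround for one resource."""
    return [["drbdadm", "down", resource], ["drbdadm", "up", resource]]


def bounce(resource, dry_run=True):
    """Take a resource down and bring it up again.

    With dry_run=True (the default) the commands are only printed.
    With dry_run=False they are executed and must be run as root on
    the node where the resource should be restarted.
    """
    for cmd in bounce_commands(resource):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)


if __name__ == "__main__":
    # Resource names taken from the scenario above.
    bounce("www")     # drbd1, restarted on gauss1
    bounce("dbdata")  # drbd3, restarted on gauss2
```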

(I already observed a similar problem some time ago, but it is not 100% 
repeatable; I could not reproduce it a second time today.)

/proc/drbd from both machines (taken before heartbeat shutdown on gauss1):

root@gauss1 ~# cat /proc/drbd
version: 0.7.23 (api:79/proto:74)
SVN Revision: 2686 build by root@gauss1.softax.local, 2007-02-01 00:22:23
0: cs:Connected st:Primary/Secondary ld:Consistent
   ns:231268 nr:8 dw:231280 dr:3255883 al:1 bm:265 lo:0 pe:0 ua:0 ap:0
1: cs:NetworkFailure st:Secondary/Primary ld:Consistent
   ns:876 nr:1863628 dw:1864504 dr:1329 al:5 bm:645 lo:0 pe:0 ua:0 ap:0
2: cs:Connected st:Primary/Secondary ld:Consistent
   ns:86870212 nr:199645112 dw:286551572 dr:1036641651 al:1186444 bm:2615 lo:0 pe:0 ua:0 ap:0
3: cs:BrokenPipe st:Secondary/Unknown ld:Consistent
   ns:16260 nr:33465888 dw:33482256 dr:80785 al:61 bm:1014 lo:0 pe:0 ua:0 ap:0
4: cs:Connected st:Secondary/Primary ld:Consistent
   ns:37 nr:1430 dw:1454 dr:1257 al:0 bm:93 lo:0 pe:0 ua:0 ap:0
-------------------
root@gauss2 ~# cat /proc/drbd
version: 0.7.23 (api:79/proto:74)
SVN Revision: 2686 build by root@gauss2.softax.local, 2007-01-31 17:12:23
0: cs:Connected st:Secondary/Primary ld:Consistent
   ns:0 nr:9820 dw:9820 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
1: cs:WFConnection st:Primary/Unknown ld:Consistent
   ns:32900 nr:844 dw:33744 dr:259281 al:0 bm:5 lo:0 pe:0 ua:0 ap:0
2: cs:Connected st:Secondary/Primary ld:Consistent
   ns:0 nr:18973748 dw:18973748 dr:0 al:0 bm:94 lo:0 pe:0 ua:0 ap:0
3: cs:WFConnection st:Primary/Unknown ld:Consistent
   ns:3721668 nr:1880 dw:3725040 dr:390077 al:126 bm:0 lo:0 pe:0 ua:0 ap:0
4: cs:Connected st:Primary/Secondary ld:Consistent
   ns:6 nr:3 dw:9 dr:751 al:0 bm:0 lo:0 pe:0 ua:0 ap:0

---------------------
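
To spot such stuck devices without reading the output by eye, something like the following could be used. It is a minimal sketch that assumes the 0.7-style /proc/drbd layout shown above (one "N: cs:... st:.../..." header line per device); the function names and the shortened SAMPLE text are only for illustration, and on a real node the text would be read from /proc/drbd instead.

```python
import re

# Header line of one device in the 0.7-style /proc/drbd format, e.g.
# "1: cs:NetworkFailure st:Secondary/Primary ld:Consistent"
DEVICE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+st:(\S+?)/(\S+)")


def parse_proc_drbd(text):
    """Return {minor: (connection_state, local_role, peer_role)}."""
    devices = {}
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            minor, cs, local, peer = match.groups()
            devices[int(minor)] = (cs, local, peer)
    return devices


def not_connected(devices):
    """Return the sorted minors whose connection state is not Connected."""
    return sorted(m for m, (cs, _, _) in devices.items() if cs != "Connected")


# Shortened copy of the gauss1 output above (counter lines mostly omitted).
SAMPLE = """\
version: 0.7.23 (api:79/proto:74)
0: cs:Connected st:Primary/Secondary ld:Consistent
   ns:231268 nr:8 dw:231280 dr:3255883 al:1 bm:265 lo:0 pe:0 ua:0 ap:0
1: cs:NetworkFailure st:Secondary/Primary ld:Consistent
3: cs:BrokenPipe st:Secondary/Unknown ld:Consistent
4: cs:Connected st:Secondary/Primary ld:Consistent
"""

if __name__ == "__main__":
    for minor in not_connected(parse_proc_drbd(SAMPLE)):
        print("drbd%d is not connected" % minor)
```

With the gauss1 sample this reports drbd1 and drbd3, matching the two devices that needed drbdadm down/up.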

Config and logs from both machines are attached


-- 
Lukasz Engel

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: drbd.conf
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20070424/2942b630/attachment.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gauss1.log.gz
Type: application/gzip
Size: 3027 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20070424/2942b630/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gauss2.log.gz
Type: application/gzip
Size: 2645 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20070424/2942b630/attachment-0001.bin>