I have 2 machines running drbd 0.7.23 (self-compiled) with 5 drbdX resources configured (and heartbeat running on top). drbd uses a direct cross-over cable for synchronization. Kernel 2.6.19.2 (vendor kernel, Trustix 3), UP.

Today I disconnected and reconnected the direct cable, and after that 2 of the 5 drbd devices failed to reconnect:

- drbd0,2,4 reconnected successfully
- drbd1 on the secondary was stuck in NetworkFailure state (WFConnection on the primary)
- drbd3 kept retrying to reconnect but never succeeded (it always went to BrokenPipe after WFReportParams)

drbdadm down/up for both failed devices helped.

Full scenario:

start     all drbdN are Connected, Primary/Secondary (or Secondary/Primary)
11:20:18  link disconnected
11:21:51  link connected -> drbd0,2,4 reconnected, drbd1,3 didn't
11:24     heartbeat shutdown for gauss1 (secondary for drbd1,3)
          (I wasn't sure whether I had to shut down the whole drbd on the node)
11:28     drbdadm down/up www (drbd1) on gauss1 -> after that drbd1 connected
11:29     drbdadm down/up dbdata (drbd3) on gauss2 -> after that drbd3 connected

(I already observed a similar problem some time ago, but it is not 100% repeatable; I could not reproduce it a second time today.)

/proc/drbd from both machines (taken before the heartbeat shutdown on gauss1):

root@gauss1 ~# cat /proc/drbd
version: 0.7.23 (api:79/proto:74)
SVN Revision: 2686 build by root@gauss1.softax.local, 2007-02-01 00:22:23
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:231268 nr:8 dw:231280 dr:3255883 al:1 bm:265 lo:0 pe:0 ua:0 ap:0
 1: cs:NetworkFailure st:Secondary/Primary ld:Consistent
    ns:876 nr:1863628 dw:1864504 dr:1329 al:5 bm:645 lo:0 pe:0 ua:0 ap:0
 2: cs:Connected st:Primary/Secondary ld:Consistent
    ns:86870212 nr:199645112 dw:286551572 dr:1036641651 al:1186444 bm:2615 lo:0 pe:0 ua:0 ap:0
 3: cs:BrokenPipe st:Secondary/Unknown ld:Consistent
    ns:16260 nr:33465888 dw:33482256 dr:80785 al:61 bm:1014 lo:0 pe:0 ua:0 ap:0
 4: cs:Connected st:Secondary/Primary ld:Consistent
    ns:37 nr:1430 dw:1454 dr:1257 al:0 bm:93 lo:0 pe:0 ua:0 ap:0
-------------------
root@gauss2 ~# cat /proc/drbd
version: 0.7.23 (api:79/proto:74)
SVN Revision: 2686 build by root@gauss2.softax.local, 2007-01-31 17:12:23
 0: cs:Connected st:Secondary/Primary ld:Consistent
    ns:0 nr:9820 dw:9820 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
 1: cs:WFConnection st:Primary/Unknown ld:Consistent
    ns:32900 nr:844 dw:33744 dr:259281 al:0 bm:5 lo:0 pe:0 ua:0 ap:0
 2: cs:Connected st:Secondary/Primary ld:Consistent
    ns:0 nr:18973748 dw:18973748 dr:0 al:0 bm:94 lo:0 pe:0 ua:0 ap:0
 3: cs:WFConnection st:Primary/Unknown ld:Consistent
    ns:3721668 nr:1880 dw:3725040 dr:390077 al:126 bm:0 lo:0 pe:0 ua:0 ap:0
 4: cs:Connected st:Primary/Secondary ld:Consistent
    ns:6 nr:3 dw:9 dr:751 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
---------------------

Config and logs from both machines are attached.

-- 
Lukasz Engel

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: drbd.conf
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20070424/2942b630/attachment.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gauss1.log.gz
Type: application/gzip
Size: 3027 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20070424/2942b630/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gauss2.log.gz
Type: application/gzip
Size: 2645 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20070424/2942b630/attachment-0001.bin>
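
For anyone who wants to spot this kind of stuck device without reading /proc/drbd by eye, here is a minimal sketch in Python. It is not part of drbd: the function names (`parse_proc_drbd`, `not_connected`) and the regular expression are my own assumptions, written only against the 0.7-style status lines quoted in this report. It treats every connection state other than Connected as worth a look, so it would also flag devices that are merely resynchronizing.

```python
import re

# Assumption: 0.7-style status lines of the form
#   " 1: cs:NetworkFailure st:Secondary/Primary ld:Consistent"
# Other drbd versions print different fields and would need a different pattern.
STATUS_RE = re.compile(
    r"(\d+):\s+cs:(\S+)\s+st:([^/\s]+)/([^/\s]+)\s+ld:(\S+)"
)


def parse_proc_drbd(text):
    """Return {minor: {"cs", "local", "peer", "ld"}} for each status line found."""
    devices = {}
    for match in STATUS_RE.finditer(text):
        minor, cs, local, peer, ld = match.groups()
        devices[int(minor)] = {"cs": cs, "local": local, "peer": peer, "ld": ld}
    return devices


def not_connected(devices):
    """Return {minor: connection state} for devices whose cs is not Connected."""
    return {
        minor: info["cs"]
        for minor, info in devices.items()
        if info["cs"] != "Connected"
    }


# Excerpt of the gauss1 output from this report (devices 0, 1 and 3).
SAMPLE = """\
version: 0.7.23 (api:79/proto:74)
SVN Revision: 2686 build by root@gauss1.softax.local, 2007-02-01 00:22:23
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:231268 nr:8 dw:231280 dr:3255883 al:1 bm:265 lo:0 pe:0 ua:0 ap:0
 1: cs:NetworkFailure st:Secondary/Primary ld:Consistent
    ns:876 nr:1863628 dw:1864504 dr:1329 al:5 bm:645 lo:0 pe:0 ua:0 ap:0
 3: cs:BrokenPipe st:Secondary/Unknown ld:Consistent
    ns:16260 nr:33465888 dw:33482256 dr:80785 al:61 bm:1014 lo:0 pe:0 ua:0 ap:0
"""

if __name__ == "__main__":
    for minor, state in sorted(not_connected(parse_proc_drbd(SAMPLE)).items()):
        print(f"drbd{minor}: cs:{state}")
```

On the gauss1 excerpt this reports minors 1 and 3, the two devices that needed drbdadm down/up. To check a live node, pass the contents of /proc/drbd to `parse_proc_drbd` instead of `SAMPLE`.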