Hi!

I have a cluster with SLES 11 HAE SP1 x86_64 running DRBD and Pacemaker. The replication link consists of two direct 1 Gbit links bonded in balance-rr mode.

The system is in a very interesting state that will lead straight into disaster if a failover should occur. The primary node reports the secondary's disk state as DUnknown, with blocks out of sync (resource 2 below), while the secondary thinks everything is OK. If a failover occurs, the secondary happily comes up with stale data. I have no idea what could cause this. Novell support is already involved, but they have not come back with a solution since last Wednesday.

Any ideas?

Node1:~ # cat /proc/drbd
version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by phil at fat-tyre, 2011-01-28 12:17:35
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:2015223 nr:32 dw:2015228 dr:896547611 al:40 bm:12 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:2022882 nr:4281 dw:2027129 dr:1413273405 al:38 bm:20 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
 2: cs:Connected ro:Primary/Secondary ds:UpToDate/DUnknown C r-----
    ns:12558893 nr:63340 dw:28009675 dr:1503787133 al:3204 bm:1060 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:1285512

Node1:~ # ssh node2 cat /proc/drbd
version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by phil at fat-tyre, 2011-01-28 12:17:35
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:2015223 dw:2015223 dr:143334132 al:0 bm:12 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:2022882 dw:2022882 dr:143334132 al:0 bm:16 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
 2: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:12558753 dw:12558753 dr:716687192 al:0 bm:1045 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0

node1:~ # corosync-cfgtool -s
Printing ring status.
Local node ID 16885952
RING ID 0
        id      = 192.168.1.1
        status  = ring 0 active with no faults
RING ID 1
        id      = 10.73.50.11
        status  = ring 1 active with no faults

Best regards

Robert Köppl
Systemadministration
KNAPP Systemintegration GmbH
Waltenbachstraße 9
8700 Leoben, Austria
Phone: +43 3842 805-910
Fax: +43 3842 82930-500
robert.koeppl at knapp.com
www.KNAPP.com
Commercial register number: FN 138870x
Commercial register court: Leoben
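The mismatch reported in this message (one node showing ds:UpToDate/DUnknown with a nonzero oos counter while its peer shows ds:UpToDate/UpToDate) is the kind of thing a small script could flag before a failover happens. The following is only a rough sketch in Python, assuming the DRBD 8.3-style /proc/drbd layout quoted in the message. The function names `parse_proc_drbd` and `find_disagreements` and the sample constants are hypothetical helpers made up for illustration and are not part of DRBD or Pacemaker. Treating any nonzero oos on a Connected resource as suspicious is likewise an assumption of this sketch.

```python
import re

# Resource header line as it appears in DRBD 8.3 /proc/drbd, for example:
# " 2: cs:Connected ro:Primary/Secondary ds:UpToDate/DUnknown C r-----"
HEADER = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:\S+\s+"
    r"ds:(?P<local>[^/\s]+)/(?P<peer>\S+)"
)
OOS = re.compile(r"\boos:(\d+)")

# Shortened samples taken from the output quoted in the message.
NODE1_SAMPLE = """version: 8.3.10 (api:88/proto:86-96)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:2015223 nr:32 dw:2015228 dr:896547611 al:40 bm:12 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
 2: cs:Connected ro:Primary/Secondary ds:UpToDate/DUnknown C r-----
    ns:12558893 nr:63340 dw:28009675 dr:1503787133 al:3204 bm:1060 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:1285512
"""
NODE2_SAMPLE = """version: 8.3.10 (api:88/proto:86-96)
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:2015223 dw:2015223 dr:143334132 al:0 bm:12 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
 2: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:12558753 dw:12558753 dr:716687192 al:0 bm:1045 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
"""


def parse_proc_drbd(text):
    """Map each minor number to its connection state, disk states and oos."""
    devices = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = int(header.group("minor"))
            devices[current] = {
                "cs": header.group("cs"),
                "local": header.group("local"),
                "peer": header.group("peer"),
                "oos": None,
            }
            continue
        oos = OOS.search(line)
        if oos and current is not None:
            devices[current]["oos"] = int(oos.group(1))
    return devices


def find_disagreements(view_a, view_b):
    """Return (minor, description) pairs where the two nodes' views differ."""
    problems = []
    for minor in sorted(set(view_a) & set(view_b)):
        a, b = view_a[minor], view_b[minor]
        # What A believes about B's disk should match what B says about itself.
        if a["peer"] != b["local"]:
            problems.append((minor, "A sees peer disk %s, B reports its own disk as %s"
                             % (a["peer"], b["local"])))
        if b["peer"] != a["local"]:
            problems.append((minor, "B sees peer disk %s, A reports its own disk as %s"
                             % (b["peer"], a["local"])))
        # Assumption of this sketch: oos > 0 while Connected deserves a look.
        for name, view in (("A", a), ("B", b)):
            if view["cs"] == "Connected" and view["oos"]:
                problems.append((minor, "%s is Connected but reports oos:%d"
                                 % (name, view["oos"])))
    return problems


if __name__ == "__main__":
    node1 = parse_proc_drbd(NODE1_SAMPLE)
    node2 = parse_proc_drbd(NODE2_SAMPLE)
    for minor, text in find_disagreements(node1, node2):
        print("resource %d: %s" % (minor, text))
```

In practice the two inputs would come from `cat /proc/drbd` on each node, as in the commands shown in the message. The script only reports the disagreement; it does not explain its cause or repair it.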