Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
I have the same problem on a similar setup, the only difference being that I
run NFS on top instead of iSCSI. When I take one of the nodes down with a hard
reboot I get a split brain that I cannot resolve. If I try to manually load the
module on the failed node and do `drbdadm -- --discard-my-data connect all` I
get the following in /proc/drbd:

version: 8.0pre5 (api:84/proto:83)
SVN Revision: 2481M build by root@, 2007-10-16 08:52:04
 0: cs:Connected st:Secondary/Primary ds:Diskless/UpToDate r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0

... which suggests that the system is Diskless. How do I recover from that
state without doing a full sync and without stopping the working node? (Two
rough sketches are appended below the quoted message.)

On Wednesday 17 October 2007 17:52:34 Rudolph Bott wrote:
> Hey list,
>
> I'm currently trying to set up a drbd/iscsi/heartbeat environment. This
> seems to work out quite well:
>
> - two boxes with drbd storage
> - heartbeat to monitor links and to start iscsi-target etc.
>
> Upon failure of the primary node, heartbeat starts iscsi-target on the
> secondary and makes it the primary node. This works great even while
> clients are writing to the filesystem via iSCSI - the iSCSI initiator
> simply reconnects to the cluster IP and continues writing.
>
> But as soon as node1 comes back, both nodes complain about a "split brain"
> situation and refuse to resync - although nothing has been written to the
> device on node1 since its disconnect! Shouldn't heartbeat handle this
> situation? On top of that, I'm not able to resync the devices without
> playing around with various drbdadm commands (including taking both sides
> down completely - which would not be an acceptable solution in a production
> environment).
>
> Both sides are identically configured Debian Etch systems, using heartbeat
> v2 packages and drbd8 packages from backports.org:
>
> version: 8.0.4 (api:86/proto:86)
> SVN Revision: 2947 build by root at nas02, 2007-10-16 13:43:43
>
> Please note that the two systems do NOT have identical hardware components -
> it's just a test environment with different storage capacities (~230GB vs.
> ~60GB).
>
> Below is a log extract and my heartbeat configuration.
>
> dmesg output from nas02 (after nas01 has been started again):
>
> r8169: eth1: link up
> drbd0: conn( WFConnection -> WFReportParams )
> drbd0: Handshake successful: DRBD Network Protocol version 86
> drbd0: Considerable difference in lower level device sizes: 121720712s vs. 455185536s
> drbd0: Split-Brain detected, dropping connection!
> drbd0: self C765DF24A2676E31:DDFCF96A854616A7:B34DB3CFA71F57C2:15641BAC6660D448
> drbd0: peer E785C3CFDDDC5C05:DDFCF96A854616A7:B34DB3CFA71F57C2:15641BAC6660D448
> drbd0: conn( WFReportParams -> Disconnecting )
> drbd0: error receiving ReportState, l: 4!
> drbd0: meta connection shut down by peer.
> drbd0: asender terminated
> drbd0: tl_clear()
> drbd0: Connection closed
> drbd0: conn( Disconnecting -> StandAlone )
> drbd0: receiver terminated
>
> ha.cf (taken from nas02; apart from the ucast settings they are identical):
>
> logfacility daemon
> keepalive 1
> deadtime 10
> initdead 60
> ucast eth1 172.16.15.1
> ucast eth0 192.168.0.30
> auto_failback off
> node nas01
> node nas02
>
> haresources:
>
> nas01 192.168.0.34 drbddisk::storage iscsi-target
> MailTo::technik at megabit.net
>
> and here is a quick "drawing" of the whole layout:
>
>                 iSCSI initiators
>                        |
>     /---------------Switch---------------\
>     |                                    |
> 192.168.0.30                        192.168.0.31
>    nas01    ClusterIP 192.168.0.34     nas02
> 172.16.15.1                         172.16.15.2
>     |                                    |
>     \------------CrossOver Link----------/
>
> With kind regards
>
> Rudolph Bott
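
Sketch 1, on the Diskless state above: ds:Diskless/UpToDate normally means the
local backing device was never attached after the reboot, so only the network
side of the resource is running. Assuming the on-disk metadata on the failed
node survived and the resource is named "storage" (the name taken from the
haresources line; substitute your own), a sequence along these lines should
re-attach the disk and start a bitmap-based resync rather than a full one -
this is a rough sketch of my understanding, not a tested recipe:

  # on the failed (currently Diskless) node
  drbdadm attach storage            # re-attach the local backing device
  cat /proc/drbd                    # check whether a resync starts on its own
  # only if a split brain is reported again after attaching:
  drbdadm disconnect storage
  drbdadm -- --discard-my-data connect storage   # discard local changes
  # on the surviving primary, only if it has dropped to StandAlone:
  drbdadm connect storage

If the attach step fails, the backing device or its metadata is probably the
real problem, and dmesg should say why.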
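
Sketch 2, on the quoted question of whether this can be handled automatically:
heartbeat itself will not resolve a DRBD split brain, but DRBD 8 has automatic
split-brain recovery policies that go in the net section of drbd.conf. A
minimal sketch, assuming the resource is called "storage" and that it is
acceptable for the node that was secondary to discard its data (the option
names are from the DRBD 8 documentation; the surrounding resource block is
illustrative only):

  resource storage {
    net {
      after-sb-0pri discard-zero-changes;  # no primaries: the side with no new writes discards
      after-sb-1pri discard-secondary;     # one primary: the secondary discards and resyncs
      after-sb-2pri disconnect;            # two primaries: do not guess, stay disconnected
    }
    # ... existing on-host / disk sections unchanged ...
  }

With something like this in place, the scenario from the original mail (node1
returns with no new writes) should resync automatically instead of both sides
dropping to StandAlone.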