On 2004-12-02T16:51:17, Vic Berdin <vic at digi.com.ph> wrote:

> I can see that (almost) everything works automatically now. My configured
> partition gets mirrored on the secondary as I add/delete files in the
> primary node. I can see disk activity on both nodes (via external HDD LED).
> And the data does get mirrored on the secondary if do actual test inspection
> by mounting the secondary node's partition (ofcourse after shutting down
> drbd ;o)).

Be careful. That's actually the wrong way to check what's going on on
the secondary; if you shutdown drbd before mounting the device, the
mount will modify some bits on disk which then won't get replicated and
cause the filesystems to diverge, which will cause problems later.

Yes, you _can_ mount a filesystem after shutting down drbd, but this is
only meant as a last resort if drbd breaks and will later require a
manual full replication.

The proper way to access a drbd device is to promote the node you want
to access it on to 'primary' status and then mount the drbd device.

> Now my problem now is this: simulating a machine failure, I deliberately
> power down the primary machine. The secondary now inherits the resources
> abandoned by the primary: ldirectord loads and `datadisk start` gets
> executed by heartbeat - the secondry now becomes the drbd's primary.

Correct so far.

> Now, upon turning back "on" the primary node, I notice that /proc/drbd
> status on both nodes does not seem to detect the existence of the another:
>
> ------------------
> On the (resource inherited) node:
> 0: cs:WFConnection st:Primary/Unknown ns:0 nr:0 dw:12 dr:35 pe:0 ua:0
>
> On the newly restarted node:
> 0: cs:StandAlone st:Secondary/Unknown ns:0 nr:0 dw:0 dr:0 pe:0 ua:0
> ------------------

Is your network configuration correct? And it could take a couple of
seconds for drbd to reconnect, so maybe you should just wait a second.

Is drbd itself part of your boot sequence? And you shouldn't have
configured 'load-only' or some such.
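
The two status lines quoted above can also be checked by a script rather
than by eye. What follows is only a minimal Python sketch, assuming the
/proc/drbd device-line format shown in this thread (a minor number
followed by key:value fields). The helper names parse_status_line and
is_disconnected are hypothetical and not part of drbd, and the set of
"disconnected" states contains only the two states quoted here.

```python
import re

# Matches device lines such as:
#   0: cs:WFConnection st:Primary/Unknown ns:0 nr:0 dw:12 dr:35 pe:0 ua:0
STATUS_RE = re.compile(r"^\s*(\d+):\s+(.*)$")

# Assumption: only the two states quoted in this thread are listed here.
# In both of them the node has no established link to its peer.
DISCONNECTED_STATES = {"WFConnection", "StandAlone"}


def parse_status_line(line):
    """Split one /proc/drbd device line into a dict of its fields.

    Returns None for lines that do not start with a minor number.
    """
    match = STATUS_RE.match(line)
    if match is None:
        return None
    fields = {"minor": int(match.group(1))}
    for token in match.group(2).split():
        key, sep, value = token.partition(":")
        if sep:
            fields[key] = value
    if "st" in fields:
        # st: is "local role/peer role", e.g. Primary/Unknown
        local, _, peer = fields["st"].partition("/")
        fields["local_role"] = local
        fields["peer_role"] = peer
    return fields


def is_disconnected(fields):
    """True if the connection state is one of the states listed above."""
    return fields.get("cs") in DISCONNECTED_STATES


if __name__ == "__main__":
    samples = [
        "0: cs:WFConnection st:Primary/Unknown ns:0 nr:0 dw:12 dr:35 pe:0 ua:0",
        "0: cs:StandAlone st:Secondary/Unknown ns:0 nr:0 dw:0 dr:0 pe:0 ua:0",
    ]
    for sample in samples:
        info = parse_status_line(sample)
        print(info["minor"], info["cs"], info["local_role"],
              "no peer" if is_disconnected(info) else "peer visible")
```

To use it on a live node, one would feed it the lines read from
/proc/drbd instead of the hard-coded samples; lines that are not device
lines are skipped because parse_status_line returns None for them.
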

> Connection/mirroring will only resume if I manually do a `drbd reconnect` on
> the newly restarted (secondary) node. And this action seem to perform a
> complete replication of the primary:

This is correct behaviour of drbd in case the primary failed; you should
use 0.7.x if you want to get smarter resyncs all the time. 0.6.x can
only 'smartly' resync if the secondary fails and the primary stays up.

> And oh, btw, I'm actually doing these implementation/tests using a volatile
> file system for /var/lib/drbd. Thus, doing a complete reboot on a node
> cleans out its /var/lib/drbd when it restarts. I'm guessing there are also
> side effects on using a volatile /var/lib/drbd?

Yes, the side effect is that you are killing the generation counters,
and that may cause various bugs and data corruption. /var/lib/drbd must
not be volatile; that's a severe setup bug.


Sincerely,
    Lars Marowsky-Brée <lmb at suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business