[DRBD-user] Please help... After reboot I'm always getting unresolved split brain (DRBD+OCFS2)

Felix Frank ff at mpexnet.de
Thu Jan 24 17:04:14 CET 2013


On 01/22/2013 05:04 PM, Jacek Osiecki wrote:
> [41706.085879] block drbd0: PingAck did not arrive in time.
> [41706.085888] block drbd0: peer( Primary -> Unknown ) conn( Connected
> -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> [41706.086007] block drbd0: new current UUID
> 62770026DDB5FC9D:1AD40906305F01A9:E24FA72FCFB3A8FD:E24EA72FCFB3A8FD

Uhuh. So, your peer just cut the network connection without
deconfiguring DRBD first (i.e., unmount, go Secondary etc.).

This is bad: your local node cannot know which blocks the peer may have
touched between the connection loss and the peer actually unmounting.
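
To make the reading of that log line concrete, here is a minimal Python
sketch that pulls the state transitions out of the quoted line and flags
the case where the connection failed while the peer was still Primary.
The function names are made up for illustration and the regular
expression only assumes the "name( Old -> New )" layout visible in the
quote above, so treat it as a rough aid rather than a complete parser
for DRBD kernel messages.

```python
import re

# Matches fragments such as "peer( Primary -> Unknown )" in a DRBD kernel log line.
TRANSITION = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def parse_transitions(line):
    """Return {field: (old_state, new_state)} for each transition in the line."""
    return {name: (old, new) for name, old, new in TRANSITION.findall(line)}


def peer_lost_while_primary(line):
    """True if the connection failed while the peer was still Primary."""
    t = parse_transitions(line)
    return (
        t.get("peer", ("", ""))[0] == "Primary"
        and t.get("conn", ("", ""))[1] == "NetworkFailure"
    )


# The state-change line quoted above, with the wrapped quote joined back together.
line = (
    "block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) "
    "pdsk( UpToDate -> DUnknown )"
)
print(parse_transitions(line))
print(peer_lost_while_primary(line))
```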

I suspect you have to fix this in your shutdown logic.
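
As a rough illustration of the ordering I mean, the following Python
sketch lists the steps that should complete before the network goes
down: unmount, demote to Secondary, then disconnect. The resource name
"r0" and the mount point "/mnt/shared" are placeholders, not taken from
your setup, and the helper functions are hypothetical. It defaults to a
dry run that only prints the commands, and it does not cover stopping
the OCFS2 cluster stack.

```python
import subprocess


def shutdown_steps(resource, mountpoint):
    """Commands to run, in order, before the network link is taken down."""
    return [
        ["umount", mountpoint],               # release the filesystem first
        ["drbdadm", "secondary", resource],   # then demote the local node
        ["drbdadm", "disconnect", resource],  # then leave the peer cleanly
    ]


def run_shutdown(resource, mountpoint, dry_run=True):
    """Print the steps (dry run) or execute them, stopping at the first failure."""
    steps = shutdown_steps(resource, mountpoint)
    for argv in steps:
        if dry_run:
            print("would run:", " ".join(argv))
        else:
            subprocess.run(argv, check=True)
    return steps


# Placeholder names; dry run only, nothing is executed.
run_shutdown("r0", "/mnt/shared")
```

The point is only the order: if the init scripts stop networking before
these steps have run, the peer sees exactly the NetworkFailure shown in
the log above.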

If that's true though, that's a problem in itself: Dual-primary should
always be run by pacemaker, with working stonith/fencing in place.
Otherwise you set yourself up for a painful actual split-brain including
data loss some day.

Once pacemaker works OK, it will shut down before your network link
does, and will make sure that your DRBD is properly disconnected.

Aside #1: the version mismatch between your userland tools and the
kernel module is rather drastic; you probably want an 8.3 userland.

Aside #2: are you sure you need dual-primary operation? Can vservers
live-migrate already and no one told me? ;-)

HTH,
Felix


