[DRBD-user] Default Split Brain Behaviour

Felix Frank ff at mpexnet.de
Thu Jan 27 09:10:59 CET 2011


> date stamps on the notify-* scripts are all uniform (predating the system build) & I don't recall modifying them at all.


> From the logs, I'm curious about the lines...
> Jan 23 15:07:16 emlsurit-v4 kernel: [   15.044910] block drbd9: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Jan 23 15:07:16 emlsurit-v4 kernel: [   15.044929] block drbd9: Marked additional 508 MB as out-of-sync based on AL.
> ...then a little further down
> Jan 23 15:53:01 emlsurit-v4 kernel: [ 2756.121108] block drbd9: role( Secondary -> Primary )

A little further down as in "45 minutes later" ;-)

> On both nodes, I also noticed ... block drbd9: helper command: /sbin/drbdadm split-brain minor-9 exit code 127 (0x7f00)

Not sure about this one, but if you *had* split brain, there would have
been no sync-back (or any standing DRBD connection).
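
For what it's worth, the two numbers in that log line look like the same
value in two forms: 0x7f00 reads like a raw POSIX wait status, and 127 is
the exit code packed into its high byte. A minimal sketch of that
decoding, assuming a normal exit (the function name is mine, purely for
illustration). By shell convention 127 usually means "command not
found", so one possibility is that a configured handler could not be
executed, but that is a guess from the number alone.

```python
def exit_code_from_wait_status(status: int) -> int:
    """Return the exit code packed into a POSIX wait status.

    Assumes the process exited normally (not killed by a signal),
    in which case the exit code sits in bits 8..15 of the status.
    """
    return (status >> 8) & 0xFF


# 0x7f00 is the raw value shown in the quoted kernel log line.
print(exit_code_from_wait_status(0x7F00))  # 127
```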

> Somehow the (508 Mb?) data has rolled back, & while I'm sad I've likely lost the data, I can't afford to release this system to production until I'm content it won't happen again.

The thing about the AL (activity log) is this: it is only active when
your nodes are in sync. Your Primary will only sync back hot extents
from its peer if the peer's data is known to be up to date! It should
not be possible to lose any data because of sync-back of hot AL
extents.

To make this clearer: Whenever your nodes are in sync, DRBD keeps
track of the last ~500 MB that were written (in your case; the amount
depends on the al-extents setting). This information is stored
persistently in the metadata of the Primary. When it goes down and comes
up again, it marks those 500 MB as "out of sync". This is helpful: When
coming up after a hard crash with possible data loss, the Primary can
restore any lost writes from the Secondary. Note that this will not
destroy data: The Secondary will become SyncSource only if it's UpToDate.
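
The 508 MB in your log fits this arithmetic. A rough sketch, assuming
each AL extent covers 4 MiB (as I understand it for DRBD 8.3) and that
you are running with al-extents at 127, which I believe is the 8.3
default; I have not seen your config, so both numbers are assumptions,
and the names below are only illustrative.

```python
AL_EXTENT_MIB = 4  # assumption: one activity-log extent covers 4 MiB


def al_coverage_mib(al_extents: int) -> int:
    """Amount of data (in MiB) that the activity log can mark as hot."""
    return al_extents * AL_EXTENT_MIB


# 127 is an assumed al-extents value, not taken from the poster's config.
print(al_coverage_mib(127))  # 508
```

So a larger al-extents value means more data gets marked out of sync
(and resynced) after a Primary crash.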

> The userland tools are ...
> drbdadm --version
> DRBDADM_BUILDTAG=GIT-hash:\ ea9e28dbff98e331a62bcbcc63a6135808fe2917\ build\ by\ buildd at yellow\,\ 2010-06-01\ 11:06:12
> Any assistance to help me dig a little deeper here, will be greatly appreciated.

Are you absolutely certain that you lost data? Because from here, it
sure doesn't look like it. (As a matter of fact, it doesn't even look
like split brain - have any of your notify scripts fired?)
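
If you want to check the logs for it, something along these lines may
help. It is only a sketch: the pattern "Split-Brain detected" is the
kernel message text as I remember it, so treat it as an assumption and
adjust it to whatever your DRBD version actually prints. The sample
lines are the ones you quoted.

```python
import re

# Assumed message text; matched case-insensitively to be forgiving.
SPLIT_BRAIN = re.compile(r"split-brain detected", re.IGNORECASE)


def split_brain_lines(log_lines):
    """Return the log lines that mention a detected split brain."""
    return [line for line in log_lines if SPLIT_BRAIN.search(line)]


sample = [
    "Jan 23 15:07:16 emlsurit-v4 kernel: [   15.044929] block drbd9: "
    "Marked additional 508 MB as out-of-sync based on AL.",
    "Jan 23 15:53:01 emlsurit-v4 kernel: [ 2756.121108] block drbd9: "
    "role( Secondary -> Primary )",
]

# Neither quoted line matches, so the result is an empty list.
print(split_brain_lines(sample))  # []
```

To scan a whole file, pass an open file object (for example your
syslog) instead of the sample list.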

