Hi,

> date stamps on the notify-* scripts are all uniform (predating the system build) & I don't recall modifying them at all.

Good.

> From the logs, I'm curious about the lines...
> Jan 23 15:07:16 emlsurit-v4 kernel: [ 15.044910] block drbd9: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Jan 23 15:07:16 emlsurit-v4 kernel: [ 15.044929] block drbd9: Marked additional 508 MB as out-of-sync based on AL.
>
> ...then a little further down
> Jan 23 15:53:01 emlsurit-v4 kernel: [ 2756.121108] block drbd9: role( Secondary -> Primary )

"A little further down" as in "45 minutes later" ;-)

> On both nodes, I also noticed ...
> block drbd9: helper command: /sbin/drbdadm split-brain minor-9 exit code 127 (0x7f00)

Not sure about this one, but if you *had* split brain, there would have been no sync-back (or any standing DRBD connection at all).

> Somehow the (508 Mb?) data has rolled back, & while I'm sad I've likely lost the data, I can't afford to release this system to production until I'm content it won't happen again.

The thing about the AL (activity log) is: it is only active when your nodes are in sync. Your Primary will only sync back hot extents from its peer if the peer's data is known to be up to date! It should not be possible to lose any data because of a sync-back of hot AL extents.

To make this clearer: whenever your nodes are in sync, DRBD keeps track of the last ~500 MB that were written (that is the amount in your case; it depends on the al_extents setting). This information is stored permanently in the Primary's metadata. When the Primary goes down and comes up again, it marks those ~500 MB as "out of date". This is helpful: when coming up after a hard crash with possible data loss, the Primary can restore any lost writes from the Secondary. Note that this will not destroy data: the Secondary will become SyncSource only if it is UpToDate.

> The userland tools are ...
>
> drbdadm --version
> DRBDADM_BUILDTAG=GIT-hash:\ ea9e28dbff98e331a62bcbcc63a6135808fe2917\ build\ by\ buildd at yellow\,\ 2010-06-01\ 11:06:12
> DRBDADM_API_VERSION=88
> DRBD_KERNEL_VERSION_CODE=0x080307
> DRBDADM_VERSION_CODE=0x080307
> DRBDADM_VERSION=8.3.7
>
> Any assistance to help me dig a little deeper here, will be greatly appreciated.

Are you absolutely certain that you lost data? Because from here, it sure doesn't look like it. (As a matter of fact, it doesn't even look like split brain. Have any of your notify scripts fired?)

Regards,
Felix
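P.S. The 508 MB in the "Marked additional 508 MB as out-of-sync based on AL" log line is most likely one full activity log. Assuming DRBD 8.3's default of al-extents 127 and its 4 MiB extent size (neither value is stated in this thread), 127 x 4 MiB = 508 MiB. The toy Python model below sketches the behaviour described above: remember the most recently written extents, and after the Primary restarts, pull them back from the peer only if the peer is UpToDate. The class and function names are hypothetical, and this is an illustration of the idea, not DRBD's actual code.

```python
from collections import OrderedDict

# Assumed values for DRBD 8.3; check your own drbd.conf and version.
EXTENT_BYTES = 4 * 1024 * 1024   # each AL extent covers 4 MiB of backing storage
DEFAULT_AL_EXTENTS = 127         # DRBD 8.3 default for the al-extents setting


def al_coverage_mib(al_extents: int = DEFAULT_AL_EXTENTS) -> int:
    """MiB marked out-of-sync when a full activity log is replayed."""
    return al_extents * EXTENT_BYTES // (1024 * 1024)


class ActivityLog:
    """Hypothetical toy model: tracks the most recently written extents."""

    def __init__(self, al_extents: int = DEFAULT_AL_EXTENTS):
        self.capacity = al_extents
        self.hot = OrderedDict()  # extent number -> None, least recent first

    def write(self, offset_bytes: int) -> None:
        extent = offset_bytes // EXTENT_BYTES
        self.hot.pop(extent, None)   # re-writing an extent makes it most recent
        self.hot[extent] = None
        if len(self.hot) > self.capacity:
            self.hot.popitem(last=False)  # drop the least recently written extent

    def extents_to_resync(self, peer_up_to_date: bool) -> set:
        """Hot extents the restarted Primary would pull back from its peer."""
        if not peer_up_to_date:
            return set()  # a peer that is not UpToDate never becomes SyncSource
        return set(self.hot)


if __name__ == "__main__":
    print(al_coverage_mib())  # 508
    al = ActivityLog(al_extents=2)
    for off in (0, 5 * EXTENT_BYTES, 9 * EXTENT_BYTES):
        al.write(off)
    print(sorted(al.extents_to_resync(peer_up_to_date=True)))  # [5, 9]
    print(al.extents_to_resync(peer_up_to_date=False))         # set()
```

The point of the guard in extents_to_resync is the one made above: resyncing hot extents only ever copies data from a peer that is known to be current, so it cannot roll the Primary back to stale data.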
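P.P.S. On the split-brain helper line: "exit code 127 (0x7f00)" reports the same result twice. 0x7f00 is presumably the raw wait status of the helper process, and 127 is the exit code extracted from it. Assuming the usual Linux/glibc status layout (exit code in bits 8-15, terminating signal in the low 7 bits), a minimal decoding sketch looks like this; decode_wait_status is a hypothetical helper, not part of any DRBD tool.

```python
def decode_wait_status(status: int) -> tuple:
    """Split a Linux wait() status into (exited_normally, exit_code)."""
    exited_normally = (status & 0x7F) == 0  # low 7 bits: terminating signal, 0 if none
    exit_code = (status >> 8) & 0xFF        # bits 8-15: value passed to exit()
    return exited_normally, exit_code


if __name__ == "__main__":
    print(decode_wait_status(0x7F00))  # (True, 127)
```

This only shows how the two numbers relate; it says nothing about why the helper returned 127.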