On Thu, Jul 05, 2012 at 06:21:12PM +0800, 郭宇 wrote:
> We have two Red Hat x86_64 virtual servers used as MySQL database
> servers, called 'master' and 'backup', with drbd 8.3.8 and heartbeat
> 2.1.3-3 installed. Each has two DRBD partitions, for MySQL's logs and
> data.
>
> Our network was unreliable, so heartbeat kept switching the two
> servers' DRBD roles between 'primary' and 'secondary'.
>
> One day a DRBD split brain occurred: both servers considered
> themselves 'primary' at the same time (possibly because of a mistake
> in my heartbeat configuration). From then on, the 'backup' server
> could not be connected at all. (I logged the system status regularly
> and the logs looked normal, so the problem has puzzled me until
> now...)

So you managed to go into "split brain" once, and never resolved that.
You had diverging data sets, and no replication.

> Then we rebooted the 'backup' server, and heartbeat and drbd started
> automatically.
>
> Then the problem came: data on the 'master' server, which was the
> 'primary' DRBD node, was fully synced with the 'backup' server,

I doubt that. Unless you configured very risky automatic split brain
recovery policies.

More likely, there was still no replication, and by switching nodes,
you simply had your mysql suddenly use a completely different,
outdated data set.

> and then the 'master' server's DRBD partitions were still mounted,
> and mysql still worked, but the files were completely changed! Lots
> of my data on the 'master' server was lost!!!
> Then I checked /proc/drbd: the 'master' server's status was
> 'Primary/Unknown', and the 'backup' server's status was
> 'Secondary/Unknown' (the 'backup' server's status may have been
> changed by heartbeat).

See. No replication at all. So I guess there was no resync either.
You still have two independent versions of your data, probably dating
back to where you had that "split brain".

> So I wonder: is it a bug in drbd 8.3.8?

I don't think so. Rather in your way of using it.

You should have fencing policies and fence-peer handlers, and you need
to help DRBD to resolve data divergence if it should still occur
despite properly configured fencing.

To reiterate:
You need fencing to help avoid data divergence ("split brain").
You need monitoring (beyond the Pacemaker "monitor" action) to
recognize a lost replication link in time.
You may need to help DRBD to resolve data divergence, should it still
occur.

Lars

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
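The monitoring advice above can be sketched as a small external check that reads /proc/drbd and warns whenever a resource's replication link is not up, which is exactly the 'Primary/Unknown' situation described in this thread. This is a minimal sketch, assuming the DRBD 8.3-style /proc/drbd layout with `cs:`, `ro:` and `ds:` fields on each resource line. The function names and the set of connection states treated as healthy are illustrative assumptions, not part of DRBD; hook the output into whatever alerting you already run.

```python
import os
import re
from typing import Dict, List

# Assumption: connection states (cs:) treated as "replication link is up".
# This set is illustrative, not exhaustive (e.g. PausedSync*, Verify* omitted).
HEALTHY_CS = {"Connected", "SyncSource", "SyncTarget"}

# Matches a DRBD 8.3-style resource line, for example:
#  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
# Lines without ro:/ds: (e.g. "cs:Unconfigured" minors) are skipped.
RESOURCE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+ro:(\S+)\s+ds:(\S+)")


def parse_proc_drbd(text: str) -> Dict[int, Dict[str, str]]:
    """Return {minor: {"cs": ..., "ro": ..., "ds": ...}} for each resource line."""
    resources: Dict[int, Dict[str, str]] = {}
    for line in text.splitlines():
        m = RESOURCE_LINE.match(line)
        if m:
            minor, cs, ro, ds = m.groups()
            resources[int(minor)] = {"cs": cs, "ro": ro, "ds": ds}
    return resources


def replication_problems(text: str) -> List[str]:
    """Return one warning string per resource whose cs: state is not healthy."""
    problems = []
    for minor, st in sorted(parse_proc_drbd(text).items()):
        if st["cs"] not in HEALTHY_CS:
            problems.append(
                f"drbd{minor}: cs={st['cs']} ro={st['ro']} ds={st['ds']}"
            )
    return problems


if __name__ == "__main__" and os.path.exists("/proc/drbd"):
    # Run e.g. from cron and alert on any output.
    with open("/proc/drbd") as f:
        for warning in replication_problems(f.read()):
            print("WARNING:", warning)
```

A resource stuck in `cs:WFConnection` or `cs:StandAlone` with `ro:Primary/Unknown` would be reported, so a silently lost link is noticed long before anyone switches nodes onto an outdated data set.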