[DRBD-user] oracle on drbd failed

Sun Aug 26 05:10:44 CEST 2012

Hi All:

I built a cluster to protect an Oracle database. The Oracle database files
are stored on a DRBD (8.3.13) device using protocol A. Sometimes, however,
Oracle cannot fail over when the primary node goes down. Here are the
test steps:

1. There are two nodes, A and B. A is the primary node and B is the
secondary node. Oracle runs on node A and executes SQL that inserts a
large amount of data into the database.
2. On node B, run the following loop to simulate the failure of
node A:

while true; do

# break the network link with iptables

# disconnect drbd0 and make it primary

drbdadm disconnect drbd0
drbdadm primary drbd0

# mount drbd0 and start Oracle
....

# if the start failed, break
...

# stop Oracle and umount drbd0

# restore the network link

drbdadm connect drbd0
drbdadm -- --discard-my-data connect drbd0

sleep 5
done

After several iterations, Oracle cannot be started and one of the
following errors appears in alert_<SID>.log:

ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr]

or

ORA-00353: log corruption near block 68622 change 39685781 time
08/25/2012 16:06:42

According to Oracle's MetaLink, the first error means that a power
failure caused logical corruption in the controlfile. The second error
means that a redo log file is corrupted.
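
As an illustration only, the "if start failed, break" step in the loop
could be a check like the one below. It scans an alert log for the two
errors quoted above. The function name alert_log_has_corruption and the
ALERT_LOG variable are hypothetical, and the check assumes the log holds
no earlier occurrences of these errors.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original test script).
# Returns 0 if the given alert log contains either of the two errors
# quoted above, and a non-zero status otherwise.
alert_log_has_corruption() {
    log_file=$1
    grep -Eq 'ORA-00600: .*\[kcratr_nab_less_than_odr\]|ORA-00353: log corruption' "$log_file"
}

# Possible use inside the test loop (ALERT_LOG is an assumed variable
# pointing at alert_<SID>.log):
#   if alert_log_has_corruption "$ALERT_LOG"; then break; fi
```

Matching on the error text rather than on the exit status of the start
command is a design choice for this sketch; either could serve as the
break condition.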

How can I avoid these errors and let Oracle fail over whenever the
primary node crashes? Thanks.

BTW: protocol A is needed because the cluster runs over a WAN and uses a proxy.