[DRBD-user] oracle on drbd failed

Mia Lueng xiaozunvlg at gmail.com
Sun Aug 26 19:24:08 CEST 2012



I think the replication mechanism of protocol A is like the SRL-based
asynchronous mode of Veritas VVR. Are there any differences between them?
None of our customers has ever reported an Oracle failover problem with VVR.
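Protocol A treats a write as complete once it has reached the local disk and the local TCP send buffer, while protocol C waits until the peer's disk has confirmed it as well. Before comparing with VVR, it can help to confirm which protocol the running resource actually uses. The shell sketch below reads it from /proc/drbd, assuming the DRBD 8.3 layout in which the protocol letter follows the `ds:` field and is left blank while the resource has no network configuration; `PROC_DRBD` and `drbd_protocol` are hypothetical names chosen for illustration.

```shell
#!/bin/bash
# Sketch, assuming DRBD 8.3's /proc/drbd layout, e.g.
#  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate A r-----
# PROC_DRBD and drbd_protocol are hypothetical names for illustration.

PROC_DRBD=${PROC_DRBD:-/proc/drbd}

# Print the replication protocol letter (A, B or C) for a minor number.
# Prints nothing if that column is blank (assumed: no network config).
drbd_protocol() {
    awk -v m="$1:" '
        $1 == m {
            for (i = 2; i < NF; i++)
                if ($i ~ /^ds:/ && $(i + 1) ~ /^[ABC]$/) { print $(i + 1); exit }
        }' "$PROC_DRBD"
}
```

For a connected resource on minor 0 configured as described in this thread, `drbd_protocol 0` should print `A`.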

2012/8/26 Felix Egli <mail at felix-egli.ch>:
> Hi
>
> The problem you have is what can be expected with protocol A. Oracle
> expects data to be committed to disk, but with protocol A that data may
> still be sitting only in the RAM of your node A. I really have no idea in
> which situations protocol A could be useful.
>
> Cheers, Felix
>
> On 2012-08-26 05:10, Mia Lueng wrote:
>>
>> Hi All:
>>
>> I built a cluster to protect an Oracle database. The Oracle database files
>> are stored on a DRBD (8.3.13) device using protocol A. But sometimes
>> Oracle cannot fail over when the primary node goes down. Here are the
>> testing steps:
>>
>> 1. There are two nodes, A and B: A is the primary node and B is the
>> secondary node. Oracle runs on node A and executes a SQL statement that
>> inserts lots of data into Oracle.
>> 2. On node B, run the following loop to simulate a failure of node A:
>>
>>
>> while true; do
>>
>>     # break the network link with iptables
>>
>>     # disconnect drbd0 and promote it to primary
>>     drbdadm disconnect drbd0
>>     drbdadm primary drbd0
>>
>>     # mount drbd0 and start oracle
>>     ....
>>
>>     # if the start fails, break
>>     ...
>>
>>     # stop oracle & umount drbd0
>>
>>     # restore the network link
>>     drbdadm connect drbd0
>>     drbdadm -- --discard-my-data connect drbd0
>>
>>     sleep 5
>> done
>>
>>
>> After several loops, Oracle cannot be started and one of the following
>> errors appears in alert_<SID>.log:
>>
>>
>>
>> ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr]
>>
>> or
>>
>> ORA-00353: log corruption near block 68622 change 39685781 time 08/25/2012 16:06:42
>>
>>
>> According to Oracle's MetaLink, the first error means that a power
>> failure caused logical corruption in the control file. The second error
>> means that the redo log file is corrupted.
>>
>> How can I avoid these errors so that Oracle can fail over whenever the
>> primary node crashes? Thanks.
>>
>> BTW: protocol A is needed because the cluster runs over a WAN and uses
>> a proxy.
>> _______________________________________________
>> drbd-user mailing list
>> drbd-user at lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>
> --
> Felix Egli
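In the test loop quoted above, the final `sleep 5` does not guarantee that the resync from node A back to node B (started by `--discard-my-data`) has finished before the next iteration cuts the link again; over a WAN a resync can easily take longer. A minimal sketch of a wait step, assuming DRBD 8.3's /proc/drbd format and that resource drbd0 uses minor 0; `PROC_DRBD`, `drbd_dstate` and `wait_uptodate` are hypothetical names, and the 600-second default timeout is an arbitrary illustration value.

```shell
#!/bin/bash
# Sketch, assuming DRBD 8.3's /proc/drbd format, e.g.
#  0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate A r-----
# PROC_DRBD, drbd_dstate and wait_uptodate are hypothetical names.

PROC_DRBD=${PROC_DRBD:-/proc/drbd}

# Print the local/peer disk state (the ds: field) for a minor number.
drbd_dstate() {
    awk -v m="$1:" '
        $1 == m {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^ds:/) { print substr($i, 4); exit }
        }' "$PROC_DRBD"
}

# Poll once per second until both disks report UpToDate.
# Returns 0 on success, 1 once TIMEOUT seconds (default 600) have passed.
wait_uptodate() {
    local minor=$1 timeout=${2:-600} waited=0
    while [ "$(drbd_dstate "$minor")" != "UpToDate/UpToDate" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

With this, the `sleep 5` line could become `wait_uptodate 0 600 || break`, so that a new failover iteration only starts from a fully resynchronized secondary.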


