Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
I used drbd_trace to trace DRBD write operations while running Oracle; it shows information like this:

block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_main.c:2152: drbd0_worker [5323] data >>> Barrier (barrier 435610040)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_receiver.c:5005: drbd0_asender [11122] meta <<< BarrierAck (barrier 435610037)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_receiver.c:5005: drbd0_asender [11122] meta <<< BarrierAck (barrier 435610038)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_receiver.c:5005: drbd0_asender [11122] meta <<< BarrierAck (barrier 435610039)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_receiver.c:5005: drbd0_asender [11122] meta <<< BarrierAck (barrier 435610040)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_main.c:3062: drbd0_worker [5323] data >>> Data (sector 1b5000s, offset=36a00000, id ffff880080cadd68, seq 30957, size=1000, f 2)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_main.c:3062: drbd0_worker [5323] data >>> Data (sector 1b5008s, offset=36a01000, id ffff880080cad438, seq 30958, size=1000, f 2)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_main.c:3062: drbd0_worker [5323] data >>> Data (sector 1b5010s, offset=36a02000, id ffff880080cad4a8, seq 30959, size=1000, f 2)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_main.c:3062: drbd0_worker [5323] data >>> Data (sector 1b5018s, offset=36a03000, id ffff880080cadcf8, seq 30960, size=1000, f 2)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_main.c:2152: drbd0_worker [5323] data >>> UnplugRemote (7)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_main.c:2152: drbd0_worker [5323] data >>> Barrier (barrier 435610041)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_main.c:3062: drbd0_worker [5323] data >>> Data (sector 0s, offset=0, id ffff880080cad0b8, seq 30961, size=0, f 2a)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_main.c:3062: drbd0_worker [5323] data >>> Data (sector 0s, offset=0, id ffff880080cad358, seq 30962, size=0, f 2a)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_main.c:3062: drbd0_worker [5323] data >>> Data (sector 0s, offset=0, id ffff880080cad128, seq 30963, size=0, f 2a)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_main.c:2152: drbd0_worker [5323] data >>> Barrier (barrier 435610042)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_main.c:3062: drbd0_worker [5323] data >>> Data (sector 46f80s, offset=8df0000, id ffff880080cad3c8, seq 30964, size=1000, f 2)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_main.c:3062: drbd0_worker [5323] data >>> Data (sector 46f88s, offset=8df1000, id ffff880080cade48, seq 30965, size=1000, f 2)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_main.c:3062: drbd0_worker [5323] data >>> Data (sector 46f90s, offset=8df2000, id ffff880080cad908, seq 30966, size=1000, f 2)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_main.c:3062: drbd0_worker [5323] data >>> Data (sector 46f98s, offset=8df3000, id ffff880080cad898, seq 30967, size=1000, f 2)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_main.c:2152: drbd0_worker [5323] data >>> UnplugRemote (7)
block drbd0: /root/rpmbuild/BUILD/drbd-8.3.13/drbd/drbd_main.c:2152: drbd0_worker [5323] data >>> Barrier (barrier 435610043)

It's obvious that the Oracle instance writes in blocks of 4*0x1000 = 4*4096 bytes. Is it possible the failure occurred because the secondary node could not receive and write the full 4*4096-byte block when the network failed? If that's true, how should this situation be handled?
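A minimal sketch (not part of the original mail) of how the trace above can be summarized per barrier epoch, assuming the lines have been captured from the kernel log into /tmp/drbd_trace.log (a hypothetical path) and that gawk is available:

gawk '
  /data >>> Barrier/ {
      # print the statistics of the epoch that just closed
      if (epoch != "") printf "barrier %s: %d Data packets, %d bytes\n", epoch, n, bytes
      epoch = $NF; gsub(/[()]/, "", epoch)   # strip the trailing ")" from the barrier number
      n = 0; bytes = 0
  }
  /data >>> Data/ {
      n++
      for (i = 1; i <= NF; i++)
          if ($i ~ /^size=/) {
              sz = $i; sub(/^size=/, "", sz); sub(/,$/, "", sz)
              bytes += strtonum("0x" sz)     # the size field is printed in hex
          }
  }
  END { if (epoch != "" && n) printf "barrier %s: %d Data packets, %d bytes\n", epoch, n, bytes }
' /tmp/drbd_trace.log

For the data-carrying epochs above this should report four Data packets of 0x1000 (4096) bytes each, i.e. 16384 bytes per barrier, which matches the 4*4096 pattern described.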
2012/8/30 Felix Frank <ff at mpexnet.de>:
> On 08/30/2012 09:38 AM, Felix Frank wrote:
>>> I think you just misunderstood me. The key action for this test is
>>> >
>>> > drbdadm disconnect
>>> > drbdadm primary
>>> >
>>> > which simulate the situation that the primary is crashed to test if
>>> > the oracle can be fail over on secondary node
>>> >
>>> > drbdadm --discard-my-data connect drbd0
>>> >
>>> > the action just keep the secondary's data sync with the primary data
>>> > for the next test.
>> ...assuming the primary had not accumulated some minor corruptions
>> during an earlier loop iteration.
>
> Which reminds me: After failing a protocol A resource, it's important to
> perform a verify.
>
> Oracle *will* clean up any mess on the new primary, but without a full
> sync back, you cannot be entirely sure that the old primary does not
> retain any old writes that hadn't made it to the new primary. The
> activity log is supposed to protect you from this, but I disbelieve it
> can keep you 100% safe.
>
> Cheers,
> Felix
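Regarding the verify step Felix recommends: a minimal sketch of what that could look like with DRBD's online verify, assuming the resource is called drbd0 as in this thread and that a verify-alg (e.g. md5) is configured in its net section:

# start an online verify on one node; progress and the result go to the kernel log
drbdadm verify drbd0

# a non-zero "oos:" counter in /proc/drbd means the peers' blocks differ
grep oos: /proc/drbd

# out-of-sync blocks are only marked, not repaired; a disconnect/connect
# cycle triggers a resync of the marked blocks
drbdadm disconnect drbd0
drbdadm connect drbd0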