[DRBD-user] File data corrupt when I switch Nodes

Kevin Izzet Kevin.Izzet at nsc.com
Tue Aug 31 10:34:04 CEST 2004


Hi All,

Firstly, can I thank Philipp, Lars and everyone else who has contributed to
DRBD. We have been running a production cluster for over a year now,
supporting:
Unix/Linux Home accounts
Unix/Linux Application serving
Apache
Samba

My issue is with a second cluster that I am in the process of building, with
the following spec:

HP DL380 G3s
2 GB RAM
Gigabit Ethernet for replication
Serial for Heartbeat
SuSE 9.0 Professional, stripped down for server use only
Kernel SuSE 2.4.21-99-smp4G
1 x 72GB 15k rpm drive for root
2 x 72GB 15k rpm drives striped to 135.6GB for DRBD replication
Heartbeat 1.2.2
The server will be running Samba 3 and PostgreSQL 7.4.3 (and 8.0beta for
testing)

I originally installed DRBD 0.6.12, which worked perfectly and ran for a
couple of weeks without any issues, but after the release of 0.7.3 I decided
to bite the bullet and give it a go.

The compile and install went fine (built as a module, not patched into the
kernel). I repartitioned the 135GB drive from one slice into two, with the
second slice for the metadata; I made that partition 500MB to be on the safe
side. I originally used ReiserFS for the file system but switched to ext3 to
see if this would fix my issue. I created a new drbd.conf file (see below)
and started up the primary side, all fine. I then rebuilt the Samba and
PostgreSQL file structures and restarted the Heartbeat services, and
everything ran great.
Next I configured the secondary side and started up DRBD. It began to sync
as expected and completed successfully after 2.5 hours. I then started up
Heartbeat on the secondary side and forced a switch from the primary to the
secondary. All appeared to be going OK until I noticed that PostgreSQL
didn't restart on the new primary.
I checked the logfiles and the startup was complaining that the
postgresql.conf file was invalid. On checking the file I found that it
appeared to be corrupt: the header was complete gibberish but the rest of
the file contained normal text. I switched the services back over and, to my
surprise, PostgreSQL started fine. I checked the conf file on the original
primary and it was just a normal text file!
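
One way to check whether postgresql.conf is the only file affected is to
checksum the same directory tree on whichever node is currently primary,
before and after a switch, and compare the results. What follows is only a
minimal Python sketch of that idea, not a DRBD or Heartbeat tool: the
function names are hypothetical, and it assumes the services are stopped so
that the files do not legitimately change between the two runs.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map every regular file under root (as a relative path) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks, sockets and other non-regular entries.
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def diff_manifests(before, after):
    """Return sorted paths whose digest differs or that exist on one side only."""
    paths = set(before) | set(after)
    return sorted(p for p in paths if before.get(p) != after.get(p))


def head_hex(path, length=64):
    """Hex string of the first bytes of a file, to inspect a damaged header."""
    with open(path, "rb") as f:
        return f.read(length).hex()
```

The manifest built on one node would have to be saved somewhere outside the
replicated device (for example as a text file on the root disk) and copied
to the other node for comparison; head_hex can then be run on any file that
diff_manifests reports, to see how far into the file the damage extends.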

I have tried rebuilding the DRBD devices three times now and have also
forced a full resync of the data twice, but I still get the same issue.

If I can't get 0.7.3 working in the next month I'm going to have to revert
to 0.6.12 so that I can get this cluster into production.

Any help would be much appreciated, and thanks in advance.

The drbd.conf file looks like this:

resource drbd0 {
    protocol               C;
    incon-degr-cmd       "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
    on uklnx04 {
        device           /dev/drbd0;
        disk             /dev/cciss/c0d1p1;
        address          192.168.100.31:7788;
        meta-disk        /dev/cciss/c0d1p2 [0];
    }
    on uklnx03 {
        device           /dev/drbd0;
        disk             /dev/cciss/c0d1p1;
        address          192.168.100.30:7788;
        meta-disk        /dev/cciss/c0d1p2 [0];
    }
    net {
        timeout           60;
        connect-int       10;
        ping-int          10;
        max-buffers      2048;
        max-epoch-size   2048;
        ko-count           4;
        on-disconnect    reconnect;
    }
    disk {
        on-io-error      detach;
    }
    syncer {
        rate             50M;
        group              1;
        al-extents       257;
    }
    startup {
        degr-wfc-timeout 120;
    }
}


Regards

Kevin Izzet

Database / Unix / Linux  Administrator
Tel:     (Code)+44(0)1475 655606
Fax:    (Code)+44(0)1475 637755
Email:  Kevin.Izzet at nsc.com

