[DRBD-user] Re: drbd 0.6.12 and 0.6.7: Epoch set size wrong!! tl messed up! transferlog too small!!

Sun Nov 25 04:14:36 CET 2007

Dear Lars,

Thank you very much for your kind and helpful reply.

On 24/11/07 19:23 +0100, Lars Ellenberg wrote:
>On Fri, Nov 23, 2007 at 12:51:48AM +1100, Nick Urbanik wrote:
>> Dear Folks,
>> 
>> I am using DRBD 0.6.12 and have created a second drbd device on two
>> 200MB raw disks.

Sorry, that's 200GB.

>> I am syncing data, and am copying data, and all was
>> going nicely until these messages began to appear in
>> /var/log/messages:
>> Nov 23 00:23:51 machine1 kernel: drbd1: tl messed up!
>> Nov 23 00:23:51 machine1 kernel: drbd1: Epoch set size wrong!!found=192 reported=191
>> Nov 23 00:23:51 machine1 kernel: drbd1: transferlog too small!!
>> What does this mean?
>
>it means that the transfer log in 0.6 is too small, and got messed
>up.  you don't want more details.
>
>to put it another way, it means you should upgrade your drbd.

Please note this line from the original post:
>> (Please don't reply saying "|_|pgr4|)3 d00dz!" :-)

>> Will I lose data?
>
>not immediately.
>this does not mean that it would be harmless.
>only that as long as nothing else fails,
>this alone will not eat your data.
>however, in combination with real failures, it very well might.
>
>> Is something configured wrongly?
>probably.
>you could try to increase tl-size.

Okay, I will.
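
To see whether a larger tl-size actually makes the messages stop, something along these lines could count them per device before and after the change. This is only a rough sketch: the names `scan` and `report` are made up for illustration, and the patterns assume the syslog format shown in the excerpt above.

```python
import re
from collections import Counter

# Message texts as they appear in the syslog excerpt; the "kernel: drbdN:"
# prefix is an assumption based on that excerpt.
EPOCH_RE = re.compile(
    r"kernel: (?P<dev>drbd\d+): Epoch set size wrong!!\s*"
    r"found=(?P<found>\d+)\s+reported=(?P<reported>\d+)"
)
OTHER_RE = re.compile(
    r"kernel: (?P<dev>drbd\d+): (?P<text>tl messed up!|transferlog too small!!)"
)


def scan(lines):
    """Count transfer-log complaints per device and collect epoch mismatches."""
    counts = Counter()
    mismatches = []
    for line in lines:
        m = EPOCH_RE.search(line)
        if m:
            counts[(m.group("dev"), "Epoch set size wrong!!")] += 1
            mismatches.append(
                (m.group("dev"), int(m.group("found")), int(m.group("reported")))
            )
            continue
        m = OTHER_RE.search(line)
        if m:
            counts[(m.group("dev"), m.group("text"))] += 1
    return counts, mismatches


def report(path="/var/log/messages"):
    """Print a per-device summary for the given log file."""
    with open(path, errors="replace") as fh:
        counts, mismatches = scan(fh)
    for (dev, text), n in sorted(counts.items()):
        print(f"{dev}: {n} x {text}")
    for dev, found, reported in mismatches:
        print(f"{dev}: epoch set found={found} reported={reported}")
```

Calling `report()` once before and once after the change (on a rotated log, say) would show whether the counts for drbd1 drop to zero.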

>> Note: heartbeat is monitoring the first device, but not the second,
>> on which these errors are being reported.
>
>heartbeat is monitoring?  are you sure?

Without a doubt.

>you mean you use heartbeat 2, with drbd 0.6,
>on a red hat 7.3?

No, heartbeat-0.4.9.2.

>why would you do that?

To get failover to the other PostgreSQL box.

>and you only mirror 200 MB?
>what is this, some sort of embedded thing?

Sorry, typo, it's 200GB.

>> $ cat /etc/drbd.conf
>> global {
>>      minor_count=2
>> }
>> 
>> resource drbd0 {
>>     protocol = B
>
>note that the implementation of protocol A and B
>has been wrong, always, from the beginning,
>and only got fixed in drbd 0.7.22 and later,
>as well as 8.x. using protocol != C with older versions
>may well lead to data loss if you had some failovers.
>also, with older versions, protocol C performs better
>(even though that is counterintuitive, and was documented
>otherwise back then, iirc)

Thank you for that.  I'll switch all our legacy drbd systems to
protocol C.  Does this refer only to the network, or is the data on
the drbd device itself required to be different?  I mean, can I just
change the protocol and restart, or do I need to do much more?

>I'd recommend to upgrade to drbd 8,
>and probably upgrade the rest of the system as well.
>
>unless you are locked in for some obscure reason.
>though, you should not be, this is the OSS world...

Locked in by lack of time :-)  We automate everything, and testing the
automation, and the effect of new versions of PostgreSQL, Apache,
mod_perl, etc., is time-consuming.

Sorry to everyone about sending the post a second time.
-- 
Nick Urbanik http://nicku.org 808-71011 nick.urbanik at optusnet.com.au
GPG: 7FFA CDC7 5A77 0558 DC7A 790A 16DF EC5B BB9D 2C24  ID: BB9D2C24