[DRBD-user] Re: possible bug in drbd

Philipp Reisner philipp.reisner at linbit.com
Fri Feb 3 16:49:25 CET 2006



> Hello Philipp,
> here's the situation:
>
> I had a two-node cluster built with DRBD and Heartbeat. The /dev/drbd0
> partition size was 60 GB. Then my secondary node failed (the hard disk
> burned out), so I reinstalled on a new disk. This disk was much bigger
> (160 GB), so I decided to give DRBD just 60 GB and created a partition for
> it. Silly of me: this partition was 2 MB smaller than the /dev/drbd0
> partition on the primary (due to rounding problems).
>
> When the secondary node came up, the primary node said "the smaller
> partition wins", so it MARKED THE DEVICE SIZE AS SMALLER instead of
> complaining to the partner that it could not become the secondary node of a
> bigger primary. The primary has the consistent data; in no case could
> marking its device as smaller lead to a usable partition!
>
> In fact, the primary node logged:
>
> Jan 19 18:49:43 gezz kernel: drbd0: drbd0_receiver [9880]: cstate WFConnection --> WFReportParams
> Jan 19 18:49:43 gezz kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
> Jan 19 18:49:43 gezz kernel: drbd0: resync bitmap: bits=15594460 words=487328
> Jan 19 18:49:43 gezz kernel: drbd0: size = 59 GB (62377840 KB)
> Jan 19 18:50:01 gezz kernel: attempt to access beyond end of device
> Jan 19 18:50:01 gezz kernel: drbd0: rw=0, want=125419536, limit=124755680
> Jan 19 18:50:01 gezz kernel: EXT3-fs error (device drbd0): ext3_find_entry: reading directory #7831554 offset 0
> Jan 19 18:50:01 gezz kernel:
> Jan 19 18:50:01 gezz kernel: Aborting journal on device drbd0.
> Jan 19 18:50:25 gezz kernel: ext3_abort called.
> Jan 19 18:50:25 gezz kernel: EXT3-fs error (device drbd0): ext3_journal_start_sb: Detected aborted journal
> Jan 19 18:50:25 gezz kernel: Remounting filesystem read-only
>
> That is, at 18:49:43 the primary node detected that its partner had come up
> and resized the drbd0 device. At 18:50:01 inconsistencies were detected in
> the filesystem, which was remounted read-only and left in an unusable
> state.
>
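The numbers in the log above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming that "want" and "limit" are counted in 512-byte sectors and that one resync bitmap bit covers 4 KB; neither unit is stated in the log itself, and the variable names are only illustrative.

```python
# Figures copied from the kernel log quoted above.
SECTOR_BYTES = 512      # assumption: want/limit are 512-byte sectors
BITMAP_BLOCK_KB = 4     # assumption: one resync bitmap bit covers 4 KB

size_kb = 62377840      # "size = 59 GB (62377840 KB)"
bitmap_bits = 15594460  # "resync bitmap: bits=15594460"
want = 125419536        # "want=125419536"
limit = 124755680       # "limit=124755680"


def kb_to_sectors(kb):
    """Convert a size in KB to a count of SECTOR_BYTES-sized sectors."""
    return kb * 1024 // SECTOR_BYTES


# The new device size expressed in sectors should equal the reported limit.
limit_from_size = kb_to_sectors(size_kb)

# How far past the new end of the device the failed read reached.
overshoot_sectors = want - limit
overshoot_kb = overshoot_sectors * SECTOR_BYTES // 1024

print(limit_from_size == limit)                   # True
print(bitmap_bits * BITMAP_BLOCK_KB == size_kb)   # True
print(overshoot_sectors, overshoot_kb)            # 663856 331928
```

Under those assumptions the "limit" in the error is exactly the shrunken size reported one line earlier, and the failed read ended 663856 sectors (331928 KB) beyond it.
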
> I think that a primary/consistent node that "sees" a secondary/inconsistent
> partner with a different device size should not resize its own partition
> but refuse the partner.
>
> By the way, once the partition had been resized, it was not possible to
> resize it back to its original size in standalone mode. I had to shut down
> the secondary node, delete the smaller partition, create a bigger
> partition, restart DRBD, and only then could I resize the primary partition
> to its original size.
>

Hi,

It is often really unbelievable what horrible bugs are still in DRBD.

Apparently nobody has ever run into this issue before.

The fix is attached (against 0.7.15). Success reports are welcome, 
although I do not expect that there are many out there who voluntarily 
test this ;)
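
The diff itself is not reproduced in this archive, so the following is not the patch. It is only a rough sketch, in Python rather than the driver's C, of the rule that the reporter asks for and that the attachment name (do_not_shrink_consistent) suggests: "the smaller partition wins" unless that would shrink a node holding consistent data. The names Node and negotiate_size are hypothetical, and the primary's size in the example is illustrative (the reported size plus the 2 MB mentioned in the report).

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of one side of the handshake."""
    size_kb: int
    consistent: bool


def negotiate_size(local, peer):
    """Return the agreed device size in KB, or refuse the peer.

    Default rule: the smaller device wins.
    Exception: never shrink a node that holds consistent data.
    """
    agreed = min(local.size_kb, peer.size_kb)
    if local.consistent and agreed < local.size_kb:
        raise ValueError(
            "peer device is smaller than local consistent data; refusing"
        )
    return agreed


# Illustrative sizes only: the peer uses the size from the quoted log.
primary = Node(size_kb=62377840 + 2048, consistent=True)
secondary = Node(size_kb=62377840, consistent=False)

try:
    negotiate_size(primary, secondary)
except ValueError as err:
    print("refused:", err)
```

With this rule a consistent node keeps its size and drops the connection, while two inconsistent nodes can still settle on the smaller size.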

-Philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :
-------------- next part --------------
A non-text attachment was scrubbed...
Name: do_not_shrink_consistent.diff
Type: text/x-diff
Size: 6896 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20060203/bab3da04/attachment.diff>

