[DRBD-user] potential bug if reconstruct drbd with degraded size.

Francis I. Malolot francis_m at proware.com.tw
Thu Oct 20 14:03:11 CEST 2005

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Yes, of course, it's my job. I'll reproduce this one tomorrow morning,
because it's already late at night here in Taipei. But thanks
anyway for the quick reply. Attached below are the
modules that run on our box.

BTW, the primary node is using 160 MB/s SCSI,
     the remote node is using 320 MB/s SCSI.

Module                  Size  Used by
drbd                  149616  0
bonding                68456  0
i2c_i801                8844  2
i2c_dev                 8832  4
i2c_core               18304  2 i2c_i801,i2c_dev
mptspi                  7048  0
mptscsih               32596  1 mptspi
mptbase                41568  2 mptspi,mptscsih
aic7xxx               162356  0
e1000                 104500  0
e100                   35968  0
sym53c8xx               80916  0


> drbd0: drbdsetup [3276]: cstate StandAlone --> StandAlone
> drbd0: drbdsetup [3276]: cstate StandAlone --> Unconfigured
> drbd0: worker terminated

> and here you reconfigure with smaller lower-level storage, right?
Yes.



----- Original Message ----- 
From: "Lars Ellenberg" <Lars.Ellenberg at linbit.com>
To: <drbd-user at lists.linbit.com>
Sent: Thursday, October 20, 2005 7:37 PM
Subject: Re: [DRBD-user] potential bug if reconstruct drbd with degraded size.


/ 2005-10-20 16:02:34 +0800
\ Francis I. Malolot:
> Hi All,
>
> A potential bug when reconstructing drbd with a reduced size:
> I had created drbd with a size of 2.0 TB on both sides, on top of
> LVM and XFS, with an external meta disk. While it was still syncing
> I stopped drbd on both sides, then removed and recreated the volumes
> with reduced sizes (1.5 TB) on both sides, and a potential bug occurred.
>
> BTW drbd is 0.7.13, kernel 2.6.13

> tagged command queuing enabled, command queue depth 16.

interesting...
what driver is this?
I'd like to test some improvements we have in drbd 0.8
that make use of TCQ ...

>  target1:0:0: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 62)
> Attached scsi disk sda at scsi1, channel 0, id 0, lun 0
> Attached scsi generic sg0 at scsi1, channel 0, id 0, lun 0,  type 0
>   Vendor: SN-3143P  Model:                   Rev: 0001
>   Type:   Direct-Access                      ANSI SCSI revision: 03

...

> drbd0: Resync started as SyncSource (need to sync 2048000000 KB [512000000 bits set]).
> XFS mounting filesystem drbd0
> Ending clean XFS mount for filesystem: drbd0
> drbd0: Primary/Secondary --> Secondary/Secondary
> drbd0: drbdsetup [3276]: cstate SyncSource --> Unconnected
> drbd0: drbd0_receiver [3223]: cstate Unconnected --> BrokenPipe
> drbd0: short read expecting header on sock: r=-512
> drbd0: worker terminated
> drbd0: asender terminated
> drbd0: drbd0_receiver [3223]: cstate BrokenPipe --> StandAlone
> drbd0: Connection lost.
> drbd0: receiver terminated
> drbd0: drbdsetup [3276]: cstate StandAlone --> StandAlone
> drbd0: drbdsetup [3276]: cstate StandAlone --> Unconfigured
> drbd0: worker terminated

and here you reconfigure with smaller lower-level storage, right?


> drbd0: resync bitmap: bits=384008192 words=12000256
> drbd0: size = 1464 GB (1536032768 KB)
> drbd0: 1536032768 KB now marked out-of-sync by on disk bit-map.
> drbd0: 1464 GB marked out-of-sync by on disk bit-map.
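
(side note: these figures are self-consistent if you assume drbd 0.7's
bitmap granularity of one bit per 4 KiB block, packed into 32-bit words
on i386 -- an assumption drawn from the 0.7 sources, not from this log.
a quick check:

/* sanity-check of the bitmap figures in the log above; the 4 KiB-per-bit
 * granularity and the 32-bit word size are assumptions, see note above */
#include <stdio.h>

int main(void)
{
    unsigned long long size_kb = 1536032768ULL; /* "size = 1464 GB (1536032768 KB)" */
    unsigned long long bits    = size_kb / 4;   /* one bit per 4 KiB block  */
    unsigned long long words   = bits / 32;     /* packed into 32-bit words */

    printf("bits=%llu words=%llu\n", bits, words);
    /* prints bits=384008192 words=12000256, matching the log; the
     * original 2.0 TB device checks out the same way:
     * 2048000000 KB / 4 = 512000000 bits, matching the earlier
     * "need to sync 2048000000 KB [512000000 bits set]" */
    return 0;
}

so the bitmap itself appears to have been resized consistently for the
new 1.5 TB device before the oops below.)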
> Unable to handle kernel paging request at virtual address 00200200
>  printing eip:
> f09ff4d7
> *pde = 00000000
> Oops: 0002 [#1]
> SMP
> Modules linked in: drbd bonding i2c_i801 i2c_dev i2c_core mptspi mptscsih mptbase aic7xxx e1000 e100 sym53c8xx
> CPU:    0
> EIP:    0060:[<f09ff4d7>]    Not tainted VLI
> EFLAGS: 00010006   (2.6.13)
> EIP is at lc_set+0x47/0xc0 [drbd]
> eax: f0a2bc04   ebx: 0003d090   ecx: 00200200   edx: 00100100
                                       ^^^^^^^^        ^^^^^^^^
these are the kernel's list poison values (LIST_POISON1 = 0x00100100,
LIST_POISON2 = 0x00200200 in 2.6-era list.h).
so something is trying to manipulate a list entry that has already
been deleted and poisoned; see the sketch after the trace below.

> esi: f0a2bc34   edi: f0a2a000   ebp: 00000000   esp: e60afd94
> ds: 007b   es: 007b   ss: 0068
> Process drbdsetup (pid: 3293, threadinfo=e60ae000 task=e9909530)
> Stack: c02ecdd8 00000000 00000000 ea3b4000 e9c40000 f09fd636 f0a2a000 0003d090
>        00000100 f0a04820 e60afdc8 e9c40508 00000005 00000000 00000000 00000001
>        00000000 00000001 00000000 00000001 f0a04b00 e9c40000 000001e7 f09efc55
> Call Trace:
>  [<c02ecdd8>] sprintf+0x28/0x30
>  [<f09fd636>] drbd_al_read_log+0x246/0x290 [drbd]
>  [<f09efc55>] drbd_ioctl_set_disk+0x485/0x800 [drbd]
>  [<c0178155>] dput+0x175/0x1f0
>  [<f09f16a9>] drbd_ioctl+0x879/0xcb3 [drbd]
>  [<c02e993a>] kobject_get+0x1a/0x30
>  [<c0348884>] get_disk+0x44/0xa0
>  [<c03479af>] exact_lock+0xf/0x20
>  [<c02edcc0>] __copy_to_user_ll+0x70/0x80
>  [<c02edd92>] copy_to_user+0x42/0x60
>  [<c0169c98>] cp_new_stat64+0xf8/0x110
>  [<c0347342>] blkdev_driver_ioctl+0x52/0x90
>  [<c0347424>] blkdev_ioctl+0xa4/0x1b0
>  [<c016861b>] block_ioctl+0x2b/0x30
>  [<c0172e1e>] do_ioctl+0x8e/0xa0
>  [<c0173005>] vfs_ioctl+0x65/0x1f0
>  [<c01731d5>] sys_ioctl+0x45/0x70
>  [<c0102f9f>] sysenter_past_esp+0x54/0x75
> Code: 0c 0f 88 83 00 00 00 8b 47 1c 39 c2 73 7c 8b 4f 18 8d 04 87 0f af d1
> 01 d0 8d 70 30 8b 4e 04 89 5e 14 85 c9 74 1a 8b 50 30 85 d2 <89> 11 74 03
> 89 4a 04 c7 40 30 00 00 00 00 c7 46 04 00 00 00 00
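
for readers unfamiliar with the 00200200/00100100 pattern flagged at
the registers above: when the 2.6 kernel's list_del() unlinks an
entry, it deliberately plants these poison values in the now-dangling
next/prev pointers, so that any later use faults immediately instead
of silently corrupting memory. a minimal userspace sketch -- the
poison values and the list_del() logic mirror 2.6-era
include/linux/list.h, the demo code around them is illustrative only:

#include <stdio.h>

struct list_head {
    struct list_head *next, *prev;
};

/* poison values from 2.6-era include/linux/list.h */
#define LIST_POISON1 ((struct list_head *) 0x00100100UL) /* planted in ->next */
#define LIST_POISON2 ((struct list_head *) 0x00200200UL) /* planted in ->prev */

static void list_del(struct list_head *entry)
{
    entry->next->prev = entry->prev;  /* unlink from the live list... */
    entry->prev->next = entry->next;
    entry->next = LIST_POISON1;       /* ...and poison the stale pointers */
    entry->prev = LIST_POISON2;
}

int main(void)
{
    struct list_head head = { &head, &head };
    struct list_head node;

    /* insert node as the only element, then delete it */
    node.next = node.prev = &head;
    head.next = head.prev = &node;
    list_del(&node);

    printf("node.next=%p node.prev=%p\n",
           (void *) node.next, (void *) node.prev);
    /* prints 0x100100 / 0x200200; any further list operation on
     * 'node' now writes through these addresses and faults at once */
    return 0;
}

and indeed "Oops: 0002" means a write fault: the faulting instruction
in the Code: line (<89> 11, i.e. mov %edx,(%ecx)) writes
edx = 00100100 through ecx = 00200200 -- a write through a poisoned
prev pointer, consistent with an entry being unlinked again after it
had already been deleted.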


if you can reproduce this easily, that would help in investigating the issue.

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :




