[DRBD-user] BAD! BarrierAck

Lars Ellenberg lars.ellenberg at linbit.com
Fri Dec 12 16:22:28 CET 2008


On Fri, Dec 12, 2008 at 09:04:44AM -0600, Nathan Stratton wrote:
> On Fri, 12 Dec 2008, Lars Ellenberg wrote:
>
>> On Thu, Dec 11, 2008 at 08:14:17PM -0600, Nathan Stratton wrote:
>>> On Thu, 11 Dec 2008, Nathan Stratton wrote:
>>>
>>>> Any idea how to fix this? I keep getting them when trying to sync two
>>>> large systems.
>>>
>>> Running drbd-8.3.0rc2 on CentOS 5.2
>>>
>>>> Dec 11 19:59:44 xen1 kernel: drbd0: BAD! BarrierAck #3231051334
>>>> received, expected #3231051333!
>>
>> very interesting.
>> this is new paranoia code,
>> leading to reconnection.
>> no harm done.
>
> yep, the only issue is that access to the local /dev/drbd0 freezes during
> the disconnect/reconnect of the remote nodes.
>
>> but,
>> can you give some more details?
>
> For you? Sure!
>
>> how long between two such "BAD!"s, wall clock time and approx. amount of
>> written data?
>
> Looks random, can be 100G or 2G, wall clock looks like:
>
> Dec 11 14:11:02 xen1 kernel: drbd0: BAD! BarrierAck #2399440554 received, expected #2399440553!
> Dec 11 15:06:08 xen1 kernel: drbd0: BAD! BarrierAck #3562915500 received, expected #3562915499!
> Dec 11 15:10:16 xen1 kernel: drbd0: BAD! BarrierAck #2877127253 received, expected #2877127252!
> Dec 11 17:12:49 xen1 kernel: drbd0: BAD! BarrierAck #684515493 received, expected #684515492!
> Dec 11 18:07:11 xen1 kernel: drbd0: BAD! BarrierAck #1304938437 received, expected #1304938436!
> Dec 11 18:40:48 xen1 kernel: drbd0: BAD! BarrierAck #2899175375 received, expected #2899175374!
> Dec 11 18:55:46 xen1 kernel: drbd0: BAD! BarrierAck #229959413 received, expected #229959412!
> Dec 11 19:59:44 xen1 kernel: drbd0: BAD! BarrierAck #3231051334 received, expected #3231051333!
> Dec 11 20:00:17 xen1 kernel: drbd0: BAD! BarrierAck #1512535064 received, expected #1512535063!
>
>
>> what access pattern?
>
> All access right now is on the Primary/UpToDate system.
>
>> only sync?
>
> Unknown since I am not doing much else.
>
>> what is "large"?
>
> /dev/drbd0            9.6T  218G  9.4T   3% /share
>
>> what is your hardware/io subsys/network/drivers?
>
> 3Ware 9650SX with 16 x 760 GB disks, network is Mellanox MT25204 10 Gb/s
> with IPoIB since direct InfiniBand is not yet supported. : )
>
>> can you give me a "dmesg | grep drbd"
>> from module load to first mount of file system?
>
> http://share.robotics.net/drbd0

the same from the other node as well, please.

actually, rather grep the kernel log,
so I see the timestamps as well.
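
If it helps, here is a rough sketch (not anything shipped with DRBD) of
pulling the drbd lines out of a kernel log with their timestamps intact
and measuring the wall-clock gap between the "BAD!" events. The function
names are made up for illustration, the line format is assumed to match
the syslog lines quoted above, and the year has to be supplied because
syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Assumed line format, taken from the log excerpts quoted in this thread.
BAD_ACK = re.compile(
    r"(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+kernel:\s+"
    r"(?P<dev>drbd\d+): BAD! BarrierAck #(?P<got>\d+) received, "
    r"expected #(?P<want>\d+)!"
)

def drbd_lines(lines):
    """Keep only kernel messages from drbd, timestamps included."""
    return [line for line in lines if " kernel: drbd" in line]

def parse_bad_acks(lines, year=2008):
    """Return (timestamp, device, received, expected) for each BAD! line.

    The year is an assumption passed in by the caller.
    """
    events = []
    for line in lines:
        m = BAD_ACK.search(line)
        if not m:
            continue
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        events.append((when, m["dev"], int(m["got"]), int(m["want"])))
    return events

def gaps_in_seconds(events):
    """Wall-clock seconds between consecutive BAD! events."""
    return [int((b[0] - a[0]).total_seconds())
            for a, b in zip(events, events[1:])]

# Three of the lines quoted earlier in this thread.
SAMPLE = [
    "Dec 11 14:11:02 xen1 kernel: drbd0: BAD! BarrierAck #2399440554 received, expected #2399440553!",
    "Dec 11 15:06:08 xen1 kernel: drbd0: BAD! BarrierAck #3562915500 received, expected #3562915499!",
    "Dec 11 15:10:16 xen1 kernel: drbd0: BAD! BarrierAck #2877127253 received, expected #2877127252!",
]

events = parse_bad_acks(drbd_lines(SAMPLE))
print(gaps_in_seconds(events))
```

To run it against a real log, pass the lines of whatever file your
syslog writes kernel messages to; that path depends on the setup.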

thanks,

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


