On Fri, 12 Dec 2008, Lars Ellenberg wrote:

> On Fri, Dec 12, 2008 at 09:04:44AM -0600, Nathan Stratton wrote:
>> On Fri, 12 Dec 2008, Lars Ellenberg wrote:
>>
>>> On Thu, Dec 11, 2008 at 08:14:17PM -0600, Nathan Stratton wrote:
>>>> On Thu, 11 Dec 2008, Nathan Stratton wrote:
>>>>
>>>>> Any idea how to fix this? I keep getting them when trying to sync two
>>>>> large systems.
>>>>
>>>> Running drbd-8.3.0rc2 on CentOS 5.2
>>>>
>>>>> Dec 11 19:59:44 xen1 kernel: drbd0: BAD! BarrierAck #3231051334
>>>>> received, expected #3231051333!
>>>
>>> very interesting.
>>> this is new paranoia code,
>>> leading to reconnection.
>>> no harm done.
>>
>> yep, only issue is access to local /dev/drbd0 freezes during the
>> disconnect/reconnect of the remote nodes.
>>
>>> but,
>>> can you give some more details?
>>
>> For you? Sure!
>>
>>> how long between two such "BAD!"s, wall clock time and approx. amount of
>>> written data?
>>
>> Looks random, can be 100G or 2G, wall clock looks like:
>>
>> Dec 11 14:11:02 xen1 kernel: drbd0: BAD! BarrierAck #2399440554 received, expected #2399440553!
>> Dec 11 15:06:08 xen1 kernel: drbd0: BAD! BarrierAck #3562915500 received, expected #3562915499!
>> Dec 11 15:10:16 xen1 kernel: drbd0: BAD! BarrierAck #2877127253 received, expected #2877127252!
>> Dec 11 17:12:49 xen1 kernel: drbd0: BAD! BarrierAck #684515493 received, expected #684515492!
>> Dec 11 18:07:11 xen1 kernel: drbd0: BAD! BarrierAck #1304938437 received, expected #1304938436!
>> Dec 11 18:40:48 xen1 kernel: drbd0: BAD! BarrierAck #2899175375 received, expected #2899175374!
>> Dec 11 18:55:46 xen1 kernel: drbd0: BAD! BarrierAck #229959413 received, expected #229959412!
>> Dec 11 19:59:44 xen1 kernel: drbd0: BAD! BarrierAck #3231051334 received, expected #3231051333!
>> Dec 11 20:00:17 xen1 kernel: drbd0: BAD! BarrierAck #1512535064 received, expected #1512535063!
>>
>>
>>> what access pattern?
>>
>> All access right now is on the Primary/UpToDate system.
>>
>>> only sync?
>>
>> Unknown since I am not doing much else.
>>
>>> what is "large"?
>>
>> /dev/drbd0            9.6T  218G  9.4T   3% /share
>>
>>> what is your hardware/io subsys/network/drivers?
>>
>> 3Ware 9650SX with 16 760 gig disks, network is Mellanox MT25204 10 Gb/s
>> with IPoIB since direct infiniband is not yet supported. : )
>>
>>> can you give me a "dmesg | grep drbd"
>>> from module load to first mount of file system?
>>
>> http://share.robotics.net/drbd0
>
> the same from the other node as well, please.
>
> actually, rather grep the kernel log,
> so I see the timestamps as well.

http://share.robotics.net/drbd0-SyncSource
http://share.robotics.net/drbd0-SyncTarget

><>
Nathan Stratton                         CTO, BlinkMind, Inc.
nathan at robotics.net                  nathan at blinkmind.com
http://www.robotics.net                 http://www.blinkmind.com
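
For readers who want to answer the "how long between two such BAD!s" question from their own logs, here is a minimal Python sketch. It is not part of the original thread. It assumes the syslog line format shown in the nine lines quoted above; the function names `parse_bad_acks` and `gaps_in_seconds` are hypothetical, and the year is an assumption because syslog timestamps omit it.

```python
import re
from datetime import datetime

# The nine "BAD!" lines quoted in the thread, copied verbatim.
LOG = """\
Dec 11 14:11:02 xen1 kernel: drbd0: BAD! BarrierAck #2399440554 received, expected #2399440553!
Dec 11 15:06:08 xen1 kernel: drbd0: BAD! BarrierAck #3562915500 received, expected #3562915499!
Dec 11 15:10:16 xen1 kernel: drbd0: BAD! BarrierAck #2877127253 received, expected #2877127252!
Dec 11 17:12:49 xen1 kernel: drbd0: BAD! BarrierAck #684515493 received, expected #684515492!
Dec 11 18:07:11 xen1 kernel: drbd0: BAD! BarrierAck #1304938437 received, expected #1304938436!
Dec 11 18:40:48 xen1 kernel: drbd0: BAD! BarrierAck #2899175375 received, expected #2899175374!
Dec 11 18:55:46 xen1 kernel: drbd0: BAD! BarrierAck #229959413 received, expected #229959412!
Dec 11 19:59:44 xen1 kernel: drbd0: BAD! BarrierAck #3231051334 received, expected #3231051333!
Dec 11 20:00:17 xen1 kernel: drbd0: BAD! BarrierAck #1512535064 received, expected #1512535063!
"""

# Assumed line layout: "<Mon> <day> <HH:MM:SS> <host> kernel: drbdN: BAD! ..."
PATTERN = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: drbd\d+: "
    r"BAD! BarrierAck #(\d+) received, expected #(\d+)!"
)


def parse_bad_acks(log, year=2008):
    """Return (timestamp, received, expected) for every matching line."""
    events = []
    for line in log.splitlines():
        match = PATTERN.match(line)
        if match is None:
            continue  # skip unrelated kernel messages
        stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        events.append((stamp, int(match.group(2)), int(match.group(3))))
    return events


def gaps_in_seconds(events):
    """Wall-clock seconds between consecutive events."""
    return [
        int((later[0] - earlier[0]).total_seconds())
        for earlier, later in zip(events, events[1:])
    ]


if __name__ == "__main__":
    events = parse_bad_acks(LOG)
    print("gaps (s):", gaps_in_seconds(events))
    print("received - expected:", sorted({r - e for _, r, e in events}))
```

On the quoted lines, the gaps range from about half a minute to roughly two hours, and the received barrier number is exactly one higher than the expected number in every case.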