Hi,

I run a Linux file server on Debian 5.0 (Lenny). There is a RAID-1 device for the system partitions and a RAID-5 device for the data partitions. DRBD sits on top of the RAID-5 device and provides /dev/drbd0 as the physical volume for LVM. LVM provides about 25 logical volumes, each formatted with XFS.

Everything went fine during about two weeks of testing, but a few hours after the server went into production, the filesystem on the first logical volume crashed. After a complete system reboot, LVM can't find any of the physical volumes (drbd0) (tested with pvscan).

The kernel says:

Jul 19 13:36:03 fileserver kernel: [115605.913825] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Jul 19 13:36:03 fileserver kernel: [115605.913825] Filesystem "dm-0": XFS internal error xfs_da_do_buf(2) at line 2085 of file fs/xfs/xfs_da_btree.c. Caller 0xfc9fcc69
Jul 19 13:36:03 fileserver kernel: [115605.913825] Pid: 10423, comm: smbd Not tainted 2.6.26-2-686 #1
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<fc9fcb3e>] xfs_da_do_buf+0x5ef/0x6bd [xfs]
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<fc9fcc69>] xfs_da_read_buf+0x19/0x1e [xfs]
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<fc9fcc69>] xfs_da_read_buf+0x19/0x1e [xfs]
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<f89c176b>] do_get_write_access+0x2f8/0x331 [jbd]
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<fc9fcc69>] xfs_da_read_buf+0x19/0x1e [xfs]
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<fc9ff20a>] xfs_dir2_block_lookup_int+0x3d/0x177 [xfs]
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<fc9ff20a>] xfs_dir2_block_lookup_int+0x3d/0x177 [xfs]
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<c0129b12>] lock_timer_base+0x19/0x35
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<fc9ffa8e>] xfs_dir2_block_lookup+0x16/0x96 [xfs]
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<fc9fe6c5>] xfs_dir2_isblock+0x14/0x58 [xfs]
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<fc9ff02d>] xfs_dir_lookup+0x98/0xe7 [xfs]
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<c02865ed>] __tcp_push_pending_frames+0x619/0x6b2
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<fca24a6c>] xfs_lookup+0x40/0x8a [xfs]
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<fca2f2b6>] xfs_vn_lookup+0x35/0x6b [xfs]
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<c017a607>] do_lookup+0xb6/0x153
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<c017c1fb>] __link_path_walk+0x726/0xb0d
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<c015896e>] filemap_fault+0x1f5/0x35b
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<c017c619>] path_walk+0x37/0x70
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<c017c8c8>] do_path_lookup+0x122/0x184
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<c017d125>] __user_walk_fd+0x29/0x3a
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<c01772c6>] vfs_stat_fd+0x15/0x3c
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<c01773a2>] sys_stat64+0xf/0x23
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<c0115b67>] do_page_fault+0x29b/0x5b8
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<c0174ea3>] sys_read+0x3c/0x63
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<c01158cc>] do_page_fault+0x0/0x5b8
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<c0103853>] sysenter_past_esp+0x78/0xb1
Jul 19 13:36:03 fileserver kernel: [115605.913825] [<c02b0000>] aer_probe+0x7d/0xff
Jul 19 13:36:03 fileserver kernel: [115605.913825] =======================

This log block repeats nearly every second until the complete system reboot. Currently no data partition is available.

Does anyone know the reason or, better, have a solution for this problem? Doesn't XFS work with DRBD using protocol A? Can XFS take down the whole LVM and DRBD device? Could a kernel bug be the reason?
I'm about to reconfigure DRBD and LVM, but I'm scared of having such a crash again. I would be happy to get an answer. Thanks a lot!

Here is my drbd.conf:

global { usage-count no; }
resource r0 {
  protocol A;
  startup {
    wfc-timeout 120;       ## 2 minutes.
    degr-wfc-timeout 120;  ## 2 minutes.
  }
  disk {
    on-io-error detach;
  }
  net {
    max-buffers 8192;
    unplug-watermark 8192;
  }
  syncer {
    rate 100M;
    al-extents 3833;
  }
  on fileserver {
    device /dev/drbd0;
    disk /dev/sdb;
    address 10.0.0.11:7788;
    meta-disk internal;
  }
  on backupserver {
    device /dev/drbd0;
    disk /dev/sdb;
    address 10.0.0.12:7788;
    meta-disk internal;
  }
}