Hi,

Today we ran into unrecoverable XFS dinode corruption on a DRBD cluster that has been in production for two years, with 146 days of uptime on the machines. This is the first time we have run into this problem.

Both machines run Debian GNU/Linux Sarge with Linux 2.6.12.2 and DRBD 0.7.10. They are HP ProLiant DL380 servers with ECC RAM, hardware RAID, and reliable UPS systems. Both DRBD nodes were consistent, and the errors occurred regardless of which node we attempted to mount the filesystem on.

During normal operation, the primary machine hit I/O errors while accessing PostgreSQL database file(s). I am attaching logs of these. Afterwards we could not mount the filesystem, because of "Corruption of in-memory data detected". We were able to restore access to the filesystem by running xfs_repair with the -L option. Unfortunately, we were not able to save a copy of the xfs_repair output. The result is a filesystem with some corrupt files, which we are beginning to restore from backups.

I need help understanding what could have caused this corruption, though, and what we can do to prevent it from recurring.

Here are log snapshots of the initial errors (which repeat identically each time the PostgreSQL database is accessed):

kernel: Filesystem "drbd0": corrupt dinode 203438180, (btree extents). Unmount and run xfs_repair.
kernel: Filesystem "drbd0": XFS internal error xfs_bmap_read_extents(1) at line 4374 of file fs/xfs/xfs_bmap.c. Caller 0xc01ea147
kernel: [xfs_bmap_read_extents+993/1232] xfs_bmap_read_extents+0x3e1/0x4d0
kernel: [xfs_iread_extents+119/272] xfs_iread_extents+0x77/0x110
kernel: [xfs_iread_extents+119/272] xfs_iread_extents+0x77/0x110
kernel: [xfs_bmapi+537/5664] xfs_bmapi+0x219/0x1620
kernel: [__elv_add_request+120/192] __elv_add_request+0x78/0xc0
kernel: [__make_request+708/1248] __make_request+0x2c4/0x4e0
kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
kernel: [generic_make_request+343/496] generic_make_request+0x157/0x1f0
kernel: [as_merged_request+78/496] as_merged_request+0x4e/0x1f0
kernel: [elv_merged_request+37/48] elv_merged_request+0x25/0x30
kernel: [__make_request+708/1248] __make_request+0x2c4/0x4e0
kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
kernel: [generic_make_request+343/496] generic_make_request+0x157/0x1f0
kernel: [xfs_bmbt_get_state+47/64] xfs_bmbt_get_state+0x2f/0x40
kernel: [find_lock_page+166/192] find_lock_page+0xa6/0xc0
kernel: [xfs_iomap+410/1360] xfs_iomap+0x19a/0x550
kernel: [xfs_buf_get_flags+238/352] xfs_buf_get_flags+0xee/0x160
kernel: [__linvfs_get_block+146/848] __linvfs_get_block+0x92/0x350
kernel: [find_lock_page+166/192] find_lock_page+0xa6/0xc0
kernel: [find_or_create_page+48/208] find_or_create_page+0x30/0xd0
kernel: [_pagebuf_lookup_pages+514/848] _pagebuf_lookup_pages+0x202/0x350
kernel: [linvfs_get_block+60/64] linvfs_get_block+0x3c/0x40
kernel: [do_mpage_readpage+972/992] do_mpage_readpage+0x3cc/0x3e0
kernel: [radix_tree_node_alloc+31/112] radix_tree_node_alloc+0x1f/0x70
kernel: [radix_tree_insert+237/272] radix_tree_insert+0xed/0x110
kernel: [add_to_page_cache+195/208] add_to_page_cache+0xc3/0xd0
kernel: [mpage_readpages+302/352] mpage_readpages+0x12e/0x160
kernel: [linvfs_get_block+0/64] linvfs_get_block+0x0/0x40
kernel: [xfs_iformat+1185/1520] xfs_iformat+0x4a1/0x5f0
kernel: [rmqueue_bulk+53/128] rmqueue_bulk+0x35/0x80
kernel: [prep_new_page+96/112] prep_new_page+0x60/0x70
kernel: [read_pages+306/320] read_pages+0x132/0x140
kernel: [linvfs_get_block+0/64] linvfs_get_block+0x0/0x40
kernel: [__alloc_pages+739/1072] __alloc_pages+0x2e3/0x430
kernel: [__do_page_cache_readahead+253/368] __do_page_cache_readahead+0xfd/0x170
kernel: [blockable_page_cache_readahead+89/224] blockable_page_cache_readahead+0x59/0xe0
kernel: [page_cache_readahead+294/432] page_cache_readahead+0x126/0x1b0
kernel: [do_generic_mapping_read+1576/1600] do_generic_mapping_read+0x628/0x640
kernel: [__generic_file_aio_read+514/576] __generic_file_aio_read+0x202/0x240
kernel: [file_read_actor+0/240] file_read_actor+0x0/0xf0
kernel: [dput+51/464] dput+0x33/0x1d0
kernel: [xfs_read+312/784] xfs_read+0x138/0x310
kernel: [dput+51/464] dput+0x33/0x1d0
kernel: [linvfs_aio_read+142/160] linvfs_aio_read+0x8e/0xa0
kernel: [do_sync_read+183/240] do_sync_read+0xb7/0xf0
kernel: [linvfs_open+138/144] linvfs_open+0x8a/0x90
kernel: [dentry_open+235/512] dentry_open+0xeb/0x200
kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
kernel: [vfs_read+174/304] vfs_read+0xae/0x130
kernel: [sys_read+81/128] sys_read+0x51/0x80
kernel: [syscall_call+7/11] syscall_call+0x7/0xb

Here is a snapshot of the error when attempting to mount the damaged filesystem:

kernel: XFS mounting filesystem drbd0
kernel: Starting XFS recovery on filesystem: drbd0 (dev: drbd0)
kernel: Filesystem "drbd0": corrupt dinode 203438180, (btree extents). Unmount and run xfs_repair.
kernel: Filesystem "drbd0": XFS internal error xfs_bmap_read_extents(1) at line 4374 of file fs/xfs/xfs_bmap.c. Caller 0xc01ea147
kernel: [xfs_bmap_read_extents+993/1232] xfs_bmap_read_extents+0x3e1/0x4d0
kernel: [xfs_iread_extents+119/272] xfs_iread_extents+0x77/0x110
kernel: [generic_make_request+343/496] generic_make_request+0x157/0x1f0
kernel: [xfs_iread_extents+119/272] xfs_iread_extents+0x77/0x110
kernel: [xfs_bunmapi+4065/4224] xfs_bunmapi+0xfe1/0x1080
kernel: [_pagebuf_lookup_pages+514/848] _pagebuf_lookup_pages+0x202/0x350
kernel: [xfs_buf_get_flags+238/352] xfs_buf_get_flags+0xee/0x160
kernel: [xfs_buf_read_flags+58/176] xfs_buf_read_flags+0x3a/0xb0
kernel: [xlog_grant_log_space+959/1008] xlog_grant_log_space+0x3bf/0x3f0
kernel: [cache_alloc_refill+320/560] cache_alloc_refill+0x140/0x230
kernel: [kmem_zone_alloc+76/192] kmem_zone_alloc+0x4c/0xc0
kernel: [xfs_trans_log_inode+45/96] xfs_trans_log_inode+0x2d/0x60
kernel: [xfs_itruncate_finish+481/1072] xfs_itruncate_finish+0x1e1/0x430
kernel: [xfs_inactive+1284/1376] xfs_inactive+0x504/0x560
kernel: [pagebuf_offset+51/64] pagebuf_offset+0x33/0x40
kernel: [xfs_itobp+276/608] xfs_itobp+0x114/0x260
kernel: [vn_rele+211/224] vn_rele+0xd3/0xe0
kernel: [linvfs_clear_inode+24/48] linvfs_clear_inode+0x18/0x30
kernel: [clear_inode+182/224] clear_inode+0xb6/0xe0
kernel: [generic_delete_inode+216/256] generic_delete_inode+0xd8/0x100
kernel: [_atomic_dec_and_lock+49/80] _atomic_dec_and_lock+0x31/0x50
kernel: [iput+99/144] iput+0x63/0x90
kernel: [xlog_recover_process_iunlinks+785/960] xlog_recover_process_iunlinks+0x311/0x3c0
kernel: [xlog_recover_finish+168/224] xlog_recover_finish+0xa8/0xe0
kernel: [xfs_log_mount_finish+44/48] xfs_log_mount_finish+0x2c/0x30
kernel: [xfs_mountfs+2108/3840] xfs_mountfs+0x83c/0xf00
kernel: [_atomic_dec_and_lock+49/80] _atomic_dec_and_lock+0x31/0x50
kernel: [xfs_readsb+408/512] xfs_readsb+0x198/0x200
kernel: [xfs_ioinit+31/64] xfs_ioinit+0x1f/0x40
kernel: [xfs_mount+686/1216] xfs_mount+0x2ae/0x4c0
kernel: [vfs_mount+67/80] vfs_mount+0x43/0x50
kernel: [linvfs_fill_super+152/496] linvfs_fill_super+0x98/0x1f0
kernel: [snprintf+39/48] snprintf+0x27/0x30
kernel: [disk_name+180/192] disk_name+0xb4/0xc0
kernel: [sb_set_blocksize+46/96] sb_set_blocksize+0x2e/0x60
kernel: [get_sb_bdev+266/336] get_sb_bdev+0x10a/0x150
kernel: [linvfs_get_sb+48/64] linvfs_get_sb+0x30/0x40
kernel: [linvfs_fill_super+0/496] linvfs_fill_super+0x0/0x1f0
kernel: [do_kern_mount+87/224] do_kern_mount+0x57/0xe0
kernel: [do_new_mount+156/224] do_new_mount+0x9c/0xe0
kernel: [do_mount+343/432] do_mount+0x157/0x1b0
kernel: [copy_mount_options+99/192] copy_mount_options+0x63/0xc0
kernel: [sys_mount+159/224] sys_mount+0x9f/0xe0
kernel: [syscall_call+7/11] syscall_call+0x7/0xb
kernel: xfs_force_shutdown(drbd0,0x8) called from line 1091 of file fs/xfs/xfs_trans.c. Return address = 0xc0219a3b
kernel: Filesystem "drbd0": Corruption of in-memory data detected. Shutting down filesystem: drbd0
kernel: Please umount the filesystem, and rectify the problem(s)
kernel: Ending XFS recovery on filesystem: drbd0 (dev: drbd0)
kernel: xfs_force_shutdown(drbd0,0x1) called from line 353 of file fs/xfs/xfs_rw.c. Return address = 0xc0219a3b

Any insights would be very appreciated. Thanks!

--> Jijo

-- 
Federico Sevilla III : jijo.free.net.ph : When we speak of free software
GNU/Linux Specialist : GnuPG 0x93B746BE : we refer to freedom, not price.