Hi,
Today we ran into unrecoverable XFS dinode corruption on a DRBD cluster
that's been in production for two years, with 146-day uptime on the
machines. This is the first time we've run into this problem. Both
machines run Debian GNU/Linux Sarge with Linux 2.6.12.2 and DRBD 0.7.10.
These are HP ProLiant DL380 servers with ECC RAM, hardware RAID, and
reliable UPS systems. Both DRBD nodes were consistent, and the errors
occurred regardless of which node we attempted to mount the
filesystem on.
During normal operations, the primary machine hit I/O errors while
accessing PostgreSQL database file(s). I have included the logs below.
We could not mount the filesystem afterwards because of "Corruption of
in-memory data detected". We were able to restore access to the
filesystem by running xfs_repair with the -L option (which zeroes the
log). Unfortunately, we were not able to save a copy of the output of
xfs_repair. The result is a filesystem with some corrupt files, which
we're beginning to restore from backups.
I need help understanding what could have caused this corruption,
though, and what we can do to prevent it from recurring.
Here are log snapshots of the initial errors (which repeat identically
each time an attempt is made to access the PostgreSQL database):
kernel: Filesystem "drbd0": corrupt dinode 203438180, (btree extents). Unmount and run xfs_repair.
kernel: Filesystem "drbd0": XFS internal error xfs_bmap_read_extents(1) at line 4374 of file fs/xfs/xfs_bmap.c. Caller 0xc01ea147
kernel: [xfs_bmap_read_extents+993/1232] xfs_bmap_read_extents+0x3e1/0x4d0
kernel: [xfs_iread_extents+119/272] xfs_iread_extents+0x77/0x110
kernel: [xfs_iread_extents+119/272] xfs_iread_extents+0x77/0x110
kernel: [xfs_bmapi+537/5664] xfs_bmapi+0x219/0x1620
kernel: [__elv_add_request+120/192] __elv_add_request+0x78/0xc0
kernel: [__make_request+708/1248] __make_request+0x2c4/0x4e0
kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
kernel: [generic_make_request+343/496] generic_make_request+0x157/0x1f0
kernel: [as_merged_request+78/496] as_merged_request+0x4e/0x1f0
kernel: [elv_merged_request+37/48] elv_merged_request+0x25/0x30
kernel: [__make_request+708/1248] __make_request+0x2c4/0x4e0
kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
kernel: [generic_make_request+343/496] generic_make_request+0x157/0x1f0
kernel: [xfs_bmbt_get_state+47/64] xfs_bmbt_get_state+0x2f/0x40
kernel: [find_lock_page+166/192] find_lock_page+0xa6/0xc0
kernel: [xfs_iomap+410/1360] xfs_iomap+0x19a/0x550
kernel: [xfs_buf_get_flags+238/352] xfs_buf_get_flags+0xee/0x160
kernel: [__linvfs_get_block+146/848] __linvfs_get_block+0x92/0x350
kernel: [find_lock_page+166/192] find_lock_page+0xa6/0xc0
kernel: [find_or_create_page+48/208] find_or_create_page+0x30/0xd0
kernel: [_pagebuf_lookup_pages+514/848] _pagebuf_lookup_pages+0x202/0x350
kernel: [linvfs_get_block+60/64] linvfs_get_block+0x3c/0x40
kernel: [do_mpage_readpage+972/992] do_mpage_readpage+0x3cc/0x3e0
kernel: [radix_tree_node_alloc+31/112] radix_tree_node_alloc+0x1f/0x70
kernel: [radix_tree_insert+237/272] radix_tree_insert+0xed/0x110
kernel: [add_to_page_cache+195/208] add_to_page_cache+0xc3/0xd0
kernel: [mpage_readpages+302/352] mpage_readpages+0x12e/0x160
kernel: [linvfs_get_block+0/64] linvfs_get_block+0x0/0x40
kernel: [xfs_iformat+1185/1520] xfs_iformat+0x4a1/0x5f0
kernel: [rmqueue_bulk+53/128] rmqueue_bulk+0x35/0x80
kernel: [prep_new_page+96/112] prep_new_page+0x60/0x70
kernel: [read_pages+306/320] read_pages+0x132/0x140
kernel: [linvfs_get_block+0/64] linvfs_get_block+0x0/0x40
kernel: [__alloc_pages+739/1072] __alloc_pages+0x2e3/0x430
kernel: [__do_page_cache_readahead+253/368] __do_page_cache_readahead+0xfd/0x170
kernel: [blockable_page_cache_readahead+89/224] blockable_page_cache_readahead+0x59/0xe0
kernel: [page_cache_readahead+294/432] page_cache_readahead+0x126/0x1b0
kernel: [do_generic_mapping_read+1576/1600] do_generic_mapping_read+0x628/0x640
kernel: [__generic_file_aio_read+514/576] __generic_file_aio_read+0x202/0x240
kernel: [file_read_actor+0/240] file_read_actor+0x0/0xf0
kernel: [dput+51/464] dput+0x33/0x1d0
kernel: [xfs_read+312/784] xfs_read+0x138/0x310
kernel: [dput+51/464] dput+0x33/0x1d0
kernel: [linvfs_aio_read+142/160] linvfs_aio_read+0x8e/0xa0
kernel: [do_sync_read+183/240] do_sync_read+0xb7/0xf0
kernel: [linvfs_open+138/144] linvfs_open+0x8a/0x90
kernel: [dentry_open+235/512] dentry_open+0xeb/0x200
kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
kernel: [vfs_read+174/304] vfs_read+0xae/0x130
kernel: [sys_read+81/128] sys_read+0x51/0x80
kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Here is a snapshot of the error when attempting to mount the damaged
filesystem:
kernel: XFS mounting filesystem drbd0
kernel: Starting XFS recovery on filesystem: drbd0 (dev: drbd0)
kernel: Filesystem "drbd0": corrupt dinode 203438180, (btree extents). Unmount and run xfs_repair.
kernel: Filesystem "drbd0": XFS internal error xfs_bmap_read_extents(1) at line 4374 of file fs/xfs/xfs_bmap.c. Caller 0xc01ea147
kernel: [xfs_bmap_read_extents+993/1232] xfs_bmap_read_extents+0x3e1/0x4d0
kernel: [xfs_iread_extents+119/272] xfs_iread_extents+0x77/0x110
kernel: [generic_make_request+343/496] generic_make_request+0x157/0x1f0
kernel: [xfs_iread_extents+119/272] xfs_iread_extents+0x77/0x110
kernel: [xfs_bunmapi+4065/4224] xfs_bunmapi+0xfe1/0x1080
kernel: [_pagebuf_lookup_pages+514/848] _pagebuf_lookup_pages+0x202/0x350
kernel: [xfs_buf_get_flags+238/352] xfs_buf_get_flags+0xee/0x160
kernel: [xfs_buf_read_flags+58/176] xfs_buf_read_flags+0x3a/0xb0
kernel: [xlog_grant_log_space+959/1008] xlog_grant_log_space+0x3bf/0x3f0
kernel: [cache_alloc_refill+320/560] cache_alloc_refill+0x140/0x230
kernel: [kmem_zone_alloc+76/192] kmem_zone_alloc+0x4c/0xc0
kernel: [xfs_trans_log_inode+45/96] xfs_trans_log_inode+0x2d/0x60
kernel: [xfs_itruncate_finish+481/1072] xfs_itruncate_finish+0x1e1/0x430
kernel: [xfs_inactive+1284/1376] xfs_inactive+0x504/0x560
kernel: [pagebuf_offset+51/64] pagebuf_offset+0x33/0x40
kernel: [xfs_itobp+276/608] xfs_itobp+0x114/0x260
kernel: [vn_rele+211/224] vn_rele+0xd3/0xe0
kernel: [linvfs_clear_inode+24/48] linvfs_clear_inode+0x18/0x30
kernel: [clear_inode+182/224] clear_inode+0xb6/0xe0
kernel: [generic_delete_inode+216/256] generic_delete_inode+0xd8/0x100
kernel: [_atomic_dec_and_lock+49/80] _atomic_dec_and_lock+0x31/0x50
kernel: [iput+99/144] iput+0x63/0x90
kernel: [xlog_recover_process_iunlinks+785/960] xlog_recover_process_iunlinks+0x311/0x3c0
kernel: [xlog_recover_finish+168/224] xlog_recover_finish+0xa8/0xe0
kernel: [xfs_log_mount_finish+44/48] xfs_log_mount_finish+0x2c/0x30
kernel: [xfs_mountfs+2108/3840] xfs_mountfs+0x83c/0xf00
kernel: [_atomic_dec_and_lock+49/80] _atomic_dec_and_lock+0x31/0x50
kernel: [xfs_readsb+408/512] xfs_readsb+0x198/0x200
kernel: [xfs_ioinit+31/64] xfs_ioinit+0x1f/0x40
kernel: [xfs_mount+686/1216] xfs_mount+0x2ae/0x4c0
kernel: [vfs_mount+67/80] vfs_mount+0x43/0x50
kernel: [linvfs_fill_super+152/496] linvfs_fill_super+0x98/0x1f0
kernel: [snprintf+39/48] snprintf+0x27/0x30
kernel: [disk_name+180/192] disk_name+0xb4/0xc0
kernel: [sb_set_blocksize+46/96] sb_set_blocksize+0x2e/0x60
kernel: [get_sb_bdev+266/336] get_sb_bdev+0x10a/0x150
kernel: [linvfs_get_sb+48/64] linvfs_get_sb+0x30/0x40
kernel: [linvfs_fill_super+0/496] linvfs_fill_super+0x0/0x1f0
kernel: [do_kern_mount+87/224] do_kern_mount+0x57/0xe0
kernel: [do_new_mount+156/224] do_new_mount+0x9c/0xe0
kernel: [do_mount+343/432] do_mount+0x157/0x1b0
kernel: [copy_mount_options+99/192] copy_mount_options+0x63/0xc0
kernel: [sys_mount+159/224] sys_mount+0x9f/0xe0
kernel: [syscall_call+7/11] syscall_call+0x7/0xb
kernel: xfs_force_shutdown(drbd0,0x8) called from line 1091 of file fs/xfs/xfs_trans.c. Return address = 0xc0219a3b
kernel: Filesystem "drbd0": Corruption of in-memory data detected. Shutting down filesystem: drbd0
kernel: Please umount the filesystem, and rectify the problem(s)
kernel: Ending XFS recovery on filesystem: drbd0 (dev: drbd0)
kernel: xfs_force_shutdown(drbd0,0x1) called from line 353 of file fs/xfs/xfs_rw.c. Return address = 0xc0219a3b
Any insights would be very appreciated. Thanks!
--> Jijo
--
Federico Sevilla III : jijo.free.net.ph : When we speak of free software
GNU/Linux Specialist : GnuPG 0x93B746BE : we refer to freedom, not price.