[DRBD-user] Unrecoverable XFS Dinode Corruption on DRBD

Federico Sevilla III jijo at free.net.ph
Tue Dec 6 00:25:37 CET 2005


Hi,

Today we ran into unrecoverable XFS dinode corruption on a DRBD cluster
that's been in production for two years, with 146 days of uptime on the
machines. This is the first time we've run into this problem. Both
machines run Debian GNU/Linux Sarge with Linux 2.6.12.2 and DRBD 0.7.10.
These are HP ProLiant DL380 servers with ECC RAM, hardware RAID, and
reliable UPS systems. Both DRBD nodes were consistent, and the errors
occurred regardless of which node we attempted to mount the filesystem
on.

During normal operations, the primary machine hit I/O errors while
accessing PostgreSQL database file(s). I am attaching logs of these. We
could not mount the filesystem afterwards because of "Corruption of
in-memory data detected". We were able to restore access to the
filesystem by running xfs_repair with the -L option. Unfortunately, we
were not able to save a copy of the output of xfs_repair. The result is
a filesystem with some corrupt files, which we're beginning to restore
from backups.

I need help understanding what could have caused this corruption,
though, and what we can do to prevent it from recurring.

Here are log snapshots of the initial errors (which repeat themselves
identically as attempts are made to access the PostgreSQL database):

    kernel: Filesystem "drbd0": corrupt dinode 203438180, (btree extents).  Unmount and run xfs_repair.
    kernel: Filesystem "drbd0": XFS internal error xfs_bmap_read_extents(1) at line 4374 of file fs/xfs/xfs_bmap.c.  Caller 0xc01ea147
    kernel:  [xfs_bmap_read_extents+993/1232] xfs_bmap_read_extents+0x3e1/0x4d0
    kernel:  [xfs_iread_extents+119/272] xfs_iread_extents+0x77/0x110
    kernel:  [xfs_iread_extents+119/272] xfs_iread_extents+0x77/0x110
    kernel:  [xfs_bmapi+537/5664] xfs_bmapi+0x219/0x1620
    kernel:  [__elv_add_request+120/192] __elv_add_request+0x78/0xc0
    kernel:  [__make_request+708/1248] __make_request+0x2c4/0x4e0
    kernel:  [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
    kernel:  [generic_make_request+343/496] generic_make_request+0x157/0x1f0
    kernel:  [as_merged_request+78/496] as_merged_request+0x4e/0x1f0
    kernel:  [elv_merged_request+37/48] elv_merged_request+0x25/0x30
    kernel:  [__make_request+708/1248] __make_request+0x2c4/0x4e0
    kernel:  [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
    kernel:  [generic_make_request+343/496] generic_make_request+0x157/0x1f0
    kernel:  [xfs_bmbt_get_state+47/64] xfs_bmbt_get_state+0x2f/0x40
    kernel:  [find_lock_page+166/192] find_lock_page+0xa6/0xc0
    kernel:  [xfs_iomap+410/1360] xfs_iomap+0x19a/0x550
    kernel:  [xfs_buf_get_flags+238/352] xfs_buf_get_flags+0xee/0x160
    kernel:  [__linvfs_get_block+146/848] __linvfs_get_block+0x92/0x350
    kernel:  [find_lock_page+166/192] find_lock_page+0xa6/0xc0
    kernel:  [find_or_create_page+48/208] find_or_create_page+0x30/0xd0
    kernel:  [_pagebuf_lookup_pages+514/848] _pagebuf_lookup_pages+0x202/0x350
    kernel:  [linvfs_get_block+60/64] linvfs_get_block+0x3c/0x40
    kernel:  [do_mpage_readpage+972/992] do_mpage_readpage+0x3cc/0x3e0
    kernel:  [radix_tree_node_alloc+31/112] radix_tree_node_alloc+0x1f/0x70
    kernel:  [radix_tree_insert+237/272] radix_tree_insert+0xed/0x110
    kernel:  [add_to_page_cache+195/208] add_to_page_cache+0xc3/0xd0
    kernel:  [mpage_readpages+302/352] mpage_readpages+0x12e/0x160
    kernel:  [linvfs_get_block+0/64] linvfs_get_block+0x0/0x40
    kernel:  [xfs_iformat+1185/1520] xfs_iformat+0x4a1/0x5f0
    kernel:  [rmqueue_bulk+53/128] rmqueue_bulk+0x35/0x80
    kernel:  [prep_new_page+96/112] prep_new_page+0x60/0x70
    kernel:  [read_pages+306/320] read_pages+0x132/0x140
    kernel:  [linvfs_get_block+0/64] linvfs_get_block+0x0/0x40
    kernel:  [__alloc_pages+739/1072] __alloc_pages+0x2e3/0x430
    kernel:  [__do_page_cache_readahead+253/368] __do_page_cache_readahead+0xfd/0x170
    kernel:  [blockable_page_cache_readahead+89/224] blockable_page_cache_readahead+0x59/0xe0
    kernel:  [page_cache_readahead+294/432] page_cache_readahead+0x126/0x1b0
    kernel:  [do_generic_mapping_read+1576/1600] do_generic_mapping_read+0x628/0x640
    kernel:  [__generic_file_aio_read+514/576] __generic_file_aio_read+0x202/0x240
    kernel:  [file_read_actor+0/240] file_read_actor+0x0/0xf0
    kernel:  [dput+51/464] dput+0x33/0x1d0
    kernel:  [xfs_read+312/784] xfs_read+0x138/0x310
    kernel:  [dput+51/464] dput+0x33/0x1d0
    kernel:  [linvfs_aio_read+142/160] linvfs_aio_read+0x8e/0xa0
    kernel:  [do_sync_read+183/240] do_sync_read+0xb7/0xf0
    kernel:  [linvfs_open+138/144] linvfs_open+0x8a/0x90
    kernel:  [dentry_open+235/512] dentry_open+0xeb/0x200
    kernel:  [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
    kernel:  [vfs_read+174/304] vfs_read+0xae/0x130
    kernel:  [sys_read+81/128] sys_read+0x51/0x80
    kernel:  [syscall_call+7/11] syscall_call+0x7/0xb

Here is a snapshot of the error when attempting to mount the damaged
filesystem:

    kernel: XFS mounting filesystem drbd0
    kernel: Starting XFS recovery on filesystem: drbd0 (dev: drbd0)
    kernel: Filesystem "drbd0": corrupt dinode 203438180, (btree extents).  Unmount and run xfs_repair.
    kernel: Filesystem "drbd0": XFS internal error xfs_bmap_read_extents(1) at line 4374 of file fs/xfs/xfs_bmap.c.  Caller 0xc01ea147
    kernel:  [xfs_bmap_read_extents+993/1232] xfs_bmap_read_extents+0x3e1/0x4d0
    kernel:  [xfs_iread_extents+119/272] xfs_iread_extents+0x77/0x110
    kernel:  [generic_make_request+343/496] generic_make_request+0x157/0x1f0
    kernel:  [xfs_iread_extents+119/272] xfs_iread_extents+0x77/0x110
    kernel:  [xfs_bunmapi+4065/4224] xfs_bunmapi+0xfe1/0x1080
    kernel:  [_pagebuf_lookup_pages+514/848] _pagebuf_lookup_pages+0x202/0x350
    kernel:  [xfs_buf_get_flags+238/352] xfs_buf_get_flags+0xee/0x160
    kernel:  [xfs_buf_read_flags+58/176] xfs_buf_read_flags+0x3a/0xb0
    kernel:  [xlog_grant_log_space+959/1008] xlog_grant_log_space+0x3bf/0x3f0
    kernel:  [cache_alloc_refill+320/560] cache_alloc_refill+0x140/0x230
    kernel:  [kmem_zone_alloc+76/192] kmem_zone_alloc+0x4c/0xc0
    kernel:  [xfs_trans_log_inode+45/96] xfs_trans_log_inode+0x2d/0x60
    kernel:  [xfs_itruncate_finish+481/1072] xfs_itruncate_finish+0x1e1/0x430
    kernel:  [xfs_inactive+1284/1376] xfs_inactive+0x504/0x560
    kernel:  [pagebuf_offset+51/64] pagebuf_offset+0x33/0x40
    kernel:  [xfs_itobp+276/608] xfs_itobp+0x114/0x260
    kernel:  [vn_rele+211/224] vn_rele+0xd3/0xe0
    kernel:  [linvfs_clear_inode+24/48] linvfs_clear_inode+0x18/0x30
    kernel:  [clear_inode+182/224] clear_inode+0xb6/0xe0
    kernel:  [generic_delete_inode+216/256] generic_delete_inode+0xd8/0x100
    kernel:  [_atomic_dec_and_lock+49/80] _atomic_dec_and_lock+0x31/0x50
    kernel:  [iput+99/144] iput+0x63/0x90
    kernel:  [xlog_recover_process_iunlinks+785/960] xlog_recover_process_iunlinks+0x311/0x3c0
    kernel:  [xlog_recover_finish+168/224] xlog_recover_finish+0xa8/0xe0
    kernel:  [xfs_log_mount_finish+44/48] xfs_log_mount_finish+0x2c/0x30
    kernel:  [xfs_mountfs+2108/3840] xfs_mountfs+0x83c/0xf00
    kernel:  [_atomic_dec_and_lock+49/80] _atomic_dec_and_lock+0x31/0x50
    kernel:  [xfs_readsb+408/512] xfs_readsb+0x198/0x200
    kernel:  [xfs_ioinit+31/64] xfs_ioinit+0x1f/0x40
    kernel:  [xfs_mount+686/1216] xfs_mount+0x2ae/0x4c0
    kernel:  [vfs_mount+67/80] vfs_mount+0x43/0x50
    kernel:  [linvfs_fill_super+152/496] linvfs_fill_super+0x98/0x1f0
    kernel:  [snprintf+39/48] snprintf+0x27/0x30
    kernel:  [disk_name+180/192] disk_name+0xb4/0xc0
    kernel:  [sb_set_blocksize+46/96] sb_set_blocksize+0x2e/0x60
    kernel:  [get_sb_bdev+266/336] get_sb_bdev+0x10a/0x150
    kernel:  [linvfs_get_sb+48/64] linvfs_get_sb+0x30/0x40
    kernel:  [linvfs_fill_super+0/496] linvfs_fill_super+0x0/0x1f0
    kernel:  [do_kern_mount+87/224] do_kern_mount+0x57/0xe0
    kernel:  [do_new_mount+156/224] do_new_mount+0x9c/0xe0
    kernel:  [do_mount+343/432] do_mount+0x157/0x1b0
    kernel:  [copy_mount_options+99/192] copy_mount_options+0x63/0xc0
    kernel:  [sys_mount+159/224] sys_mount+0x9f/0xe0
    kernel:  [syscall_call+7/11] syscall_call+0x7/0xb
    kernel: xfs_force_shutdown(drbd0,0x8) called from line 1091 of file fs/xfs/xfs_trans.c.  Return address = 0xc0219a3b
    kernel: Filesystem "drbd0": Corruption of in-memory data detected.  Shutting down filesystem: drbd0
    kernel: Please umount the filesystem, and rectify the problem(s)
    kernel: Ending XFS recovery on filesystem: drbd0 (dev: drbd0)
    kernel: xfs_force_shutdown(drbd0,0x1) called from line 353 of file fs/xfs/xfs_rw.c.  Return address = 0xc0219a3b
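
As a rough aid for checking whether every report in logs like the ones
above points at the same inode, here is a minimal Python sketch. It is
not something we ran at the time. The function name, the regular
expression, and the shortened sample text are illustrative assumptions
based only on the message format shown in these logs.

```python
import re
from collections import Counter

# Assumed message format, taken from the log lines quoted above, e.g.:
#   kernel: Filesystem "drbd0": corrupt dinode 203438180, (btree extents).  Unmount and run xfs_repair.
CORRUPT_DINODE = re.compile(
    r'Filesystem "(?P<fs>[^"]+)": corrupt dinode (?P<ino>\d+), \((?P<kind>[^)]*)\)'
)


def count_corrupt_dinodes(log_text):
    """Count corrupt-dinode reports per (filesystem, inode number, description)."""
    hits = Counter()
    for line in log_text.splitlines():
        match = CORRUPT_DINODE.search(line)
        if match:
            key = (match.group("fs"), int(match.group("ino")), match.group("kind"))
            hits[key] += 1
    return hits


# Shortened sample built from the lines quoted in this message.
sample = """\
kernel: Filesystem "drbd0": corrupt dinode 203438180, (btree extents).  Unmount and run xfs_repair.
kernel:  [xfs_bmap_read_extents+993/1232] xfs_bmap_read_extents+0x3e1/0x4d0
kernel: Filesystem "drbd0": corrupt dinode 203438180, (btree extents).  Unmount and run xfs_repair.
"""

for (fs, ino, kind), n in count_corrupt_dinodes(sample).items():
    print(f"{fs}: inode {ino} ({kind}) reported {n} time(s)")
```

If a full log yields a single key, all of the errors refer to one
inode. Several keys would suggest that the damage is more widespread.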

Any insights would be much appreciated. Thanks!

 --> Jijo

-- 
Federico Sevilla III : jijo.free.net.ph : When we speak of free software
GNU/Linux Specialist : GnuPG 0x93B746BE : we refer to freedom, not price.


