Thanks to some bad HDDs that caused data corruption, we were able to observe what really happens to several file systems when the underlying block device is unreliable.
We created a file system on a "bad HDD" and saved some files, knowing that the HDD would not store all the data as written and that corruption would therefore occur. In such a scenario corruption is unavoidable, but to minimise damage it is important to detect it as early as possible.
Btrfs (best)
Reading the corrupted file(s) resulted in an "Input/output error" (i.e. "Cannot read source file"). The following was written to "/var/log/messages" and "/var/log/kern.log":
Mar 27 06:01:51 deblabr kernel: [430667.328062] btrfs csum failed ino 259 off 125612032 csum 1322675045 private 4050170413
Mar 27 06:01:51 deblabr kernel: [430667.328250] btrfs csum failed ino 259 off 125612032 csum 1322675045 private 4050170413
Mar 27 06:01:53 deblabr kernel: [430670.096957] btrfs csum failed ino 259 off 125612032 csum 1322675045 private 4050170413
Mar 27 06:01:53 deblabr kernel: [430670.106365] btrfs csum failed ino 259 off 125612032 csum 1322675045 private 4050170413
Mar 27 06:02:02 deblabr kernel: [430678.980359] btrfs csum failed ino 259 off 125612032 csum 1322675045 private 4050170413
Mar 27 06:02:02 deblabr kernel: [430678.982592] btrfs csum failed ino 259 off 125612032 csum 1322675045 private 4050170413
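The mechanism behind this behaviour can be sketched in a few lines of Python. This is a toy model, not Btrfs code: ChecksummedStore is a hypothetical name, and zlib.crc32 (plain CRC-32) stands in for the CRC-32C checksum that Btrfs uses by default. A checksum is recorded when a block is written and compared on every read. On a mismatch the read fails with EIO, the error applications report as "Input/output error", instead of returning damaged data.

```python
import errno
import zlib


class ChecksummedStore:
    """Toy block store that verifies a per-block checksum on every read.

    Hypothetical illustration: zlib.crc32 (CRC-32) stands in for the CRC-32C
    that Btrfs uses by default.
    """

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}   # what the disk actually holds
        self.checksums: dict[int, int] = {}  # checksum recorded at write time

    def write(self, blockno: int, data: bytes) -> None:
        self.blocks[blockno] = data
        self.checksums[blockno] = zlib.crc32(data)

    def read(self, blockno: int) -> bytes:
        data = self.blocks[blockno]
        if zlib.crc32(data) != self.checksums[blockno]:
            # Refuse to hand back damaged data; the caller gets EIO instead.
            raise OSError(errno.EIO, "checksum mismatch", f"block {blockno}")
        return data


if __name__ == "__main__":
    store = ChecksummedStore()
    store.write(0, b"important data")
    store.blocks[0] = b"importent data"  # simulate the disk silently altering the block
    try:
        store.read(0)
    except OSError as exc:
        print(exc.errno == errno.EIO, exc.strerror)
```

The important design point is that the check happens on the read path, so damaged data never reaches the application unnoticed.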
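Btrfs can also scan itself for corruption with "btrfs scrub start -B /mnt/tmp" (-B keeps the scrub in the foreground and prints statistics when it finishes), which reported: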
scrub done for 0b1a9d7d-28ad-4dc9-a195-ff3a19dff23d
scrub started at Sun Mar 31 01:26:51 2013 and finished after 59 seconds
total bytes scrubbed: 3.91GB with 995 errors
error details: csum=995
corrected errors: 0, uncorrectable errors: 995, unverified errors: 0
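Scrub is essentially the same verification applied to every block on the device, whether or not anything reads it. The sketch below is hypothetical (the scrub function and the ScrubStats fields are modelled loosely on the summary above, not taken from btrfs-progs) and again uses zlib.crc32 in place of CRC-32C. It also suggests why every error above was uncorrectable, presumably because only one copy of the data existed: a bad block can only be repaired when an intact copy is available, as with the DUP or RAID1 profiles, otherwise scrub can only report the damage.

```python
import zlib
from dataclasses import dataclass


@dataclass
class ScrubStats:
    bytes_scrubbed: int = 0
    csum_errors: int = 0
    corrected: int = 0
    uncorrectable: int = 0


def scrub(copies: dict[int, list[bytes]], checksums: dict[int, int]) -> ScrubStats:
    """Verify every stored copy of every block; repair from a good copy if one exists.

    copies maps a block number to its on-disk copies: one copy for the
    "single" data profile, two for DUP or RAID1.
    checksums maps a block number to the checksum recorded at write time.
    """
    stats = ScrubStats()
    for blockno, mirrors in copies.items():
        expected = checksums[blockno]
        # Find an intact copy to repair from, if there is one.
        good = next((m for m in mirrors if zlib.crc32(m) == expected), None)
        for i, data in enumerate(mirrors):
            stats.bytes_scrubbed += len(data)
            if zlib.crc32(data) == expected:
                continue
            stats.csum_errors += 1
            if good is None:
                stats.uncorrectable += 1  # nothing to repair from: report only
            else:
                mirrors[i] = good         # rewrite the bad copy from the good one
                stats.corrected += 1
    return stats


if __name__ == "__main__":
    blocks = {0: [b"aaaa"], 1: [b"bbbX"]}  # one copy per block, block 1 damaged
    sums = {0: zlib.crc32(b"aaaa"), 1: zlib.crc32(b"bbbb")}
    print(scrub(blocks, sums))
```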
During the scrub the following was logged to "/var/log/kern.log":
Mar 31 01:26:51 deblabr kernel: [759603.622059] btrfs: checksum error at logical 432504832 on dev /dev/sdr1, sector 1642176, root 5, inode 257, offset 3244032, length 4096, links 1 (path: itest.tar.xz)
Mar 31 01:26:51 deblabr kernel: [759603.622071] btrfs: bdev /dev/sdr1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Mar 31 01:26:51 deblabr kernel: [759603.622076] btrfs: unable to fixup (regular) error at logical 432504832 on dev /dev/sdr1
Mar 31 01:26:51 deblabr kernel: [759603.628914] btrfs: checksum error at logical 432508928 on dev /dev/sdr1, sector 1642184, root 5, inode 257, offset 3248128, length 4096, links 1 (path: itest.tar.xz)
Mar 31 01:26:51 deblabr kernel: [759603.628925] btrfs: bdev /dev/sdr1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Mar 31 01:26:51 deblabr kernel: [759603.628929] btrfs: unable to fixup (regular) error at logical 432508928 on dev /dev/sdr1
Mar 31 01:26:51 deblabr kernel: [759603.636107] btrfs: checksum error at logical 432513024 on dev /dev/sdr1, sector 1642192, root 5, inode 257, offset 3252224, length 4096, links 1 (path: itest.tar.xz)
Mar 31 01:26:51 deblabr kernel: [759603.636118] btrfs: bdev /dev/sdr1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Mar 31 01:26:51 deblabr kernel: [759603.636122] btrfs: unable to fixup (regular) error at logical 432513024 on dev /dev/sdr1
and to "/var/log/messages":
Mar 31 01:26:51 deblabr kernel: [759603.622059] btrfs: checksum error at logical 432504832 on dev /dev/sdr1, sector 1642176, root 5, inode 257, offset 3244032, length 4096, links 1 (path: itest.tar.xz)
Mar 31 01:26:51 deblabr kernel: [759603.628914] btrfs: checksum error at logical 432508928 on dev /dev/sdr1, sector 1642184, root 5, inode 257, offset 3248128, length 4096, links 1 (path: itest.tar.xz)
Mar 31 01:26:51 deblabr kernel: [759603.636107] btrfs: checksum error at logical 432513024 on dev /dev/sdr1, sector 1642192, root 5, inode 257, offset 3252224, length 4096, links 1 (path: itest.tar.xz)
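The scrub messages carry enough detail to map damage back to files. A minimal parsing sketch follows; the regular expression is written against the message format shown above and may not match other kernel versions, and all names are illustrative. Merging the three errors yields one contiguous 12 KiB damaged range in itest.tar.xz. The numbers are also internally consistent: consecutive errors are 4096 bytes apart in both logical address and file offset, and 8 apart in sector number, which matches 4096-byte blocks if sectors are counted in 512-byte units (an assumption here).

```python
import re
from typing import NamedTuple

# Written against the message format shown above; other kernels may word it differently.
SCRUB_RE = re.compile(
    r"checksum error at logical (?P<logical>\d+) on dev (?P<dev>[^,]+), "
    r"sector (?P<sector>\d+), root (?P<root>\d+), inode (?P<inode>\d+), "
    r"offset (?P<offset>\d+), length (?P<length>\d+), links (?P<links>\d+) "
    r"\(path: (?P<path>[^)]*)\)"
)


class ScrubError(NamedTuple):
    path: str
    offset: int   # byte offset of the bad block within the file
    length: int
    logical: int  # Btrfs logical address
    sector: int   # device sector, assumed to be in 512-byte units


def parse_scrub_errors(lines):
    """Extract one ScrubError per matching kernel log line."""
    errors = []
    for line in lines:
        m = SCRUB_RE.search(line)
        if m:
            errors.append(ScrubError(m["path"], int(m["offset"]), int(m["length"]),
                                     int(m["logical"]), int(m["sector"])))
    return errors


def damaged_ranges(errors):
    """Merge per-block errors into contiguous [start, end) byte ranges per file."""
    ranges = []
    # set() drops duplicates, e.g. when kern.log and messages are both fed in;
    # sorting orders by path, then offset.
    for e in sorted(set(errors)):
        if ranges and ranges[-1][0] == e.path and ranges[-1][2] == e.offset:
            ranges[-1] = (e.path, ranges[-1][1], e.offset + e.length)
        else:
            ranges.append((e.path, e.offset, e.offset + e.length))
    return ranges
```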
NILFS2 (worst)
Unlike other file systems, which move unchanged data only during defragmentation, NILFS2 runs nilfs_cleanerd, a garbage-collection daemon that re-shuffles unmodified data and therefore amplifies damage from corruption. NILFS2 does not verify data on read, so more data gets corrupted during periods of nilfs_cleanerd activity.
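A toy model illustrates the amplification. It rests on an illustrative assumption, not a measurement: each write to the bad disk damages a block independently with probability p, and because nothing verifies data when the cleaner reads it, damage is copied forward and never undone. Under that model a block written once and then moved by k cleaner passes survives with probability (1 - p)^(k + 1), whereas on a file system that leaves unchanged data in place (k = 0) it survives with probability 1 - p. For p = 0.01 and five passes that is roughly 5.9% of unmodified blocks damaged instead of 1%. The function name simulate_cleaner is hypothetical.

```python
import random


def simulate_cleaner(n_blocks: int, passes: int, p_bad: float, seed: int = 0) -> list[int]:
    """Count damaged, never-modified blocks after the initial write and each cleaner pass.

    Assumed model (illustrative, not measured): every write to the bad disk
    independently damages a block with probability p_bad, and nothing verifies
    data when the cleaner reads it, so damage is copied forward and never undone.
    """
    rng = random.Random(seed)
    damaged = [rng.random() < p_bad for _ in range(n_blocks)]  # initial write
    history = [sum(damaged)]
    for _ in range(passes):
        # The cleaner reads each live block and writes it to a new segment:
        # an already damaged block stays damaged, an intact one may be damaged now.
        damaged = [d or rng.random() < p_bad for d in damaged]
        history.append(sum(damaged))
    return history


if __name__ == "__main__":
    print(simulate_cleaner(10_000, passes=5, p_bad=0.01))
```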
When the corruption eventually affects a btree node, the errors logged to "/var/log/kern.log" may look like the following:
Mar 31 01:17:30 deblabr kernel: [759042.984783] NILFS: bad btree node (blocknr=938583): level = 192, flags = 0x73, nchildren = 49956
Mar 31 01:17:30 deblabr kernel: [759042.984850] NILFS: GC failed during preparation: cannot read source blocks: err=-5
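The "bad btree node" message comes from a sanity check on a node header that was read back from disk as garbage: no real NILFS2 B-tree is 192 levels deep, and a 4 KiB node cannot hold 49956 children. (The err=-5 in the second line is -EIO, the kernel's "Input/output error" code.) The sketch below parses such a message and applies two checks of that kind. The limits MAX_LEVEL, HEADER_SIZE and ENTRY_SIZE, and the 4096-byte node size, are assumptions for illustration; the real constants live in the NILFS2 kernel source and may differ.

```python
import re
from typing import NamedTuple

# Assumed limits for this sketch; NILFS2 defines its own constants in the kernel.
MAX_LEVEL = 14    # hypothetical maximum tree depth
HEADER_SIZE = 8   # hypothetical node header size in bytes
ENTRY_SIZE = 16   # assumed: one 64-bit key plus one 64-bit pointer per child

LOG_RE = re.compile(
    r"bad btree node \(blocknr=(\d+)\): level = (\d+), "
    r"flags = (0x[0-9a-f]+), nchildren = (\d+)"
)


class NodeHeader(NamedTuple):
    blocknr: int
    level: int
    flags: int
    nchildren: int


def parse_log(line: str) -> NodeHeader:
    """Turn a NILFS "bad btree node" log line into a NodeHeader."""
    m = LOG_RE.search(line)
    if m is None:
        raise ValueError("not a NILFS bad btree node message")
    blocknr, level, flags, nchildren = m.groups()
    return NodeHeader(int(blocknr), int(level), int(flags, 16), int(nchildren))


def problems(h: NodeHeader, block_size: int = 4096) -> list[str]:
    """Return the reasons a node header cannot be valid under the assumed limits."""
    max_children = (block_size - HEADER_SIZE) // ENTRY_SIZE
    found = []
    if not 1 <= h.level < MAX_LEVEL:
        found.append(f"level {h.level} outside 1..{MAX_LEVEL - 1}")
    if h.nchildren > max_children:
        found.append(f"{h.nchildren} children, but a {block_size}-byte node "
                     f"holds at most {max_children}")
    return found
```

Both nodes logged above fail both checks, which is consistent with the header bytes being random rather than merely off by a little.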
Eventually, because of such errors, NILFS2 re-mounts itself read-only:
Mar 30 19:56:59 deblabr kernel: [739821.894963] NILFS: bad btree node (blocknr=1086570306): level = 239, flags = 0xe2, nchildren = 10392
Mar 30 19:56:59 deblabr kernel: [739821.894969] NILFS error (device dm-0): nilfs_bmap_last_key: broken bmap (inode number=1225452)
Mar 30 19:56:59 deblabr kernel: [739821.894969]
Mar 30 19:56:59 deblabr kernel: [739821.894971] Remounting filesystem read-only
Mar 30 19:56:59 deblabr kernel: [739821.894973] NILFS warning (device dm-0): nilfs_truncate_bmap: failed to truncate bmap (ino=1225452, err=-5)
Little can be done to recover from such a condition because NILFS2 has no fsck repair tool.
ext4
When corruption affects ext4 metadata, the file system can re-mount itself read-only (this behaviour is controlled by the errors= mount option).
Recovering ext4 with fsck.ext4 is trivial, and old data is not corrupted further unless defragmentation is run, because ext4 otherwise leaves unchanged data in place.
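Because fsck.ext4 reports its result as a bit mask, a script that checks or repairs an ext4 file system after such errors should decode the exit status rather than only test for zero. A minimal sketch follows; the bit meanings are those listed in the e2fsck(8) man page, the function names are hypothetical, and the file system must be unmounted before it is checked.

```python
import subprocess

# Exit status bits documented in the e2fsck(8) man page; the status is their sum.
E2FSCK_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def decode_e2fsck_status(status: int) -> list[str]:
    """List every condition encoded in an e2fsck / fsck.ext4 exit status."""
    if status == 0:
        return ["no errors"]
    return [msg for bit, msg in E2FSCK_BITS.items() if status & bit]


def check_ext4(device: str, repair: bool = False) -> list[str]:
    """Run fsck.ext4 on an unmounted device and decode its exit status.

    -f forces a check even if the file system looks clean; -y answers "yes"
    to every repair prompt, while -n opens the file system read-only and
    answers "no".
    """
    mode = "-y" if repair else "-n"
    result = subprocess.run(["fsck.ext4", "-f", mode, device])
    return decode_e2fsck_status(result.returncode)
```

For example, check_ext4("/dev/sdX1") (a placeholder device name) performs a read-only check first, and repair=True can follow once the report has been reviewed.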
Conclusion
Btrfs is strategically important for data integrity.
Other Linux file systems do nothing to ensure that data is read back exactly as it was written. Unless Btrfs is used, data corruption is likely to be detected much later, and by then more damage will have been done.