public inbox for gentoo-user@lists.gentoo.org
* [gentoo-user] Diagnosing file corruption
@ 2015-08-06  0:34 Bryan Gardiner
  2015-08-06  1:28 ` Fernando Rodriguez
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Bryan Gardiner @ 2015-08-06  0:34 UTC
  To: gentoo-user


Hello list,

On my most recent update, I had some build failures that led me to
find that some files on my root partition have been corrupted.  This
is a new Asus N550JK laptop, a mostly-stable amd64 install with
gentoo-sources-4.0.5 and ext4-root-in-LVM-in-LUKS-on-HDD, and Debian
lives in there too (no problems showed up verifying Debian's packages;
I installed Debian on Jul 1 and used it for a week before getting time
to set up Gentoo).

These are the package merge times, package names, and files that I
found to be corrupted via qcheck (there were also a couple Python
headers that I fixed by rebuilding).  They appear to be filled with
random data.  The binpkg contents in /usr/portage/packages are okay,
so I don't know when the files were corrupted; their mtimes haven't
been updated since the packages were installed.

Thu-Jul-30-22:40:23-2015 app-arch/p7zip-9.20.1-r5 /usr/lib64/p7zip/Lang/va.txt
Thu-Jul-30-22:40:23-2015 app-arch/p7zip-9.20.1-r5 /usr/lib64/p7zip/help/cmdline/switches/large_pages.htm
Sun-Jul-19-22:34:30-2015 dev-libs/libzip-1.0.1 /usr/share/man/man3/zip_error_get_sys_type.3.bz2
Sun-Jul-26-22:35:28-2015 dev-python/pygments-2.0.1-r1 /usr/lib64/python2.7/site-packages/pygments/styles/pastie.pyc
Wed-Jul-08-23:34:56-2015 media-libs/tiff-4.0.3-r6 /usr/share/man/man3/TIFFGetField.3tiff.bz2
Thu-Jul-30-10:05:31-2015 sci-mathematics/scilab-5.5.2 /usr/share/scilab/modules/compatibility_functions/macros/%b_l_s.bin
-(from-stage3-on-Jul-8)- sys-apps/acl-2.2.52-r1 /usr/share/man/man3/acl_set_file.3.bz2
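
A minimal sketch of the binpkg comparison mentioned above, assuming the binary package has already been unpacked into a scratch directory. The function name and its two arguments are hypothetical and not part of the original post; files that differ legitimately (edited configuration, for instance) will also be reported.

```shell
# Hypothetical helper: compare every regular file under a reference tree
# (e.g. an unpacked binary package) with the same path under a live root.
compare_trees() {
    ref_root=$1
    live_root=$2
    (cd "$ref_root" && find . -type f) | while IFS= read -r f; do
        # cmp -s is silent and fails if the files differ or one is missing
        cmp -s "$ref_root/$f" "$live_root/$f" || echo "DIFFERS: ${f#.}"
    done
}

# Example usage (both paths are assumptions):
#   compare_trees /tmp/p7zip-unpacked /
```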

I haven't had any unclean shutdowns, it looks like OpenRC is
unmounting things cleanly on shutdown, and suspend appears to work
fine.

After I make a fresh backup of my files, how would you recommend
troubleshooting this?  Run memtest or a hard drive tester?  Since the
files seemingly corrupted themselves after install without being
touched, I'm highly suspicious of the hard drive, but would like to
rule other things out (if say for example that CONFIG_X86_INTEL_PSTATE
CPU clock booster is dangerous, or nvidia-drivers, or ...).  Haven't
checked for corruption on /home yet.

This is the disk:

  *-disk
    description: ATA Disk
    product: ST1000LM024 HN-M
    vendor: Seagate
    physical id: 0.0.0
    bus info: scsi@4:0.0.0
    logical name: /dev/sda
    version: 0001
    size: 931GiB (1TB)
    capabilities: gpt-1.00 partitioned partitioned:gpt
    configuration: ansiversion=5
    guid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx sectorsize=4096

Thanks for any help you can provide,
Bryan



* Re: [gentoo-user] Diagnosing file corruption
  2015-08-06  0:34 [gentoo-user] Diagnosing file corruption Bryan Gardiner
@ 2015-08-06  1:28 ` Fernando Rodriguez
  2015-08-06  1:48 ` [gentoo-user] " James
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Fernando Rodriguez @ 2015-08-06  1:28 UTC
  To: gentoo-user

On Wednesday, August 05, 2015 5:34:43 PM Bryan Gardiner wrote:
> Hello list,
> 
> On my most recent update, I had some build failures that led me to
> find that some files on my root partition have been corrupted.  This
> is a new Asus N550JK laptop, a mostly-stable amd64 install with
> gentoo-sources-4.0.5 and ext4-root-in-LVM-in-LUKS-on-HDD, and Debian
> lives in there too (no problems showed up verifying Debian's packages;
> I installed Debian on Jul 1 and used it for a week before getting time
> to set up Gentoo).
> 
> These are the package merge times, package names, and files that I
> found to be corrupted via qcheck (there were also a couple Python
> headers that I fixed by rebuilding).  They appear to be filled with
> random data.  The binpkg contents in /usr/portage/packages are okay,
> so I don't know when the files were corrupted; their mtimes haven't
> been updated since the packages were installed.
> 
> Thu-Jul-30-22:40:23-2015 app-arch/p7zip-9.20.1-r5 /usr/lib64/p7zip/Lang/va.txt
> Thu-Jul-30-22:40:23-2015 app-arch/p7zip-9.20.1-r5 /usr/lib64/p7zip/help/cmdline/switches/large_pages.htm
> Sun-Jul-19-22:34:30-2015 dev-libs/libzip-1.0.1 /usr/share/man/man3/zip_error_get_sys_type.3.bz2
> Sun-Jul-26-22:35:28-2015 dev-python/pygments-2.0.1-r1 /usr/lib64/python2.7/site-packages/pygments/styles/pastie.pyc
> Wed-Jul-08-23:34:56-2015 media-libs/tiff-4.0.3-r6 /usr/share/man/man3/TIFFGetField.3tiff.bz2
> Thu-Jul-30-10:05:31-2015 sci-mathematics/scilab-5.5.2 /usr/share/scilab/modules/compatibility_functions/macros/%b_l_s.bin
> -(from-stage3-on-Jul-8)- sys-apps/acl-2.2.52-r1 /usr/share/man/man3/acl_set_file.3.bz2
> 
> I haven't had any unclean shutdowns, it looks like OpenRC is
> unmounting things cleanly on shutdown, and suspend appears to work
> fine.
> 
> After I make a fresh backup of my files, how would you recommend
> troubleshooting this?  Run memtest or a hard drive tester?  Since the
> files seemingly corrupted themselves after install without being
> touched, I'm highly suspicious of the hard drive, but would like to
> rule other things out (if say for example that CONFIG_X86_INTEL_PSTATE
> CPU clock booster is dangerous, or nvidia-drivers, or ...).  Haven't
> checked for corruption on /home yet.
> 
> This is the disk:
> 
>   *-disk
>     description: ATA Disk
>     product: ST1000LM024 HN-M
>     vendor: Seagate
>     physical id: 0.0.0
>     bus info: scsi@4:0.0.0
>     logical name: /dev/sda
>     version: 0001
>     size: 931GiB (1TB)
>     capabilities: gpt-1.00 partitioned partitioned:gpt
>     configuration: ansiversion=5
>     guid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx sectorsize=4096
> 
> Thanks for any help you can provide,
> Bryan

You can use badblocks to rule out a bad drive (be sure to read the 
documentation first if you haven't). But I would guess that something LUKS 
related is more likely. There may be clues in your log files (probably around 
the time when you installed these packages).
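
A rough sketch of the log check suggested above. The helper name and the pattern list are assumptions chosen for illustration, not an exhaustive catalogue of kernel messages, and the log path depends on which logger is installed.

```shell
# Hypothetical helper: print log lines that commonly accompany storage trouble.
scan_storage_errors() {
    # $1: path to a log file (e.g. /var/log/messages or saved dmesg output)
    grep -E -i 'I/O error|EXT4-fs error|ata[0-9]+(\.[0-9]+)?: .*(error|failed)|device-mapper: .*error' "$1"
}

# Example usage:
#   scan_storage_errors /var/log/messages
#
# For the drive itself, badblocks defaults to a read-only scan:
#   badblocks -sv /dev/sdX
# The -n (non-destructive read-write) and -w (destructive write) modes
# are the ones that call for reading the documentation first.
```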

-- 
Fernando Rodriguez



* [gentoo-user] Re: Diagnosing file corruption
  2015-08-06  0:34 [gentoo-user] Diagnosing file corruption Bryan Gardiner
  2015-08-06  1:28 ` Fernando Rodriguez
@ 2015-08-06  1:48 ` James
  2015-08-06  2:00 ` [gentoo-user] " wraeth
  2015-08-06 21:25 ` Bob Wya
  3 siblings, 0 replies; 6+ messages in thread
From: James @ 2015-08-06  1:48 UTC
  To: gentoo-user

Bryan Gardiner <bog <at> khumba.net> writes:


> On my most recent update, I had some build failures that led me to
> find that some files on my root partition have been corrupted.  

That's a pretty open-ended statement, so here are a few ideas.


'eix -cC app-forensics' will give a brief description of tools 
in that app-forensics category, so you can see what you have to
work with. Other tools exist in other categories.

I'm going to ignore the LUKS question so others can chime in on that.


A while back I ran across app-forensics/aide (AIDE):

"Typically, a system administrator will create an AIDE database on a new
system before it is brought onto the network. This first AIDE database is a
snapshot of the system in its normal state and the yardstick by which all
subsequent updates and changes will be measured." [1]


Sounds great as a replacement for tripwire. I have yet to use this,
but it'll be on my next system. You can use emerge's --fetchonly (-f)
option to download fresh copies of the distfiles (assuming you have deleted
the old ones first) for the packages where you suspect corruption, then
compile and install those again. Then set up AIDE?

Sounds like a great idea for an internet facing server. 

Once you download those replacement packages, just unplug your ethernet
until you are prepared to reconnect.
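
A minimal sketch of the re-install step, assuming the list of corrupted files from the original post is saved in the same three-column form (merge time, package, path). The helper name and the file name in the usage comment are hypothetical.

```shell
# Hypothetical helper: turn lines of the form
#   <merge-time> <category/package-version> <path>
# into a unique list of exact-version atoms ("=cat/pkg-ver").
atoms_from_report() {
    awk 'NF >= 3 { print "=" $2 }' | sort -u
}

# Example usage, assuming the report is saved as ./corrupted-files.txt:
#   emerge --ask --oneshot $(atoms_from_report < corrupted-files.txt)
# Running the same command with --fetchonly first only downloads the sources.
```

Using exact-version atoms with --oneshot rebuilds what is installed without adding the packages to the world file.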

[1] http://aide.sourceforge.net/stable/manual.html


hth,
James




* Re: [gentoo-user] Diagnosing file corruption
  2015-08-06  0:34 [gentoo-user] Diagnosing file corruption Bryan Gardiner
  2015-08-06  1:28 ` Fernando Rodriguez
  2015-08-06  1:48 ` [gentoo-user] " James
@ 2015-08-06  2:00 ` wraeth
  2015-08-16 15:17   ` Bryan Gardiner
  2015-08-06 21:25 ` Bob Wya
  3 siblings, 1 reply; 6+ messages in thread
From: wraeth @ 2015-08-06  2:00 UTC
  To: gentoo-user

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 06/08/15 10:34, Bryan Gardiner wrote:
> After I make a fresh backup of my files, how would you recommend 
> troubleshooting this?  Run memtest or a hard drive tester?  Since
> the files seemingly corrupted themselves after install without
> being touched, I'm highly suspicious of the hard drive, but would
> like to rule other things out (if say for example that
> CONFIG_X86_INTEL_PSTATE CPU clock booster is dangerous, or
> nvidia-drivers, or ...).  Haven't checked for corruption on /home
> yet.

One key question that doesn't seem to have been asked yet: have you
performed an fsck on the partition? You could try booting to a livecd
environment and running

  fsck -fc /dev/sdXY

(adjusting for your device schema accordingly) on your apparently
failing partition(s) to see if there is a filesystem corruption...

- -- 
wraeth <wraeth@wraeth.id.au>
GnuPG Key: B2D9F759
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iF4EAREIAAYFAlXCv7kACgkQXcRKerLZ91npQwD/U41L/qmK8g7d0bWx6tR3SxbW
4bGheAvX3lWJvgMnG9QA/AuO7wnaKTcWeqoT7c+R7e8UHaaOfwaoS1w2J2hGVINJ
=Ykkl
-----END PGP SIGNATURE-----
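
A small sketch to go with the fsck suggestion above: the exit status of fsck is a bitmask (documented in fsck(8)), so it can be decoded after a run. The function name is hypothetical. The device path in the usage comment is the logical volume named later in this thread; with ext4 inside LVM inside LUKS, the check would target that volume rather than a raw /dev/sdXY partition.

```shell
# Hypothetical helper: decode the bitmask exit status of fsck/e2fsck.
describe_fsck_status() {
    status=$1
    [ "$status" -eq 0 ] && { echo "no errors"; return 0; }
    [ $((status & 1)) -ne 0 ] && echo "errors corrected"
    [ $((status & 2)) -ne 0 ] && echo "reboot recommended"
    [ $((status & 4)) -ne 0 ] && echo "errors left uncorrected"
    [ $((status & 8)) -ne 0 ] && echo "operational error"
    [ $((status & 16)) -ne 0 ] && echo "usage or syntax error"
    [ $((status & 32)) -ne 0 ] && echo "cancelled by user request"
    [ $((status & 128)) -ne 0 ] && echo "shared-library error"
    return 0
}

# Example usage from a live environment, with the LUKS container opened,
# the volume group activated, and the filesystem unmounted:
#   e2fsck -f -c /dev/mikasa-vg/gentoo
#   describe_fsck_status $?
```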



* Re: [gentoo-user] Diagnosing file corruption
  2015-08-06  0:34 [gentoo-user] Diagnosing file corruption Bryan Gardiner
                   ` (2 preceding siblings ...)
  2015-08-06  2:00 ` [gentoo-user] " wraeth
@ 2015-08-06 21:25 ` Bob Wya
  3 siblings, 0 replies; 6+ messages in thread
From: Bob Wya @ 2015-08-06 21:25 UTC
  To: gentoo-user


On 6 August 2015 at 01:34, Bryan Gardiner <bog@khumba.net> wrote:

> Hello list,
>
> ....
>
> This is the disk:
>
>   *-disk
>     description: ATA Disk
>     product: ST1000LM024 HN-M
>     vendor: Seagate
>     physical id: 0.0.0
>     bus info: scsi@4:0.0.0
>     logical name: /dev/sda
>     version: 0001
>     size: 931GiB (1TB)
>     capabilities: gpt-1.00 partitioned partitioned:gpt
>     configuration: ansiversion=5
>     guid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx sectorsize=4096
>
> Thanks for any help you can provide,
> Bryan
>

Complex question. Simple answer... Spinrite :-)

-- 

All the best,
Robert



* Re: [gentoo-user] Diagnosing file corruption
  2015-08-06  2:00 ` [gentoo-user] " wraeth
@ 2015-08-16 15:17   ` Bryan Gardiner
  0 siblings, 0 replies; 6+ messages in thread
From: Bryan Gardiner @ 2015-08-16 15:17 UTC
  To: gentoo-user


On Thu, Aug 06, 2015 at 12:00:30PM +1000, wraeth wrote:
> On 06/08/15 10:34, Bryan Gardiner wrote:
> > After I make a fresh backup of my files, how would you recommend 
> > troubleshooting this?  Run memtest or a hard drive tester?  Since
> > the files seemingly corrupted themselves after install without
> > being touched, I'm highly suspicious of the hard drive, but would
> > like to rule other things out (if say for example that
> > CONFIG_X86_INTEL_PSTATE CPU clock booster is dangerous, or
> > nvidia-drivers, or ...).  Haven't checked for corruption on /home
> > yet.
> 
> One key question that doesn't seem to have been asked yet: have you
> performed an fsck on the partition? You could try booting to a livecd
> environment and running
> 
>   fsck -fc /dev/sdXY
> 
> (adjusting for your device schema accordingly) on your apparently
> failing partition(s) to see if there is a filesystem corruption...

Thanks very much for the suggestions, everyone.  I ended up using fsck
-fc and -fcc, which resulted in no bad blocks being detected.  I also
wanted to make sure no other files in that range of disk were
corrupted, so I extracted the extents used by the bad files:

  # bad-files lists one corrupted path per line
  while IFS= read -r file; do
      echo ">>> ${file} <<<"
      debugfs -R "dump_extents ${file}" /dev/mikasa-vg/gentoo
  done <bad-files >bad-extents

found the files in the regions between the bad files:

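  # Map each block in the gaps between the bad files to an inode (icheck),
  # then map that inode to a path (ncheck).
  for block in $(seq 5302485 5302486) $(seq 5302489 5302498) $(seq 5302504 5302508); do
      inode="$(debugfs -R "icheck ${block}" /dev/mikasa-vg/gentoo 2>/dev/null | perl -ne 'if (/^\d+\s+(\d+)$/) {print $1, "\n"}')"
      if [[ -n $inode ]]; then
          # ncheck prints a header line first; the path is on line 2
          echo "${block} ${inode} $(debugfs -R "ncheck ${inode}" /dev/mikasa-vg/gentoo 2>/dev/null | awk 'NR==2 {print $2}')"
      else
          # icheck found no inode owning this block
          echo "${block}"
      fi
  done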

and file'd those to make sure that they were okay.  This is only a
personal computer, so I'm going to call this a one-off issue and move
on, and leave the stronger approaches for another day.
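
The block ranges passed to seq above are the gaps between the extents of the bad files. A rough sketch of computing such gaps from a list of block numbers follows. The helper name is hypothetical, and it assumes the physical block numbers have already been pulled out of bad-extents, one per line.

```shell
# Hypothetical helper: given block numbers on stdin (one per line),
# print every block number missing between the smallest and the largest.
gap_blocks() {
    sort -n | awk 'NR > 1 { for (b = prev + 1; b < $1; b++) print b } { prev = $1 }'
}

# Example usage, assuming ./bad-blocks holds one block number per line:
#   for block in $(gap_blocks < bad-blocks); do ...; done
```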

Thanks again!
Bryan

-- 
If people do not believe that mathematics is simple, it is only
because they do not realize how complicated life is - von Neumann

