From: Daniel Pielmeier <billie@gentoo.org>
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] [RFC][NEW] Utility to find orphaned files
Date: Sun, 25 Apr 2010 15:43:08 +0200
Message-ID: <4BD446EC.2070809@gentoo.org>
In-Reply-To: <4BD42501.9070505@gentoo.org>


Angelo Arrifano wrote on 25.04.2010 13:18:
> Hello developers developers and developers,
> 
> Ever wondered how much crap is left in your X-years old Gentoo box?
> 
> I just developed a python utility to efficiently find orphaned files in
> the system. By orphaned files I mean the files that are present on
> system directories and don't belong to any installed package.
> 
> The package builds a virtual filesystem (cache) in RAM using python
> hash tables. Then it uses the cache to find the ownership of files
> inside user-specified dirs.
> 
> Building the cache takes less than 10 seconds here in a system with 1366
> installed packages.
> 
> This is not intended to be a finished program yet; I'm looking forward
> to your constructive comments.

What about searching the complete file system, but with an exclude file listing
the directories and files that should not be searched? It is tedious to specify
every path on the command line. Also, if you specify /lib, for instance, it will
also search under /lib/modules, and I am sure you do not consider everything
there unneeded.
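
To illustrate the exclude-file idea, here is a minimal python sketch. It is not
taken from the attached scripts; the function names and the exclude file format
(one absolute path per line, '#' starting a comment) are assumptions made for
this example.

```python
import os


def load_excludes(lines):
    """Parse exclude entries: one absolute path per line, '#' starts a comment."""
    excludes = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            excludes.add(os.path.normpath(entry))
    return excludes


def walk_with_excludes(root, excludes):
    """Yield every file path below root, skipping excluded files and subtrees."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames
                       if os.path.join(dirpath, d) not in excludes]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if path not in excludes:
                yield path
```

With /lib/modules in the exclude file, walk_with_excludes("/", excludes) would
still report files directly under /lib but never look below /lib/modules.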

You also need to consider that your tool will return other false positives, such
as byte-compiled python modules and perl header files. In general this covers
everything an ebuild adds to the file system in phases whose files are not
recorded in CONTENTS (pkg_{pre,post}inst). At this point the files are needed but
not known to the package manager. If the ebuild does not take care of these
files when the package is removed (pkg_{pre,post}rm), they will remain on the
file system and are now unneeded.
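
As a rough sketch of how such false positives could be filtered, the python
below collects the paths recorded in the CONTENTS files and treats a .pyc/.pyo
file as owned when the matching .py file is owned. The CONTENTS line layout
("dir <path>", "obj <path> <md5> <mtime>", "sym <path> -> <target> <mtime>")
and the /var/db/pkg location are assumptions here, as are all function names;
perl header files and other generated files would need their own rules.

```python
import os


def parse_contents_line(line):
    """Return the path recorded in one CONTENTS line, or None if unrecognized."""
    kind, _, rest = line.rstrip("\n").partition(" ")
    if kind == "dir":
        return rest or None
    if kind == "obj":
        # Assumed layout: obj <path> <md5> <mtime>; the path may contain spaces.
        parts = rest.rsplit(" ", 2)
        return parts[0] if len(parts) == 3 else None
    if kind == "sym":
        # Assumed layout: sym <path> -> <target> <mtime>
        path, sep, _ = rest.partition(" -> ")
        return path if sep else None
    return None


def owned_paths(vdb_root="/var/db/pkg"):
    """Collect every path listed in any CONTENTS file below vdb_root."""
    owned = set()
    for dirpath, _, filenames in os.walk(vdb_root):
        if "CONTENTS" in filenames:
            contents = os.path.join(dirpath, "CONTENTS")
            with open(contents, errors="replace") as fh:
                for line in fh:
                    path = parse_contents_line(line)
                    if path:
                        owned.add(path)
    return owned


def is_generated_from_owned(path, owned):
    """True for a byte-compiled file whose .py source is owned by a package."""
    base, ext = os.path.splitext(path)
    return ext in (".pyc", ".pyo") and base + ".py" in owned


def orphans(candidates, owned):
    """Return candidates that are neither owned nor generated from owned files."""
    return [p for p in candidates
            if p not in owned and not is_generated_from_owned(p, owned)]
```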

I have written something in perl, which I recently tried to reimplement in python
(it does not yet have the same functionality as the perl version). I am not a
good perl or python programmer, but it fits my needs, especially the perl version,
as I know a bit more perl than python.

I attach both versions and a sample exclude file. Maybe they will be of help.

-- 
Daniel Pielmeier

[-- Attachment #1.2: cruft.tar.bz2 --]
[-- Type: application/x-bzip2, Size: 5687 bytes --]