From: "Corentin “Nado” Pazdera" <nado@troglodyte.be>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Replacement for gcruft: gcrud
Date: Fri, 17 Aug 2018 12:15:38 +0000
Message-ID: <f99a251397749798caff3237d9bd4be7@troglodyte.be>
In-Reply-To: <4DD980A7-F048-4526-B188-778A03D7719F@gmail.com>
August 17, 2018 1:09 AM, "Andrew Udvare" <audvare@gmail.com> wrote:
> The whitelist is the biggest work in progress right now. Most of what it lists from /etc for me is
> /etc/config-archive which AFAIK is not managed by Portage at all although Portage will place old
> files there? I don't use the feature because my /etc is controlled by Git. The stuff listed in
> /var/ is pretty accurate as there's a lot of old website cruft and this computer does not serve
> anything like that anymore.
Well, for example, I use eselect-repository, which puts repos in /var/db/repos. I put the gentoo tree
there as well, and the whole tree is suggested for deletion.
A solution would be to read the /etc/portage/repos.conf file(s) for repository locations during
runtime detection, or to use the portageq interface.
Or tell people to manually whitelist their repository locations once the config file is available ;)
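For the repos.conf route, here is a minimal C sketch that pulls the `location` values out of repos.conf-style text by reading the file directly instead of going through portageq. It assumes the simple INI shape (`[name]` headers, `key = value` lines, `#` comments). It does not handle a repos.conf directory, `DEFAULT` inheritance or any other Portage parsing detail, and `collect_repo_locations` is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_LOC_LEN 4096

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return s;
}

/*
 * Hypothetical helper: scan repos.conf-style text and copy every
 * "location = ..." value into locs. Returns the number of locations
 * found (at most max). Assumes one key per line, no continuation lines.
 */
size_t collect_repo_locations(const char *text, char locs[][MAX_LOC_LEN],
                              size_t max)
{
    size_t n = 0;
    char line[MAX_LOC_LEN];

    while (*text && n < max) {
        /* Copy one line (without its newline) into a scratch buffer. */
        size_t len = strcspn(text, "\n");
        size_t copy = len < sizeof line - 1 ? len : sizeof line - 1;
        memcpy(line, text, copy);
        line[copy] = '\0';
        text += len;
        if (*text == '\n')
            text++;

        char *s = trim(line);
        if (*s == '\0' || *s == '#' || *s == '[')
            continue; /* blank line, comment or section header */

        char *eq = strchr(s, '=');
        if (!eq)
            continue;
        *eq = '\0';
        if (strcmp(trim(s), "location") != 0)
            continue;

        const char *value = trim(eq + 1);
        strncpy(locs[n], value, MAX_LOC_LEN - 1);
        locs[n][MAX_LOC_LEN - 1] = '\0';
        n++;
    }
    return n;
}
```

Each returned directory could then be treated as a whitelisted prefix, so nothing under a configured repository is suggested for deletion.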
You could also whitelist directories containing a .keep file, although I'm not sure how that
should be specified.
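The .keep idea could be checked with plain POSIX directory calls. A sketch, assuming that any regular file whose name starts with `.keep` counts as a marker (Portage's own keepdir markers also start with that prefix); `dir_has_keep_file` is a hypothetical name.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define KEEP_PATH_MAX 4096

/*
 * Hypothetical helper: return 1 if dir directly contains a regular file
 * (not a symlink or directory) whose name starts with ".keep", 0 otherwise,
 * including when dir cannot be opened.
 */
int dir_has_keep_file(const char *dir)
{
    DIR *d = opendir(dir);
    if (!d)
        return 0;

    int found = 0;
    struct dirent *ent;
    while (!found && (ent = readdir(d)) != NULL) {
        if (strncmp(ent->d_name, ".keep", 5) != 0)
            continue;

        /* Build the full path so the entry's type can be checked. */
        char full[KEEP_PATH_MAX];
        int len = snprintf(full, sizeof full, "%s/%s", dir, ent->d_name);
        if (len < 0 || (size_t)len >= sizeof full)
            continue;

        struct stat st;
        if (lstat(full, &st) == 0 && S_ISREG(st.st_mode))
            found = 1;
    }
    closedir(d);
    return found;
}
```

A directory for which this returns 1 would then be skipped as a whole during the scan.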
Same goes for git repositories: I’d rather delete a whole git repo or nothing inside it at all, so
it would help to add a rule meaning "suggest the parent dir of a .git dir for deletion, and ignore
all children of that parent".
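The .git rule could be sketched as a mapping applied to each candidate before display: if a path is a `.git` directory or lies inside one, the directory containing that `.git` becomes the single suggestion. This assumes the repository's `.git` contents show up in the candidate list (they are unowned, so they normally would). Work-tree files outside `.git` would still need a filesystem lookup, which this sketch leaves out. `git_repo_root` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: if path is a ".git" directory or lies inside one,
 * copy the directory containing the outermost ".git" into out and return 1.
 * Otherwise return 0 and leave out untouched.
 * Example: "/var/db/repos/foo/.git/HEAD" -> "/var/db/repos/foo".
 */
int git_repo_root(const char *path, char *out, size_t outsz)
{
    const char *p = path;
    while ((p = strstr(p, "/.git")) != NULL) {
        char next = p[5];
        /* Match only the whole component ".git", not ".gitignore" etc. */
        if (next == '\0' || next == '/') {
            size_t len = (size_t)(p - path);
            if (len == 0 || len >= outsz)
                return 0; /* "/.git" at the root, or buffer too small */
            memcpy(out, path, len);
            out[len] = '\0';
            return 1;
        }
        p += 5;
    }
    return 0;
}
```

Mapping every candidate through this and de-duplicating the results would leave one line per repository instead of thousands of object files.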
> The idea is to move everything in the whitelist.c file to a declarative (no code unless you
> count RE) configuration file. I have not decided on a format but I am leaning towards INI-style
> because GLib2 has a parser for that built-in. The config file will specify exact paths, RE, and
> globs. There will be a default dynamic list generated at runtime based on what packages you have
> installed (as gcruft had this feature).
That will be nice, I'm looking forward to it ;) Something basic might be enough for running batches
of tests before settling on a definitive format.
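To make the exact path / RE / glob split concrete, here is a sketch of how one parsed rule might be matched, using POSIX `fnmatch()` for globs and `regcomp()`/`regexec()` for extended regular expressions. The `rule` struct and `rule_matches` name are hypothetical and say nothing about the INI layout gcrud will actually use.

```c
#include <fnmatch.h>
#include <regex.h>
#include <string.h>

/* Hypothetical rule kinds mirroring "exact paths, RE, and globs". */
enum rule_kind { RULE_EXACT, RULE_GLOB, RULE_REGEX };

struct rule {
    enum rule_kind kind;
    const char *pattern;
};

/*
 * Return 1 if path matches the rule, 0 if not, -1 if a regex fails to
 * compile. The regex is compiled on every call for simplicity; a real
 * tool would compile each rule once when loading the config file.
 */
int rule_matches(const struct rule *r, const char *path)
{
    switch (r->kind) {
    case RULE_EXACT:
        return strcmp(r->pattern, path) == 0;
    case RULE_GLOB:
        /* FNM_PATHNAME: '*' and '?' do not match '/' */
        return fnmatch(r->pattern, path, FNM_PATHNAME) == 0;
    case RULE_REGEX: {
        regex_t re;
        if (regcomp(&re, r->pattern, REG_EXTENDED | REG_NOSUB) != 0)
            return -1;
        int rc = regexec(&re, path, 0, NULL, 0);
        regfree(&re);
        return rc == 0;
    }
    }
    return 0;
}
```

With `FNM_PATHNAME`, a glob such as `/lib64/modules/*` matches a version directory but not the files below it, which keeps glob rules predictable.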
>> I also caught some wrongly listed files because of the multilib system with /lib symlink.
>> For example, dhcpcd declared /lib/dhcpcd/dhcpcd-hooks, thus the realpath /lib64/dhcpcd/dhcpcd-hooks
>> was listed in the removal suggestion. This should be fixed with profile 17.1
>
> The /lib vs /lib64 issue will be resolved in a later version. I think I need to use lstat()
> everywhere instead of stat(), or I can call realpath() prior to storing values in the set. This
> file should be whitelisted, but only if you have dhcpcd installed (I've long since moved to dhcpd).
I’m in favor of the realpath() suggestion; it will be useful for any path accessed through a symlink.
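A sketch of the realpath() approach: canonicalize each path from the package contents before it goes into the set, so /lib/dhcpcd/dhcpcd-hooks and /lib64/dhcpcd/dhcpcd-hooks compare equal when /lib is a symlink to /lib64. `canonical_path` is a hypothetical name, and falling back to the raw path when realpath() fails (for example, a listed file that no longer exists) is an assumption about the desired behaviour.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/*
 * Hypothetical helper: write the canonical form of path into out
 * (PATH_MAX bytes) and return 1. If realpath() fails, e.g. because the
 * file is gone, copy path unchanged and return 0 so the caller can still
 * store something comparable.
 */
int canonical_path(const char *path, char out[PATH_MAX])
{
    if (realpath(path, out) != NULL)
        return 1;
    strncpy(out, path, PATH_MAX - 1);
    out[PATH_MAX - 1] = '\0';
    return 0;
}
```

The same function would also need to be applied to the paths found while walking the filesystem, so that both sides of the set comparison use the same form.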
>> The log is so huge at the moment it is useless for me :/
>>
>> % wc -l out.log
>> 461575 out.log
>
> Any thoughts on how to simplify analysis?
A few, but I’m not sure many of them are /universal/ across Gentoo systems.
Do you plan to integrate the sorting into gcrud directly?
If so, I’d suggest showing /usr/* entries first, because unowned files there should be the
exception.
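Showing /usr/* first could be done with a qsort() comparator that ranks the top-level directory before falling back to plain string order. In this sketch the ranking is /usr first, then the lib directories (following the /lib remark just below), then everything else. The ranks and the name `cmp_cruft_paths` are assumptions.

```c
#include <stdlib.h> /* qsort(), used by callers */
#include <string.h>

/* Lower rank sorts first; prefixes must match whole path components. */
static int path_rank(const char *path)
{
    static const struct {
        const char *prefix;
        int rank;
    } ranks[] = {
        { "/usr", 0 }, { "/lib", 1 }, { "/lib32", 1 }, { "/lib64", 1 },
    };
    for (size_t i = 0; i < sizeof ranks / sizeof ranks[0]; i++) {
        size_t len = strlen(ranks[i].prefix);
        if (strncmp(path, ranks[i].prefix, len) == 0 &&
            (path[len] == '/' || path[len] == '\0'))
            return ranks[i].rank;
    }
    return 2; /* /etc, /opt, /var, ... come last */
}

/* qsort() comparator for an array of char * paths: rank, then strcmp. */
int cmp_cruft_paths(const void *a, const void *b)
{
    const char *pa = *(const char *const *)a;
    const char *pb = *(const char *const *)b;
    int ra = path_rank(pa);
    int rb = path_rank(pb);
    if (ra != rb)
        return ra < rb ? -1 : 1;
    return strcmp(pa, pb);
}
```

Usage would be `qsort(paths, n, sizeof paths[0], cmp_cruft_paths);` on the collected candidate array.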
Same goes for /lib, but things like kernel modules should be treated carefully: we can either
whitelist the whole /lib{,32,64}/modules, or try to be smart and select only old kernel modules.
This might be tricky given the number of ways people manage them.
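For the "smart" option, one narrow interpretation is to suggest a /lib{,32,64}/modules/<version> tree only when <version> is not the running kernel as reported by uname(). This is an assumption: it ignores kernels that are installed but not currently booted, which is exactly the kind of setup variation mentioned above. `is_stale_module_path` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: return 1 if path is under /lib/modules,
 * /lib32/modules or /lib64/modules and its version component differs
 * from running_release; return 0 otherwise.
 */
int is_stale_module_path(const char *path, const char *running_release)
{
    static const char *const roots[] = {
        "/lib/modules/", "/lib32/modules/", "/lib64/modules/"
    };
    for (size_t i = 0; i < sizeof roots / sizeof roots[0]; i++) {
        size_t len = strlen(roots[i]);
        if (strncmp(path, roots[i], len) != 0)
            continue;
        const char *ver = path + len;
        size_t ver_len = strcspn(ver, "/");
        if (ver_len == 0)
            return 0;
        /* Stale if the version directory is not the running kernel's. */
        return !(strlen(running_release) == ver_len &&
                 strncmp(ver, running_release, ver_len) == 0);
    }
    return 0;
}

/* Convenience wrapper that asks the kernel for its release string. */
int is_stale_module_path_now(const char *path)
{
    struct utsname u;
    if (uname(&u) != 0)
        return 0; /* when unsure, never suggest deletion */
    return is_stale_module_path(path, u.release);
}
```

Keeping the release string as a parameter makes the check testable and leaves room for a list of "kept" kernels later.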
Also, here is a small breakdown of where the files listed by gcrud are located:
% cut -d/ -f2 out.log|uniq -c
295 etc
3309 lib64
1178 lib
13 opt
39586 usr
417194 var
Since /var contains my different repos, it’s logical that it holds most of the occurrences.
Next comes /usr, which has the same lib{,32,64} layout with /usr/lib pointing to /usr/lib64, plus
Go packages installed in /usr/lib64/go.
Given this, I suppose most of these will disappear once realpath() is used or after switching to
the 17.1 profile.
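A note on the one-liner: `uniq -c` only merges adjacent identical lines, so it gives one line per directory here only because gcrud's output was already grouped by top-level directory; `sort` before `uniq -c` would make it independent of output order. If gcrud printed such a summary itself, a C sketch could count entries per first path component like this; `count_top_dirs` and the fixed table size are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TOP_DIRS 64
#define TOP_NAME_MAX 256

struct top_count {
    char name[TOP_NAME_MAX];
    size_t count;
};

/*
 * Count absolute paths by their first component ("/usr/lib64/x" -> "usr"),
 * like `cut -d/ -f2 | sort | uniq -c` but without needing sorted input.
 * Entries keep first-seen order. Returns the number of distinct names.
 */
size_t count_top_dirs(const char *const *paths, size_t n,
                      struct top_count counts[MAX_TOP_DIRS])
{
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        const char *p = paths[i];
        if (*p == '/')
            p++;
        size_t len = strcspn(p, "/");
        if (len >= TOP_NAME_MAX)
            len = TOP_NAME_MAX - 1;

        /* Linear search is fine: there are only a handful of top dirs. */
        size_t j;
        for (j = 0; j < used; j++)
            if (strlen(counts[j].name) == len &&
                strncmp(counts[j].name, p, len) == 0)
                break;

        if (j == used) {
            if (used == MAX_TOP_DIRS)
                continue; /* table full: skip rather than overflow */
            memcpy(counts[used].name, p, len);
            counts[used].name[len] = '\0';
            counts[used].count = 0;
            used++;
        }
        counts[j].count++;
    }
    return used;
}
```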
Thanks for your work, this will probably be an excellent tool in a few commits ;)
Regards,
Corentin “Nado” Pazdera
Thread overview: 6+ messages
2018-08-16 6:07 [gentoo-user] Replacement for gcruft: gcrud Andrew Udvare
2018-08-16 18:22 ` [gentoo-user] " james
2018-08-16 23:25 ` [gentoo-user] " Andrew Udvare
2018-08-16 20:09 ` Corentin “Nado” Pazdera
2018-08-16 23:09 ` Andrew Udvare
2018-08-17 12:15 ` Corentin “Nado” Pazdera [this message]