* [gentoo-user] Replacement for gcruft: gcrud
@ 2018-08-16 6:07 Andrew Udvare
2018-08-16 18:22 ` [gentoo-user] " james
2018-08-16 20:09 ` Corentin “Nado” Pazdera
0 siblings, 2 replies; 6+ messages in thread
From: Andrew Udvare @ 2018-08-16 6:07 UTC
To: gentoo-user
gcruft seems to have died off (https://www.google.com/search?q=gcruft
only returns ebuild results). I was using it quite a lot and wrote many
exception files. It's gone now with no way for my or anyone else's
ebuild to get the original source. I did preserve it though, here:
https://gitlab.com/Tatsh/gcruft
I wrote a replacement in C named gcrud. It only needs GLib2 installed to
work. It's much faster than gcruft ever was. The code is here:
https://gitlab.com/Tatsh/gcrud
https://github.com/Tatsh/gcrud
I prefer GitLab for issues and merge requests, but I
will accept PRs from GitHub.
The whitelist https://gitlab.com/Tatsh/gcrud/blob/master/whitelist.c is
currently hard-coded and limited but the results are satisfactory for
now in my use cases.
Typical use case:
sudo ./gcrud | sort -u > out.log
Examine out.log for things you can delete. There are absolutely zero
calls to delete files from the machine in my code, and there never will
be any kind of automated deletion support.
If anyone tries it out I certainly would like to see your output and get
some bug reports or suggestions. The main feature planned is reading
from a configuration file for exact file paths and regexes.
--
Andrew
* [gentoo-user] Re: Replacement for gcruft: gcrud
2018-08-16 6:07 [gentoo-user] Replacement for gcruft: gcrud Andrew Udvare
@ 2018-08-16 18:22 ` james
2018-08-16 23:25 ` [gentoo-user] " Andrew Udvare
2018-08-16 20:09 ` Corentin “Nado” Pazdera
1 sibling, 1 reply; 6+ messages in thread
From: james @ 2018-08-16 18:22 UTC
To: gentoo-user
On 08/16/18 02:07, Andrew Udvare wrote:
> gcruft seems to have died off (https://www.google.com/search?q=gcruft
> only returns ebuild results).
I'm not sure whether it is still active, but it appears to still be
around as ebuilds:
eix -R gcruft
* app-portage/gcruft
Available versions: ~0.1-r1^m[1] ~0.1-r1^m[2] ~0.1.1^m[1] ~0.1.1^m[2]
Homepage: http://www.genoetigt.de/site/projects/gcruft
Description: helps finding orphaned files on a gentoo system
> I was using it quite a lot and wrote many
> exception files. It's gone now with no way for my or anyone else's
> ebuild to get the original source. I did preserve it though, here:
> https://gitlab.com/Tatsh/gcruft
Thanks for caring!
> I wrote a replacement in C named gcrud. It only needs GLib2 installed to
> work. It's much faster than gcruft ever was. The code is here:
>
> https://gitlab.com/Tatsh/gcrud
> https://github.com/Tatsh/gcrud
It's going to take me a while to get around to testing, but I really,
really like admin tools written in C, so it is automatically on my short list.
>
> I am placing preference in GitLab for issues and merge requests, but I
> will accept PRs from GitHub.
I really like GitLab for a variety of reasons. I sure wish someone
would put together a gitlab-meta ebuild for Gentoo. I'd like to host
code locally and export the relevant open source code to an online
location, or distribute it among a collective of gitlab-gentoo sites,
complementary to GitHub and our GitHub-centric dev community.
>
> The whitelist https://gitlab.com/Tatsh/gcrud/blob/master/whitelist.c is
> currently hard-coded and limited but the results are satisfactory for
> now in my use cases.
>
> Type use case:
>
> sudo ./gcrud | sort -u > out.log
>
> Examine out.log for things you can delete. There are absolutely zero
> calls to delete files from the machine in my code and never will be any
> kind of automation support.
That's your choice and I respect that call. However (and it's a big
however), my Gentoo-centric HPC clusters do need automated system
cleanup. So, initially, it will be an army of scripts similar to your
code. That is mandatory in "loosely coupled" heterogeneous clusters, not
to mention first-line security-related cleanup.
> If anyone tries it out I certainly would like to see your output and get
> some bug reports or suggestions. The main feature planned is reading
> from a configuration file for exact file paths and regexs.
Yes, but it'll be a while for me. Offer an automated cleanup option,
and I have dozens of systems to test on.
>
> --
> Andrew
Thank you, Andrew, for your work. It can also be very useful to my DAG
efforts for compiling, verifying, and cleaning up cluster code.
GLEP 64 was on the path to systematically solving what you are doing
after the fact:
https://wiki.gentoo.org/wiki/GLEP:64
More refs for your convenience
http://asic-linux.com.mx/~izto/checkinstall/
http://gittup.org/tup/
("It will automatically clean-up old files.")
hth,
James
* Re: [gentoo-user] Replacement for gcruft: gcrud
2018-08-16 6:07 [gentoo-user] Replacement for gcruft: gcrud Andrew Udvare
2018-08-16 18:22 ` [gentoo-user] " james
@ 2018-08-16 20:09 ` Corentin “Nado” Pazdera
2018-08-16 23:09 ` Andrew Udvare
2018-08-17 12:15 ` Corentin “Nado” Pazdera
1 sibling, 2 replies; 6+ messages in thread
From: Corentin “Nado” Pazdera @ 2018-08-16 20:09 UTC
To: gentoo-user
August 16, 2018 8:07 AM, "Andrew Udvare" <audvare@gmail.com> wrote:
> gcruft seems to have died off (https://www.google.com/search?q=gcruft
> only returns ebuild results). I was using it quite a lot and wrote many
> exception files. It's gone now with no way for my or anyone else's
> ebuild to get the original source. I did preserve it though, here:
> https://gitlab.com/Tatsh/gcruft
>
> I wrote a replacement in C named gcrud. It only needs GLib2 installed to
> work. It's much faster than gcruft ever was. The code is here:
>
> https://gitlab.com/Tatsh/gcrud
> https://github.com/Tatsh/gcrud
>
> I am placing preference in GitLab for issues and merge requests, but I
> will accept PRs from GitHub.
>
> The whitelist https://gitlab.com/Tatsh/gcrud/blob/master/whitelist.c is
> currently hard-coded and limited but the results are satisfactory for
> now in my use cases.
>
> Type use case:
>
> sudo ./gcrud | sort -u > out.log
>
> Examine out.log for things you can delete. There are absolutely zero
> calls to delete files from the machine in my code and never will be any
> kind of automation support.
>
> If anyone tries it out I certainly would like to see your output and get
> some bug reports or suggestions. The main feature planned is reading
> from a configuration file for exact file paths and regexs.
>
> --
> Andrew
Hi,
So I tested it, and I was surprised how many /etc files weren't in the whitelist.
Actually, most of /etc shouldn't be suggested for deletion if the packages are still installed.
Portage stuff like repositories could be whitelisted dynamically, or gcrud could at least be able to
tell which directories are used to store them.
I also caught some wrongly listed files because of the multilib system with the /lib symlink.
For example, dhcpcd declared /lib/dhcpcd/dhcpcd-hooks, thus the realpath /lib64/dhcpcd/dhcpcd-hooks
was listed in the removal suggestions. This should be fixed with profile 17.1.
The log is so huge at the moment that it is useless for me :/
% wc -l out.log
461575 out.log
--
Corentin “Nado” Pazdera
* Re: [gentoo-user] Replacement for gcruft: gcrud
2018-08-16 20:09 ` Corentin “Nado” Pazdera
@ 2018-08-16 23:09 ` Andrew Udvare
2018-08-17 12:15 ` Corentin “Nado” Pazdera
1 sibling, 0 replies; 6+ messages in thread
From: Andrew Udvare @ 2018-08-16 23:09 UTC
To: gentoo-user
> On 2018-08-16, at 16:09, Corentin “Nado” Pazdera <nado@troglodyte.be> wrote:
>
> Hi,
>
> So I tested it, and I was surprised how many /etc files weren't put into whitelist.
> Actually, most of /etc shouldn't be suggested for deletion if the packages are still installed.
Thanks for testing! Really appreciate it.
The whitelist is the biggest work in progress right now. Most of what it lists from /etc for me is /etc/config-archive which AFAIK is not managed by Portage at all although Portage will place old files there? I don't use the feature because my /etc is controlled by Git. The stuff listed in /var/ is pretty accurate as there's a lot of old website cruft and this computer does not serve anything like that anymore.
>
> Portage stuff like repositories could be whitelisted in a dynamic manner, or at least bing able to
> tell what directorie(s) are used to store them.
The idea is to move everything in the whitelist.c file to a declarative (no code unless you count RE) configuration file. I have not decided on a format but I am leaning towards INI-style because GLib2 has a parser for that built-in. The config file will specify exact paths, RE, and globs. There will be a default dynamic list generated at runtime based on what packages you have installed (as gcruft had this feature).
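A minimal sketch of what such an INI-style whitelist could look like. No format has been decided, so the section name `whitelist` and the keys `exact`, `glob`, and `regex` are assumptions made up for illustration. The sketch is in Python, using the standard `configparser` module, rather than in C with GLib2's key-file parser.

```python
import configparser
import fnmatch
import re

# Hypothetical config layout; gcrud has not settled on any format.
EXAMPLE = """
[whitelist]
exact =
    /etc/resolv.conf
    /etc/machine-id
glob =
    /var/log/*
regex =
    ^/usr/src/linux-[0-9.]+-gentoo(/|$)
"""


def load_rules(text):
    """Parse the INI text into exact paths, glob patterns, and regexes."""
    # interpolation=None keeps '%' in patterns from being treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["whitelist"]

    def lines(key):
        raw = section.get(key, "")
        return [ln.strip() for ln in raw.splitlines() if ln.strip()]

    return {
        "exact": set(lines("exact")),
        "glob": lines("glob"),
        "regex": [re.compile(p) for p in lines("regex")],
    }


def is_whitelisted(path, rules):
    """Return True if any exact path, glob, or regex rule matches."""
    if path in rules["exact"]:
        return True
    # Note: fnmatch's '*' also matches '/', so /var/log/* covers subdirectories.
    if any(fnmatch.fnmatchcase(path, g) for g in rules["glob"]):
        return True
    return any(r.search(path) for r in rules["regex"])
```

Exact paths go into a set so the common case is a single lookup; globs and regexes are only tried when that lookup misses.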
> I also caught some wrongly listed files because of the multilib system with /lib symlink.
> For example, dhcpcd declared /lib/dhcpcd/dhcpcd-hooks, thus the realpath /lib64/dhcpcd/dhcpcd-hooks
> was listed in the removal suggestion. This should be fixed with profile 17.1
The /lib vs /lib64 issue will be resolved in a later version. I think I need to use lstat() everywhere instead of stat(), or I can call realpath() prior to storing values in the set. This file should be whitelisted, but only if you have dhcpcd installed (I've long since moved to dhcpd).
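A minimal sketch of the realpath() approach, in Python rather than gcrud's C. It assumes the tool keeps a set of package-owned paths and checks every path found on disk against it; the function names are hypothetical and not taken from gcrud. The demo builds a throwaway tree where lib is a symlink to lib64, as on a multilib system.

```python
import os
import tempfile


def canonical(path):
    """Resolve symlinks in every path component, like realpath(3)."""
    return os.path.realpath(path)


def find_unowned(found_paths, owned_paths):
    """Return found paths that no package owns, comparing canonical forms."""
    owned = {canonical(p) for p in owned_paths}
    return sorted(p for p in found_paths if canonical(p) not in owned)


def demo():
    """Compare naive string matching with canonicalized matching."""
    with tempfile.TemporaryDirectory() as root:
        real_dir = os.path.join(root, "lib64", "dhcpcd", "dhcpcd-hooks")
        os.makedirs(real_dir)
        os.symlink("lib64", os.path.join(root, "lib"))
        # The package database records the path through the symlink.
        owned = [os.path.join(root, "lib", "dhcpcd", "dhcpcd-hooks")]
        # The filesystem walk reports the path under the real directory.
        found = [real_dir]
        naive = [p for p in found if p not in set(owned)]
        fixed = find_unowned(found, owned)
        return len(naive), len(fixed)
```

With plain string comparison the hooks directory is reported as unowned; once both sides are canonicalized it is not.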
I am trying my best to give zero false positives, so that you can plan to have something like `% gcrud | ... | xargs rm -fR`.
>
> The log is so huge at the moment it is useless for me :/
>
> % wc -l out.log
> 461575 out.log
Any thoughts on how to simplify analysis?
--
Andrew
* Re: [gentoo-user] Replacement for gcruft: gcrud
2018-08-16 18:22 ` [gentoo-user] " james
@ 2018-08-16 23:25 ` Andrew Udvare
0 siblings, 0 replies; 6+ messages in thread
From: Andrew Udvare @ 2018-08-16 23:25 UTC
To: gentoo-user
> On 2018-08-16, at 14:22, james <garftd@verizon.net> wrote:
>
> Yes, but, it'll be while for me. Offer and automated clean up option,
> and I have dozens of systems to test.....
I'll figure out the kind of tests I want to run sometime soon.
>
>
> GLEP 64 was on the path to systematically solve what you you are doing
> after the fact::
>
> https://wiki.gentoo.org/wiki/GLEP:64
>
> More refs for your convenience
>
> http://asic-linux.com.mx/~izto/checkinstall/
>
> http://gittup.org/tup/
> ("It will automatically clean-up old files.")
Thanks for pointing these out.
It is really tempting to support macOS like tup does, although SIP and the snapshot restored on boot kind of make it unnecessary. The idea of using a newly created FS to see changes is also interesting.
A new GLEP to systematically delete extraneous files could be to restore a non-user generated snapshot on boot just like iOS/macOS, but the problem is that we don't always use the same filesystem or mount configurations. Another way would be to use xattr but again the issue is compatibility.
--
Andrew
* Re: [gentoo-user] Replacement for gcruft: gcrud
2018-08-16 20:09 ` Corentin “Nado” Pazdera
2018-08-16 23:09 ` Andrew Udvare
@ 2018-08-17 12:15 ` Corentin “Nado” Pazdera
1 sibling, 0 replies; 6+ messages in thread
From: Corentin “Nado” Pazdera @ 2018-08-17 12:15 UTC
To: gentoo-user
August 17, 2018 1:09 AM, "Andrew Udvare" <audvare@gmail.com> wrote:
> The whitelist is the biggest work in progress right now. Most of what it lists from /etc for me is
> /etc/config-archive which AFAIK is not managed by Portage at all although Portage will place old
> files there? I don't use the feature because my /etc is controlled by Git. The stuff listed in
> /var/ is pretty accurate as there's a lot of old website cruft and this computer does not serve
> anything like that anymore.
Well, for example I use eselect-repository, which puts repos in /var/db/repos. I put the Gentoo tree in
there as well, and the whole tree is suggested for deletion.
A solution would be to read the /etc/portage/repos.conf file(s) for repo locations during the runtime
detection, or to use the portageq interface.
Or tell people to whitelist their repo locations manually once the config file is available ;)
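A minimal sketch of the repos.conf idea, in Python. It relies on repos.conf being INI-style, with one section per repository and a `location` key, and on it being either a single file or a directory of files. Skipping hidden and backup files is an assumption about which files to read, the function name is hypothetical, and the repository paths in the demo are made up.

```python
import configparser
import os
import tempfile


def repo_locations(conf_path):
    """Collect every 'location' value from a repos.conf file or directory."""
    if os.path.isdir(conf_path):
        files = sorted(
            os.path.join(conf_path, name)
            for name in os.listdir(conf_path)
            # Assumption: ignore hidden files and editor backups.
            if not name.startswith(".") and not name.endswith("~")
        )
    else:
        files = [conf_path]
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(files)  # files that cannot be opened are silently skipped
    return sorted(
        {parser[s]["location"] for s in parser.sections() if "location" in parser[s]}
    )


def demo():
    """Write two small config files and read the locations back."""
    with tempfile.TemporaryDirectory() as root:
        conf_dir = os.path.join(root, "repos.conf")
        os.mkdir(conf_dir)
        with open(os.path.join(conf_dir, "gentoo.conf"), "w") as fh:
            fh.write("[DEFAULT]\nmain-repo = gentoo\n\n"
                     "[gentoo]\nlocation = /srv/repos/gentoo\n")
        with open(os.path.join(conf_dir, "overlay.conf"), "w") as fh:
            fh.write("[example-overlay]\n"
                     "location = /srv/repos/example-overlay\n"
                     "sync-type = git\n")
        return repo_locations(conf_dir)
```

Every directory returned this way could be added to the dynamic whitelist, so a repository tree is never suggested for deletion file by file.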
You could add directories containing a .keep file to the whitelist, although I'm not sure how to
specify it.
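One possible reading of the .keep rule, sketched in Python: if a marker file whose name starts with `.keep` shows up in the report, drop both the marker and the directory that contains it, and leave everything else in that directory alone. This assumes the report lists directories as well as files; the function name and the sample paths are hypothetical.

```python
import posixpath


def drop_kept_dirs(paths):
    """Remove .keep* markers and the directories that directly hold them."""
    kept_dirs = {
        posixpath.dirname(p)
        for p in paths
        if posixpath.basename(p).startswith(".keep")
    }
    return [
        p
        for p in paths
        if p not in kept_dirs
        and not posixpath.basename(p).startswith(".keep")
    ]
```

For example, given `/var/lib/example`, `/var/lib/example/.keep`, and `/var/lib/example/stale.db`, only `/var/lib/example/stale.db` stays in the report.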
Same goes for git repositories: I’d rather delete a whole git repo or nothing at all inside it, so
add a rule which can be interpreted as "pick the parent dir of a .git dir to suggest for deletion, ignore all
children of said parent".
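A minimal sketch of that rule as a post-processing step over the report, in Python. It assumes the report contains entries under `.git`, which is how a work tree is detected here; the function name and the sample paths are hypothetical.

```python
def collapse_git_repos(paths):
    """Replace every path inside a Git work tree with the work tree root."""
    roots = set()
    for path in paths:
        parts = path.split("/")
        if ".git" in parts:
            # Everything before the first .git component is the work tree root.
            roots.add("/".join(parts[: parts.index(".git")]) or "/")

    collapsed = set()
    for path in paths:
        # For nested repositories, prefer the outermost (shortest) root.
        root = min(
            (r for r in roots if path == r or path.startswith(r.rstrip("/") + "/")),
            key=len,
            default=None,
        )
        collapsed.add(root if root is not None else path)
    return sorted(collapsed)
```

A repository with thousands of files then shows up as a single line, so it is either deleted as a whole or left alone.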
> The idea is to move to everything in the whitelist.c file to a declarative (no code unless you
> count RE) configuration file. I have not decided on a format but I am leaning towards INI-style
> because GLib2 has a parser for that built-in. The config file will specify exact paths, RE, and
> globs. There will be a default dynamic list generated at runtime based on what packages you have
> installed (as gcruft had this feature).
That will be nice, waiting for it ;) Something basic might be enough for running batches of tests
before choosing a definitive format.
>> I also caught some wrongly listed files because of the multilib system with /lib symlink.
>> For example, dhcpcd declared /lib/dhcpcd/dhcpcd-hooks, thus the realpath /lib64/dhcpcd/dhcpcd-hooks
>> was listed in the removal suggestion. This should be fixed with profile 17.1
>
> The /lib vs /lib64 issue will be resolved in a later version. I think I need to use lstat()
> everywhere instead of stat(), or I can call realpath() prior to storing values in the set. This
> file should be whitelisted, but only if you have dhcpcd installed (I've long since moved to dhcpd).
I’m in favor of the realpath suggestion; this will be useful for any path accessed through a symlink.
>> The log is so huge at the moment it is useless for me :/
>>
>> % wc -l out.log
>> 461575 out.log
>
> Any thoughts on how to simplify analysis?
A few, but I’m not sure I have many which are /universal/ across Gentoo systems.
Do you plan to integrate the sorting part in gcrud directly?
If so, I’d suggest showing /usr/* stuff first, because un-owned files there should be
exceptions.
Same goes for /lib, but stuff like kernel modules should be treated carefully: we can either
whitelist the whole /lib{,32,64}/modules, or try being smart and select old kernel modules only.
This might be tricky given the number of ways someone can manage them.
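A minimal sketch of the "old kernel modules only" option, in Python. It uses one simple heuristic as an assumption: any directory under the modules root that does not match the running kernel release counts as old. Systems that keep several bootable kernels would need something smarter. The function name is hypothetical.

```python
import os
import tempfile


def stale_module_dirs(modules_root, running_release):
    """List module directories that do not belong to the running kernel."""
    if not os.path.isdir(modules_root):
        return []
    return sorted(
        os.path.join(modules_root, name)
        for name in os.listdir(modules_root)
        if name != running_release
        and os.path.isdir(os.path.join(modules_root, name))
    )


def demo():
    """Create two fake release directories and keep one as 'running'."""
    with tempfile.TemporaryDirectory() as root:
        for release in ("4.9.95-gentoo", "4.14.52-gentoo"):
            os.mkdir(os.path.join(root, release))
        stale = stale_module_dirs(root, "4.14.52-gentoo")
        return [os.path.basename(p) for p in stale]
```

On a real system the call would be `stale_module_dirs("/lib/modules", os.uname().release)`; the result is only a list of candidates to review.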
Also, here is a small analysis of the file locations reported by gcrud.
% cut -d/ -f2 out.log|uniq -c
295 etc
3309 lib64
1178 lib
13 opt
39586 usr
417194 var
Since /var contains my different repos, it's logical that it has the most occurrences.
Next comes /usr, containing another lib{,32,64} scheme with /usr/lib pointing to /usr/lib64, with Go
packages installed (in /usr/lib64/go).
With this information, I suppose most will disappear when using realpath or switching to the 17.1
profile.
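The per-directory count can also be sketched in Python, which makes it easy to drill down one level at a time (for example /var/db versus /var/www). The function name and the `depth` parameter are hypothetical. Unlike `uniq -c`, it does not need sorted input.

```python
from collections import Counter


def count_by_prefix(lines, depth=1):
    """Count report lines by their first `depth` path components."""
    counts = Counter()
    for line in lines:
        path = line.strip()
        if not path.startswith("/"):
            continue  # skip blank or malformed lines
        parts = path.split("/")[1:]
        counts["/" + "/".join(parts[:depth])] += 1
    return counts
```

A possible use is `count_by_prefix(open("out.log"), depth=2).most_common(20)`, which shows the twenty second-level directories with the most reported files.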
Thanks for your work, this will probably be an excellent tool in a few commits ;)
Regards,
Corentin “Nado” Pazdera