* [gentoo-user] Replacement for gcruft: gcrud
From: Andrew Udvare @ 2018-08-16 6:07 UTC
To: gentoo-user

gcruft seems to have died off (https://www.google.com/search?q=gcruft
only returns ebuild results). I was using it quite a lot and wrote many
exception files. It's gone now with no way for my or anyone else's
ebuild to get the original source. I did preserve it though, here:
https://gitlab.com/Tatsh/gcruft

I wrote a replacement in C named gcrud. It only needs GLib2 installed to
work. It's much faster than gcruft ever was. The code is here:

https://gitlab.com/Tatsh/gcrud
https://github.com/Tatsh/gcrud

I am giving preference to GitLab for issues and merge requests, but I
will accept PRs from GitHub.

The whitelist https://gitlab.com/Tatsh/gcrud/blob/master/whitelist.c is
currently hard-coded and limited but the results are satisfactory for
now in my use cases.

Typical use case:

sudo ./gcrud | sort -u > out.log

Examine out.log for things you can delete. There are absolutely zero
calls to delete files from the machine in my code and there never will
be any kind of automation support.

If anyone tries it out I certainly would like to see your output and get
some bug reports or suggestions. The main feature planned is reading
from a configuration file for exact file paths and regexes.

--
Andrew
* [gentoo-user] Re: Replacement for gcruft: gcrud
From: james @ 2018-08-16 18:22 UTC
To: gentoo-user

On 08/16/18 02:07, Andrew Udvare wrote:
> gcruft seems to have died off (https://www.google.com/search?q=gcruft
> only returns ebuild results).

It might (not really sure) be active but it appears to still be around
as ebuilds::

eix -R gcruft
* app-portage/gcruft
     Available versions:  ~0.1-r1^m[1] ~0.1-r1^m[2] ~0.1.1^m[1] ~0.1.1^m[2]
     Homepage:            http://www.genoetigt.de/site/projects/gcruft
     Description:         helps finding orphaned files on a gentoo system

> I was using it quite a lot and wrote many
> exception files. It's gone now with no way for my or anyone else's
> ebuild to get the original source. I did preserve it though, here:
> https://gitlab.com/Tatsh/gcruft

Thanks for caring!

> I wrote a replacement in C named gcrud. It only needs GLib2 installed to
> work. It's much faster than gcruft ever was. The code is here:
>
> https://gitlab.com/Tatsh/gcrud
> https://github.com/Tatsh/gcrud

It's going to take me a while to get around to testing, but I really,
really like admin codes in "C", so it is automatically on my short
list....

>
> I am placing preference in GitLab for issues and merge requests, but I
> will accept PRs from GitHub.

I really like GitLab for a variety of reasons. I sure wish someone would
put together a gitlab-meta ebuild for gentoo. I'd like to house codes
locally, and export relevant open source codes to an online location, or
distribute them among a collective of gitlab-gentoo sites, complementary
to github and our github-centric dev community.
>
> The whitelist https://gitlab.com/Tatsh/gcrud/blob/master/whitelist.c is
> currently hard-coded and limited but the results are satisfactory for
> now in my use cases.
>
> Type use case:
>
> sudo ./gcrud | sort -u > out.log
>
> Examine out.log for things you can delete. There are absolutely zero
> calls to delete files from the machine in my code and never will be any
> kind of automation support.

That's your choice and I respect that call. However (and it's a big
however), my gentoo-centric HPC clusters do need automated system
cleanup. So, initially, it'll be an army of scripts, similar to your
code, which is mandatory in "loosely coupled" heterogeneous clusters,
not to mention first-line security-related cleanup.

> If anyone tries it out I certainly would like to see your output and get
> some bug reports or suggestions. The main feature planned is reading
> from a configuration file for exact file paths and regexs.

Yes, but it'll be a while for me. Offer an automated cleanup option,
and I have dozens of systems to test.....

>
> --
> Andrew

Thank you Andrew for your work. It can also be very useful to my DAG
efforts for compiling, verifying, and cleaning up cluster codes.

GLEP 64 was on the path to systematically solve what you are doing
after the fact::

https://wiki.gentoo.org/wiki/GLEP:64

More refs for your convenience:

http://asic-linux.com.mx/~izto/checkinstall/

http://gittup.org/tup/
("It will automatically clean-up old files.")

hth,
James
* Re: [gentoo-user] Replacement for gcruft: gcrud
From: Andrew Udvare @ 2018-08-16 23:25 UTC
To: gentoo-user

> On 2018-08-16, at 14:22, james <garftd@verizon.net> wrote:
>
> Yes, but, it'll be while for me. Offer and automated clean up option,
> and I have dozens of systems to test.....

I'll figure out the kind of tests I want to run sometime soon.

> GLEP 64 was on the path to systematically solve what you you are doing
> after the fact::
>
> https://wiki.gentoo.org/wiki/GLEP:64
>
> More refs for your convenience
>
> http://asic-linux.com.mx/~izto/checkinstall/
>
> http://gittup.org/tup/
> ("It will automatically clean-up old files.")

Thanks for pointing these out. It is really tempting to support macOS
like tup does, although SIP and the restored snapshot on boot kind of
make it unnecessary. The idea of using a newly created FS to see
changes is also interesting.

A new GLEP to systematically delete extraneous files could be to restore
a non-user-generated snapshot on boot just like iOS/macOS, but the
problem is that we don't always use the same filesystem or mount
configurations. Another way would be to use xattr, but again the issue
is compatibility.

--
Andrew
* Re: [gentoo-user] Replacement for gcruft: gcrud
From: Corentin “Nado” Pazdera @ 2018-08-16 20:09 UTC
To: gentoo-user

August 16, 2018 8:07 AM, "Andrew Udvare" <audvare@gmail.com> wrote:

> gcruft seems to have died off (https://www.google.com/search?q=gcruft
> only returns ebuild results). I was using it quite a lot and wrote many
> exception files. It's gone now with no way for my or anyone else's
> ebuild to get the original source. I did preserve it though, here:
> https://gitlab.com/Tatsh/gcruft
>
> I wrote a replacement in C named gcrud. It only needs GLib2 installed to
> work. It's much faster than gcruft ever was. The code is here:
>
> https://gitlab.com/Tatsh/gcrud
> https://github.com/Tatsh/gcrud
>
> I am placing preference in GitLab for issues and merge requests, but I
> will accept PRs from GitHub.
>
> The whitelist https://gitlab.com/Tatsh/gcrud/blob/master/whitelist.c is
> currently hard-coded and limited but the results are satisfactory for
> now in my use cases.
>
> Type use case:
>
> sudo ./gcrud | sort -u > out.log
>
> Examine out.log for things you can delete. There are absolutely zero
> calls to delete files from the machine in my code and never will be any
> kind of automation support.
>
> If anyone tries it out I certainly would like to see your output and get
> some bug reports or suggestions. The main feature planned is reading
> from a configuration file for exact file paths and regexs.
>
> --
> Andrew

Hi,

So I tested it, and I was surprised how many /etc files weren't put into
the whitelist. Actually, most of /etc shouldn't be suggested for
deletion if the packages are still installed.
Portage stuff like repositories could be whitelisted in a dynamic
manner, or at least by being able to tell which directories are used to
store them.

I also caught some wrongly listed files because of the multilib system
with the /lib symlink. For example, dhcpcd declared
/lib/dhcpcd/dhcpcd-hooks, thus the realpath /lib64/dhcpcd/dhcpcd-hooks
was listed in the removal suggestion. This should be fixed with profile
17.1.

The log is so huge at the moment it is useless for me :/

% wc -l out.log
461575 out.log

--
Corentin “Nado” Pazdera
* Re: [gentoo-user] Replacement for gcruft: gcrud
From: Andrew Udvare @ 2018-08-16 23:09 UTC
To: gentoo-user

> On 2018-08-16, at 16:09, Corentin “Nado” Pazdera <nado@troglodyte.be> wrote:
>
> Hi,
>
> So I tested it, and I was surprised how many /etc files weren't put into whitelist.
> Actually, most of /etc shouldn't be suggested for deletion if the packages are still installed.

Thanks for testing! Really appreciate it.

The whitelist is the biggest work in progress right now. Most of what it
lists from /etc for me is /etc/config-archive, which AFAIK is not managed
by Portage at all, although Portage will place old files there? I don't
use the feature because my /etc is controlled by Git. The stuff listed
in /var/ is pretty accurate as there's a lot of old website cruft and
this computer does not serve anything like that anymore.

> Portage stuff like repositories could be whitelisted in a dynamic manner, or at least bing able to
> tell what directorie(s) are used to store them.

The idea is to move everything in the whitelist.c file to a declarative
(no code unless you count RE) configuration file. I have not decided on
a format but I am leaning towards INI-style because GLib2 has a parser
for that built in. The config file will specify exact paths, RE, and
globs. There will be a default dynamic list generated at runtime based
on what packages you have installed (as gcruft had this feature).

> I also caught some wrongly listed files because of the multilib system with /lib symlink.
> For example, dhcpcd declared /lib/dhcpcd/dhcpcd-hooks, thus the realpath /lib64/dhcpcd/dhcpcd-hooks
> was listed in the removal suggestion. This should be fixed with profile 17.1

The /lib vs /lib64 issue will be resolved in a later version.
I think I need to use lstat() everywhere instead of stat(), or I can
call realpath() prior to storing values in the set. This file should be
whitelisted, but only if you have dhcpcd installed (I've long since
moved to dhcpd).

I am trying my best to give zero false positives, so that you could
plan on something like `% gcrud | ... | xargs rm -fR`.

> The log is so huge at the moment it is useless for me :/
>
> % wc -l out.log
> 461575 out.log

Any thoughts on how to simplify analysis?

--
Andrew
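The realpath() option mentioned above can be sketched as follows. This is only an illustration in Python, not gcrud's C code: the helper names `canonical` and `unowned` are made up, and the temporary directory stands in for a multilib root where lib is a symlink to lib64. It assumes that resolving symlinks in the directory part only is what is wanted, so that a package-owned symlink is still compared under its own name.

```python
import os
import tempfile


def canonical(path):
    """Resolve symlinks in the directory part only, keeping the final name."""
    parent, name = os.path.split(path)
    return os.path.join(os.path.realpath(parent), name)


def unowned(found_paths, owned_paths):
    """Return found paths that no package claims, after canonicalising both."""
    owned = {canonical(p) for p in owned_paths}
    return sorted(p for p in found_paths if canonical(p) not in owned)


# Throwaway tree that mimics the multilib layout: lib -> lib64.
root = os.path.realpath(tempfile.mkdtemp())
os.makedirs(os.path.join(root, "lib64", "dhcpcd"))
os.symlink("lib64", os.path.join(root, "lib"))
hook = os.path.join(root, "lib64", "dhcpcd", "dhcpcd-hooks")
open(hook, "w").close()  # empty stand-in for the real entry

# The package records the path through the symlink; a filesystem walk
# reports it under the real directory.
declared = [os.path.join(root, "lib", "dhcpcd", "dhcpcd-hooks")]
found = [hook]

naive = sorted(set(found) - set(declared))  # plain string comparison
fixed = unowned(found, declared)            # comparison after canonicalising
```

With plain string comparison the hook is reported as unowned; after canonicalising both sides it is not. In C the same idea would presumably call realpath() on the parent directory before inserting into the set.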
* Re: [gentoo-user] Replacement for gcruft: gcrud
From: Corentin “Nado” Pazdera @ 2018-08-17 12:15 UTC
To: gentoo-user

August 17, 2018 1:09 AM, "Andrew Udvare" <audvare@gmail.com> wrote:

> The whitelist is the biggest work in progress right now. Most of what it lists from /etc for me is
> /etc/config-archive which AFAIK is not managed by Portage at all although Portage will place old
> files there? I don't use the feature because my /etc is controlled by Git. The stuff listed in
> /var/ is pretty accurate as there's a lot of old website cruft and this computer does not serve
> anything like that anymore.

Well, for example I use eselect-repository, which puts repos in
/var/db/repos. I put the gentoo tree in there as well, and the whole tree
is suggested for deletion. A solution would be to read the
/etc/portage/repos.conf file(s) for repo locations during the runtime
detection, or to use the portageq interface. Or tell people to whitelist
their repo locations manually once the config file is available ;)

You could add directories containing a .keep file to the whitelist,
although I'm not sure how to specify it. The same goes for git
repositories: I’d rather delete a whole git repo or nothing at all
inside it, so I'd add a rule that can interpret "pick the parent dir of
a .git dir to suggest deletion, ignore all children of said parent".

> The idea is to move to everything in the whitelist.c file to a declarative (no code unless you
> count RE) configuration file. I have not decided on a format but I am leaning towards INI-style
> because GLib2 has a parser for that built-in. The config file will specify exact paths, RE, and
> globs. There will be a default dynamic list generated at runtime based on what packages you have
> installed (as gcruft had this feature).
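As a rough picture of the INI-style whitelist quoted above, here is a small Python sketch. Everything about the layout is an assumption: the section names (`paths`, `globs`, `regex`), the `keep` key, and the sample entries are invented for illustration, since no format has been decided in this thread. gcrud itself is C and would presumably use GLib's key-file parser rather than Python's configparser.

```python
import configparser
import fnmatch
import re

# Hypothetical config text; section names, key name and entries are made up.
SAMPLE = """
[paths]
keep =
    /etc/config-archive
    /var/db/repos

[globs]
keep =
    /lib*/modules/*

[regex]
keep =
    ^/var/log/.*\\.gz$
"""


def load_whitelist(text):
    """Parse the three kinds of rule: exact paths, globs and regexes."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)

    def entries(section):
        raw = parser.get(section, "keep", fallback="")
        return [line.strip() for line in raw.splitlines() if line.strip()]

    return {
        "paths": entries("paths"),
        "globs": entries("globs"),
        "regex": [re.compile(p) for p in entries("regex")],
    }


def is_whitelisted(path, wl):
    """True if any exact path, glob or regex rule covers the path."""
    # An exact path also covers everything beneath it.
    if any(path == p or path.startswith(p.rstrip("/") + "/") for p in wl["paths"]):
        return True
    if any(fnmatch.fnmatchcase(path, g) for g in wl["globs"]):
        return True
    return any(r.search(path) for r in wl["regex"])


wl = load_whitelist(SAMPLE)
```

Treating an exact path as covering its whole subtree is a design choice of this sketch, made so that a single entry can whitelist a repository directory; a real format might want separate syntax for "this file only".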
That will be nice, waiting for it ;)
Something basic might be enough for running batches of tests before
choosing a definitive format.

>> I also caught some wrongly listed files because of the multilib system with /lib symlink.
>> For example, dhcpcd declared /lib/dhcpcd/dhcpcd-hooks, thus the realpath /lib64/dhcpcd/dhcpcd-hooks
>> was listed in the removal suggestion. This should be fixed with profile 17.1
>
> The /lib vs /lib64 issue will be resolved in a later version. I think I need to use lstat()
> everywhere instead of stat(), or I can call realpath() prior to storing values in the set. This
> file should be whitelisted, but only if you have dhcpcd installed (I've long since moved to dhcpd).

I’m in favor of the realpath suggestion; it will be useful for any path
accessed through a symlink.

>> The log is so huge at the moment it is useless for me :/
>>
>> % wc -l out.log
>> 461575 out.log
>
> Any thoughts on how to simplify analysis?

A few, but I’m not sure I have many that are /universal/ across gentoo
systems. Do you plan to integrate the sorting part into gcrud directly?
If so, I’d suggest showing /usr/* stuff first, because un-owned files
there should be exceptions. The same goes for /lib, but stuff like
kernel modules should be treated carefully: we can either whitelist the
whole /lib{,32,64}/modules, or try being smart and select old kernel
modules only. This might be tricky given the number of ways someone can
manage them.

Also, here is a small analysis of file locations reported by gcrud.

% cut -d/ -f2 out.log|uniq -c
    295 etc
   3309 lib64
   1178 lib
     13 opt
  39586 usr
 417194 var

With /var containing my different repos, it's logical that it contains
most occurrences. Next comes usr, containing another lib{,32,64} schema
with /usr/lib pointing to /usr/lib64, with go packages installed (in
/usr/lib64/go). With this information, I suppose most will disappear
when using realpath or switching to the 17.1 profile.
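If the per-directory summary were built into a tool rather than done with cut and uniq, it might look like the sketch below. This is Python for illustration only; `top_level_counts` is a made-up name and the sample paths are invented, not taken from the real out.log.

```python
from collections import Counter


def top_level_counts(lines):
    """Count absolute paths by first component, like `cut -d/ -f2 | uniq -c`."""
    counts = Counter()
    for line in lines:
        path = line.strip()
        if path.startswith("/"):
            counts[path.split("/")[1]] += 1
    return counts


# Invented sample input standing in for a few lines of out.log.
sample = [
    "/etc/foo.conf",
    "/usr/lib64/go/bin/go",
    "/usr/share/doc/x",
    "/var/db/repos/gentoo/profiles/repo_name",
]
summary = top_level_counts(sample)
```

On a real log this could be fed the open file object directly, and `summary.most_common()` would list the largest directories first. Unlike `uniq -c`, the counter does not depend on the input being sorted. As a sanity check on the figures in the message, the six counts add up to 461575, the same as the `wc -l` total.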
Thanks for your work, this will probably be an excellent tool in a few
commits ;)

Regards,
Corentin “Nado” Pazdera
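The suggestion earlier in this message to read repos.conf for repository locations could be prototyped along these lines. It is a Python sketch under stated assumptions: `repo_locations` is a made-up helper, the demo writes its own small config files into a temporary directory instead of touching /etc/portage, and "example-overlay" is an invented repository name. It relies only on repos.conf being INI-style with a `location` key per repository section; Portage's built-in defaults and the portageq route are not covered.

```python
import configparser
import os
import tempfile


def repo_locations(repos_conf):
    """Return {repo_name: location} from a repos.conf file or directory."""
    if os.path.isdir(repos_conf):
        files = sorted(
            os.path.join(repos_conf, name) for name in os.listdir(repos_conf)
        )
    else:
        files = [repos_conf]
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(files)
    # sections() skips [DEFAULT], which holds settings such as main-repo.
    return {
        name: parser[name]["location"]
        for name in parser.sections()
        if "location" in parser[name]
    }


# Demo files in a scratch directory; contents are illustrative only.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "gentoo.conf"), "w") as fh:
    fh.write(
        "[DEFAULT]\nmain-repo = gentoo\n\n"
        "[gentoo]\nlocation = /var/db/repos/gentoo\n"
    )
with open(os.path.join(demo_dir, "eselect-repo.conf"), "w") as fh:
    fh.write(
        "[example-overlay]\n"
        "location = /var/db/repos/example-overlay\n"
        "sync-type = git\n"
    )

locations = repo_locations(demo_dir)
```

Each returned location could then be added to the runtime whitelist as a whole subtree, which would remove the repository contents from the report.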