From: Corentin “Nado” Pazdera
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Replacement for gcruft: gcrud
Date: Fri, 17 Aug 2018 12:15:38 +0000
In-Reply-To: <4DD980A7-F048-4526-B188-778A03D7719F@gmail.com>
References: <4DD980A7-F048-4526-B188-778A03D7719F@gmail.com> <32db6b65-f082-9189-8a4d-047005b980f9@gmail.com>

August 17, 2018 1:09 AM, "Andrew Udvare" wrote:

> The whitelist is the biggest work in progress right now. Most of what it lists from /etc for me is
> /etc/config-archive which AFAIK is not managed by Portage at all although Portage will place old
> files there? I don't use the feature because my /etc is controlled by Git.
> The stuff listed in /var/ is pretty accurate as there's a lot of old website cruft and this
> computer does not serve anything like that anymore.

Well, for example, I use eselect-repository, which puts repos in /var/dbr/repos. I put the Gentoo
tree in there as well, and the whole tree is suggested for deletion.
A solution would be to read the /etc/portage/repos.conf file(s) for repo locations during the
runtime detection, or to use the portageq interface.
Or tell people to manually whitelist their repo locations once the config file is available ;)

You could whitelist directories containing a .keep file, although I'm not sure how to
specify it.
The same goes for git repositories: I’d rather delete a whole git repo or nothing at all inside it,
so I'd add a rule that says "pick the parent dir of a .git dir to suggest for deletion, ignore all
children of said parent".

> The idea is to move to everything in the whitelist.c file to a declarative (no code unless you
> count RE) configuration file. I have not decided on a format but I am leaning towards INI-style
> because GLib2 has a parser for that built-in. The config file will specify exact paths, RE, and
> globs. There will be a default dynamic list generated at runtime based on what packages you have
> installed (as gcruft had this feature).

That will be nice, waiting for it ;) Something basic might be enough for running batches of tests
before choosing a definitive format.

>> I also caught some wrongly listed files because of the multilib system with /lib symlink.
>> For example, dhcpcd declared /lib/dhcpcd/dhcpcd-hooks, thus the realpath /lib64/dhcpcd/dhcpcd-hooks
>> was listed in the removal suggestion. This should be fixed with profile 17.1
>
> The /lib vs /lib64 issue will be resolved in a later version.
> I think I need to use lstat() everywhere instead of stat(), or I can call realpath() prior to
> storing values in the set. This file should be whitelisted, but only if you have dhcpcd
> installed (I've long since moved to dhcpd).

I’m in favor of the realpath suggestion; it will be useful for any path accessed through a symlink.

>> The log is so huge at the moment it is useless for me :/
>>
>> % wc -l out.log
>> 461575 out.log
>
> Any thoughts on how to simplify analysis?

A few, but I’m not sure how many of them are /universal/ across Gentoo systems.
Do you plan to integrate the sorting part into gcrud directly?
If so, I’d suggest showing the /usr/* stuff first, because un-owned files there should be
exceptions.
The same goes for /lib, but stuff like kernel modules should be treated carefully: we can either
whitelist the whole /lib{,32,64}/modules, or try being smart and select old kernel modules only.
This might be tricky given the number of ways someone can manage them.

Also, here is a small analysis of the file locations reported by gcrud.

% cut -d/ -f2 out.log|uniq -c
295 etc
3309 lib64
1178 lib
13 opt
39586 usr
417194 var

Since /var contains my different repos, it's logical that it has the most occurrences.
Next comes /usr, which contains another lib{,32,64} schema with /usr/lib pointing to /usr/lib64,
and with Go packages installed (in /usr/lib64/go).
With this information, I suppose most entries will disappear when using realpath/switching to the
17.1 profile.

Thanks for your work, this will probably be an excellent tool in a few commits ;)

Regards,
Corentin “Nado” Pazdera
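
P.S. To make the realpath suggestion concrete, here is a minimal sketch in Python. It is not gcrud's code (gcrud is written in C), and the names find_unowned and demo are made up for illustration. It assumes the package database gives a plain list of declared paths. The idea is to resolve both the declared paths and the scanned paths before comparing them, so a file declared as /lib/dhcpcd/dhcpcd-hooks matches the one found under /lib64 when /lib is a symlink.

```python
import os
import tempfile


def find_unowned(root, owned_paths):
    """Return resolved paths of files under root that no package declares.

    Hypothetical helper: owned_paths stands in for whatever the package
    database reports as installed files.
    """
    # Resolve declared paths first, so /lib/foo and /lib64/foo compare equal
    # when /lib is a symlink to /lib64.
    owned = {os.path.realpath(p) for p in owned_paths}
    unowned = []
    # os.walk does not descend into symlinked directories by default,
    # so each real file is visited once.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.realpath(os.path.join(dirpath, name))
            if path not in owned:
                unowned.append(path)
    return sorted(unowned)


def demo():
    """Build a throwaway tree with a lib -> lib64 symlink and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)
        os.makedirs(os.path.join(root, "lib64", "dhcpcd"))
        os.symlink("lib64", os.path.join(root, "lib"))
        hook = os.path.join(root, "lib64", "dhcpcd", "dhcpcd-hooks")
        stray = os.path.join(root, "lib64", "stray.txt")
        for path in (hook, stray):
            open(path, "w").close()
        # The package declares the file through the /lib symlink.
        declared = [os.path.join(root, "lib", "dhcpcd", "dhcpcd-hooks")]
        return [os.path.relpath(p, root) for p in find_unowned(root, declared)]


if __name__ == "__main__":
    print(demo())
```

In this toy tree only the stray file should be reported; the file declared via the symlink is recognized as owned. In C, the same normalization would presumably go through realpath() before values are stored in the set, as suggested in the quoted message.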