To: gentoo-amd64@lists.gentoo.org
From: Duncan <1i5t5.duncan@cox.net>
Subject: [gentoo-amd64] Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?)
Date: Thu, 7 Aug 2014 17:20:00 +0000 (UTC)

Lie Ryan posted on Fri, 08 Aug 2014 02:06:14 +1000 as excerpted:

> With you having to compile thousands of stuffs if you build from stage
> 1, I doubt that you will be able to verify every single thing you
> compile and detect if something is actually doing sneaky stuff AND still
> have the time to enjoy your system. Also, even if you build from stage 1
> and manage to verify all the source code, you still need to download a
> precompiled compiler which could possibly inject the malicious code into
> the programs it compiles, and which can also inject itself if you try to
> compile another compiler from source. If there is a single software that
> is worth a gold mine to inject with malware to gain illicit access to
> all Linux system, then it would be gcc. Once you infect a compiler,
> you're invincible.

Actually, that brings up a good question. The art of compiling is certainly somewhat magic to me, tho I guess I understand the concept in a vague, handwavy way, but...

From my understanding, that's one reason why the gcc build is multi-stage and uses simpler (and thus easier to audit) tools such as lex and bison in its bootstrapping process.
I'm not sure whether gcc actually requires a previous gcc (or other full compiler) to build, but I do know it goes to quite some lengths to bootstrap in multiple stages, building things up from the simple to the complex as it goes and testing each stage in the process, so that if something goes wrong, there's some idea /where/ it went wrong.

Clearly one major reason for that is proving functionality at each step, so that if the process goes wrong there's some place to start as to why and how. But it certainly doesn't hurt in helping to prove, or at least somewhat establish, the basic security situation either, tho as we've already established, it's basically impossible to prove both the hardware and the software back thru all the multiple generations.

Of course the simpler tools, lex, bison, etc, must have been built from something, but because they /are/ simpler, they're also easier to audit and to prove basic functionality for, up to and including disassembly and analysis of individual machine instructions for a fuller audit.

So anyway, to the gcc experts that know, and to non-gcc CS folks who have actually built their own simple compilers and can at least address the concept: is a previous gcc or other full compiler actually required to build a new gcc, or does it sufficiently bootstrap itself from the more basic tools such that, unlike most code, it doesn't need a full compiler to build and reasonably optimize at all?

That's a question I've had brewing in the back of my mind for some time, and this seemed the perfect opportunity to ask it. =:^)

Meanwhile, I suppose it must be possible at least at some level, else how would new hardware archs come to be supported? Gotta start /somewhere/ on the toolchain, and "simpler" stuff like lex and bison can, I believe, run on a previous arch, generating the basic executable building blocks that ultimately become the first executable code actually run by the new target arch.
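Purely as a handwavy illustration of the "test each stage so you know /where/ it went wrong" idea above, here's a toy Python sketch. It has nothing to do with gcc's real build system: the stage names, the string "artifacts" and the `run_stages` helper are all made up for the example.

```python
from typing import Callable, List, Optional, Tuple

# A stage is (name, build step, self-test). Everything here is hypothetical:
# the "artifact" is just a string standing in for whatever a stage produces.
Stage = Tuple[str, Callable[[str], str], Callable[[str], bool]]


def run_stages(seed: str, stages: List[Stage]) -> Tuple[str, Optional[str]]:
    """Build stage by stage, testing each result before moving on.

    Returns (last artifact that passed its test, name of the first
    failing stage), or (final artifact, None) if every stage passed.
    """
    artifact = seed
    for name, build, check in stages:
        candidate = build(artifact)
        if not check(candidate):
            # Stop here: we know exactly which stage broke, and we
            # still hold the last known-good artifact.
            return artifact, name
        artifact = candidate
    return artifact, None


# Three toy stages; the second one is deliberately broken.
stages: List[Stage] = [
    ("stage1", lambda a: a + "+s1", lambda a: a.endswith("+s1")),
    ("stage2", lambda a: a + "+BROKEN", lambda a: a.endswith("+s2")),
    ("stage3", lambda a: a + "+s3", lambda a: a.endswith("+s3")),
]

good, failed = run_stages("simple-tools", stages)
print("last good artifact:", good)
print("first failing stage:", failed)
```

The point is only that a failure is pinned to one stage and the last good result is kept as a starting place for figuring out why and how. Whether a check like this says anything about security depends entirely on how much the checks themselves can be trusted.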
And of course gcc has long been one of the most widely arch-supporting compilers, precisely because it /is/ open source and /is/ designed to be bootstrapped in stages like that.

I guess clang/llvm is giving gcc some competition in that area now, in part because it's more modern and modular, and in part because unlike gcc it /can/ legally be taken private and supplied to others without offering sources, and some companies are evil that way. But gcc's the one with the long history in that area, and given that history I'd guess it'll be some time before clang/llvm catches up, even if it's getting most of the new platforms right now, which may or may not be the case; I've no idea.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman