public inbox for gentoo-soc@lists.gentoo.org
* [gentoo-soc] Project Proposal : GenCC for "Gentoo Community Compiling"
@ 2013-02-18 13:29 Antoine Pinsard
  2013-02-18 14:03 ` Patrick Lauer
  2013-02-19  1:20 ` Luca Barbato
  0 siblings, 2 replies; 11+ messages in thread
From: Antoine Pinsard @ 2013-02-18 13:29 UTC
  To: gentoo-soc

Hello Gentoo!

My name is Antoine Pinsard. I am a 21-year-old French student in
computer science.

I took my first steps in the GNU/Linux community in 2008, starting with
Ubuntu. Two years later I migrated to Debian, and two years after that I
moved to Gentoo. Today I am running Gentoo on my laptop and trying Funtoo
on my desktop computer.

What first attracted me to the IT world was programming, especially for
the web. But since I started using GNU/Linux, I have become more and
more interested in system administration, software/hardware optimization
and networking (including security).

Since last year I have been trying to get involved in the open-source
community. So far I have not done much: I took out an FSF student
membership; I reported some bugs in some free software; and a few days
ago I wrote a tool to manage USE flags (which is currently being
discussed in the Gentoo Chat forum).

I heard about the Google Summer of Code for the first time last week,
and I think this is a chance for me both to learn how to "work
open-source" and to get an interesting job this summer.

So, as no project has been submitted yet, I'd like to offer mine. This
is an idea I just had, so I have not developed it further yet. I would
like to have your opinion first.

The project is to make a tool to ease and encourage cross compiling
between Gentoo users. Basically, there would be two programs:

 * genccd, a daemon that any Gentoo (or derived) root user could start
and which waits for external requests to process a compilation.
 * gencc, a tool that looks for the nearest computers running genccd and
asks them to process a part of the job required by the command passed as
parameter (e.g. `gencc emerge -uDN --with-bdeps=y @world`).

This is a very basic approach of the tool but I think it gives the main
idea of the project. I would like to have your opinion on whether it
could be a gsoc project or not. And if it could, what backgrounds it
would require. I think this is much more about networking and security
than compiling (though it would require at least a basic knowledge of
distcc).

Thanks in advance,

Antoine Pinsard


* Re: [gentoo-soc] Project Proposal : GenCC for "Gentoo Community Compiling"
  2013-02-18 13:29 [gentoo-soc] Project Proposal : GenCC for "Gentoo Community Compiling" Antoine Pinsard
@ 2013-02-18 14:03 ` Patrick Lauer
  2013-02-18 14:41   ` Antoine Pinsard
  2013-02-18 19:13   ` Rich Freeman
  2013-02-19  1:20 ` Luca Barbato
  1 sibling, 2 replies; 11+ messages in thread
From: Patrick Lauer @ 2013-02-18 14:03 UTC
  To: gentoo-soc; +Cc: Antoine Pinsard

On 02/18/2013 09:29 PM, Antoine Pinsard wrote:
[snip]
> The project is to make a tool to ease and encourage cross compiling
> between Gentoo users. Basically, there would be two programs:
> 
>  * genccd, a daemon that any Gentoo (or derived) root user could start
> and which waits for external requests to process a compilation.
>  * gencc, a tool that looks for the nearest computers running genccd and
> asks them to process a part of the job required by the command passed as
> parameter (e.g. `gencc emerge -uDN --with-bdeps=y @world`).

So, how do you handle people being evil? What happens if my genccd
always returns 0-byte files? what if it adds random things to execute
code on the user's system?

How do you manage to get a precise environment (including useflags,
correct gcc version, ...) onto the compile host? Wouldn't that be slower
than building it locally?

> This is a very basic approach of the tool but I think it gives the main
> idea of the project. I would like to have your opinion on whether it
> could be a gsoc project or not. And if it could, what backgrounds it
> would require. I think this is much more about networking and security
> than compiling (though it would require at least a basic knowledge of
> distcc).

It's an idea that comes up every year, but is usually shot down as being
impractical or having serious security flaws.

> 
> Thanks in advance,
> 
> Antoine Pinsard
> 
Thanks for your interest, and maybe we can figure out something that
works this year :)

Patrick



* Re: [gentoo-soc] Project Proposal : GenCC for "Gentoo Community Compiling"
  2013-02-18 14:03 ` Patrick Lauer
@ 2013-02-18 14:41   ` Antoine Pinsard
  2013-02-18 19:13   ` Rich Freeman
  1 sibling, 0 replies; 11+ messages in thread
From: Antoine Pinsard @ 2013-02-18 14:41 UTC
  To: gentoo-soc

On Mon, 2013-02-18 at 22:03 +0800, Patrick Lauer wrote:
> So, how do you handle people being evil? What happens if my genccd
> always returns 0-byte files? what if it adds random things to execute
> code on the user's system?
> 
> How do you manage to get a precise environment (including useflags,
> correct gcc version, ...) onto the compile host? Wouldn't that be slower
> than building it locally?
> 

I have thought about these kinds of issues, but not about solutions yet.
I will give myself a few days to think about it and maybe find the
beginning of a solution.

I know this kind of cross-computing is increasingly common in research
and development, so maybe I can find out how they deal with these
issues.

Regards,

Antoine Pinsard


* Re: [gentoo-soc] Project Proposal : GenCC for "Gentoo Community Compiling"
  2013-02-18 14:03 ` Patrick Lauer
  2013-02-18 14:41   ` Antoine Pinsard
@ 2013-02-18 19:13   ` Rich Freeman
  2013-02-18 22:52     ` Antoine Pinsard
  1 sibling, 1 reply; 11+ messages in thread
From: Rich Freeman @ 2013-02-18 19:13 UTC
  To: gentoo-soc; +Cc: Antoine Pinsard

On Mon, Feb 18, 2013 at 9:03 AM, Patrick Lauer <patrick@gentoo.org> wrote:

(note, switching order I comment in)

(Note also - I'm not sold on any of this being practical - this is an
outline of how the obstacles might be surmounted.)

>
> How do you manage to get a precise environment (including useflags,
> correct gcc version, ...) onto the compile host? Wouldn't that be slower
> than building it locally?

So, the premise is that if you build a package with identical
dependencies you'll get an identical output.  That is using a
definition of dependency as anything involved in the creation of the
package.

So, for this to work you'd need to:
1.  ID all the dependencies of a package (INCLUDING system packages).
We don't do this for various reasons, and that would be a problem,
unless the data is somehow gathered by the tool or we change our
policies.

2.  ID the versions and configurations of all the dependencies.  That
would include USE flags at least.  Hopefully nothing build-time
depends on configuration files, but that might also be an issue.

3.  ID hosts who have identical configurations, and use them to build.

However, if you're going to do all of that stuff, you could just as
easily assemble a binary packages catalog.  Why distribute all the
compiling when you can just ask for what should be an identical final
executable?
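
A minimal sketch of what steps 1 to 3 could look like: reduce everything
that influences a build to one canonical record and hash it, so that two
hosts (or a binary package index) only need to compare a single string.
The function name, the record fields and the example values are
assumptions made for illustration; Portage does not provide this
structure.

```python
import hashlib
import json


def build_fingerprint(package, use_flags, cflags, compiler, deps):
    """Return a hex digest identifying one exact build configuration.

    deps maps each dependency name to that dependency's own fingerprint,
    so a change anywhere in the dependency tree changes the result.
    All field names here are hypothetical.
    """
    record = {
        "package": package,
        "use": sorted(set(use_flags)),   # USE flag order must not matter
        "cflags": cflags.split(),        # compiler flag order is preserved
        "compiler": compiler,
        "deps": dict(sorted(deps.items())),
    }
    # Canonical JSON so that equal records always serialize identically.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(build_fingerprint(
        "app-misc/example-1.0",          # hypothetical package
        ["ssl", "-gtk"],
        "-O2 -pipe",
        "gcc-4.7.2",
        {},
    ))
```

Hosts that compute the same fingerprint would be candidates for sharing
build work, or equally for sharing one prebuilt binary.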

> So, how do you handle people being evil? What happens if my genccd
> always returns 0-byte files? what if it adds random things to execute
> code on the user's system?

I'd also want to make sure that the attacks can't go the other way as
well.  If nothing else you're open to a CPU-exhausting DoS (compile this
2GB file full of really complex C++ abstractions for me).

For the output you could have some kind of web-of-trust where multiple
people do the work and report the hash/etc.

Again, a binary package repository shouldn't really be any harder to
implement and makes far more sense all around.  Then everybody just
reports their hashes, or we could have an official list of these and
mirrors/etc.
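
As a rough illustration of "multiple people do the work and report the
hash", the sketch below accepts an output only when enough independent
builders report the same digest. The function name, the quorum rule and
the builder identifiers are assumptions; it also does nothing against
one attacker running many builders, which a real web of trust would have
to address.

```python
from collections import Counter


def accepted_hash(reports, quorum=2):
    """Return the digest to trust, or None if builders do not agree.

    reports maps a builder identifier to the digest that builder
    reported for the same package and configuration.
    """
    if not reports:
        return None
    digest, votes = Counter(reports.values()).most_common(1)[0]
    # Require both an absolute quorum and a strict majority of reports.
    if votes >= quorum and votes * 2 > len(reports):
        return digest
    return None


if __name__ == "__main__":
    reports = {"builder-a": "abc123", "builder-b": "abc123", "builder-c": "ffff00"}
    print(accepted_hash(reports))
```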

Distributed CC within an organization makes more sense as it
eliminates the trust issues, mostly (you might still care about
reliability, but not malicious attacks).  However, a single
organization is also likely to have more uniform configuration, and
thus a binary repository makes even more sense.

I think the big question is - what does distributed compilation get us
that a binary repository doesn't get us, or in what way is it easier?
I can't really think of any advantages of the former over the latter.
Both will still be a challenge to implement, and that might start with
fully identifying dependencies.

Rich



* Re: [gentoo-soc] Project Proposal : GenCC for "Gentoo Community Compiling"
  2013-02-18 19:13   ` Rich Freeman
@ 2013-02-18 22:52     ` Antoine Pinsard
  2013-02-18 23:27       ` Rich Freeman
  0 siblings, 1 reply; 11+ messages in thread
From: Antoine Pinsard @ 2013-02-18 22:52 UTC
  To: gentoo-soc

Thanks for these clarifications.

Binary repository seems to be a radical but good solution to get rid of
attack attempts (in both ways). However (though I'm sure you're aware of
that), compilation possibilities are much too numerous to consider
applying it. And that's Gentoo's reason for being. Well, I think.

After some more searching, it has become obvious that, while it would be
a very interesting project, I clearly don't have the skills to carry it
out. That's why I will let researchers do their job and find another
interesting project.

Nonetheless, I think I found a solution that would theoretically work:
using a kind of "Web of Trust" (as OpenPGP does).

On Mon, 2013-02-18 at 14:13 -0500, Rich Freeman wrote:
> I think the big question is - what does distributed compilation get us
> that a binary repository doesn't get us, or in what way is it easier?
> I can't really think of any advantages of the former over the latter.

I don't agree with you on that point. Though distributed compilation
doesn't seem to be applicable yet, I don't think a binary repository
will ever be applicable (for the reason I stated earlier).

Regards,

Antoine Pinsard


* Re: [gentoo-soc] Project Proposal : GenCC for "Gentoo Community Compiling"
  2013-02-18 22:52     ` Antoine Pinsard
@ 2013-02-18 23:27       ` Rich Freeman
  2013-02-19  0:55         ` Antoine Pinsard
  0 siblings, 1 reply; 11+ messages in thread
From: Rich Freeman @ 2013-02-18 23:27 UTC
  To: gentoo-soc

On Mon, Feb 18, 2013 at 5:52 PM, Antoine Pinsard
<antoine.pinsard@dichotomies.fr> wrote:
>
> Binary repository seems to be a radical but good solution to get rid of
> attack attempts (in both ways). However (though I'm sure you're aware of
> that), compilation possibilities are much too numerous to consider
> applying it. And that's Gentoo's reason for being. Well, I think.

Well, I'm not convinced it is practical either, but I don't see how
the "compilation possibilities" are any less complex for an ad-hoc
distributed compiler project.

Gentoo's reason for being isn't so that we can compile stuff every
time we solve it.  That is just a means to an end.  The reason for
being is so that we have a much higher level of control over what ends
up getting installed.  If users could get the same binary they would
have gotten by compiling things themselves without actually having to
compile it, I doubt anybody would miss the build time.

My basic point is that if you manage to solve the problems that
prevent an ad-hoc distributed compiler infrastructure from working,
chances are that you could just build a library of binary packages
with just as many supported options/permutations.

Rich



* Re: [gentoo-soc] Project Proposal : GenCC for "Gentoo Community Compiling"
  2013-02-18 23:27       ` Rich Freeman
@ 2013-02-19  0:55         ` Antoine Pinsard
  0 siblings, 0 replies; 11+ messages in thread
From: Antoine Pinsard @ 2013-02-19  0:55 UTC
  To: gentoo-soc

On Mon, 2013-02-18 at 18:27 -0500, Rich Freeman wrote:
> Gentoo's reason for being isn't so that we can compile stuff every
> time we solve it.  That is just a means to an end.  The reason for
> being is so that we have a much higher level of control over what ends
> up getting installed.  If users could get the same binary they would
> have gotten by compiling things themselves without actually having to
> compile it, I doubt anybody would miss the build time.

That's what I meant, but I did not express it well. You're right.

> My basic point is that if you manage to solve the problems that
> prevent an ad-hoc distributed compiler infrastructure from working,
> chances are that you could just build a library of binary packages
> with just as many supported options/permutations.

Tell me if I'm wrong, but when I look at the gcc man page I can see a
huge set of options (even if you only take the optimization and
machine-dependent options). Building one package with every combination
already seems unthinkable to me. I just can't imagine building a whole
repository that way. Moreover, most of the built packages would never be
downloaded, whereas ad-hoc compiling just needs to run on the same
architecture.
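
A back-of-the-envelope count shows how quickly the combinations grow.
The sketch assumes every option is independent of the others, which is
not strictly true for real gcc options; the function name and the
example numbers are illustrative only.

```python
def variant_count(boolean_flags, multi_valued=()):
    """Count distinct builds of one package.

    boolean_flags: number of independent on/off options.
    multi_valued: number of choices for each multi-valued option
                  (for example, one entry per target CPU setting).
    """
    total = 2 ** boolean_flags
    for choices in multi_valued:
        total *= choices
    return total


if __name__ == "__main__":
    # 20 on/off options and one option with 30 possible values.
    print(variant_count(20, (30,)))
```

Even modest numbers of options give millions of variants per package,
which is the concern raised above.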

Yet I agree with you when you say "Why distribute all the
compiling when you can just ask for what should be an identical final
executable?". Maybe a combination of both could be good. The client just
asks a server whether package X with USE flags "a -b c" and gcc options
"--foo=b -a -r" has already been compiled. If yes, the client just has
to download the binary. If no, the server (which needs to be trusted)
does the job and stores the result for the next requests. This ties in
with the single-organization idea you stated in your first message.
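
The request flow described above can be sketched as a build-on-miss
cache. This is only an outline under stated assumptions: the class name,
the key format and the `build_func` callback are hypothetical, storage
is an in-memory dictionary, and nothing here addresses how the client
comes to trust the server.

```python
import hashlib


class BinaryCache:
    """Serve a stored binary if one exists, otherwise build and store it."""

    def __init__(self, build_func):
        self._build = build_func   # called as build_func(package, use, opts)
        self._store = {}           # cache key -> built binary (bytes)

    @staticmethod
    def key(package, use_flags, gcc_options):
        # USE flags are sorted so their order does not create new entries;
        # compiler options keep their order.
        parts = [package, " ".join(sorted(use_flags)), " ".join(gcc_options)]
        return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()

    def fetch(self, package, use_flags, gcc_options):
        """Return (binary, was_cached)."""
        k = self.key(package, use_flags, gcc_options)
        was_cached = k in self._store
        if not was_cached:
            self._store[k] = self._build(package, use_flags, gcc_options)
        return self._store[k], was_cached


if __name__ == "__main__":
    def fake_build(package, use_flags, gcc_options):
        # Stand-in for a real build; returns placeholder bytes.
        return ("built " + package).encode("utf-8")

    cache = BinaryCache(fake_build)
    print(cache.fetch("app-misc/example-1.0", ["a", "-b", "c"], ["-O2"])[1])
    print(cache.fetch("app-misc/example-1.0", ["c", "a", "-b"], ["-O2"])[1])
```

The second request differs only in USE flag order, so it is answered
from the store without a new build.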

Regards,

Antoine Pinsard


* Re: [gentoo-soc] Project Proposal : GenCC for "Gentoo Community Compiling"
  2013-02-18 13:29 [gentoo-soc] Project Proposal : GenCC for "Gentoo Community Compiling" Antoine Pinsard
  2013-02-18 14:03 ` Patrick Lauer
@ 2013-02-19  1:20 ` Luca Barbato
  2013-02-19 21:10   ` Antoine Pinsard
  2013-04-01 14:43   ` Donnie Berkholz
  1 sibling, 2 replies; 11+ messages in thread
From: Luca Barbato @ 2013-02-19  1:20 UTC
  To: gentoo-soc

On 18/02/13 14:29, Antoine Pinsard wrote:
> This is a very basic approach of the tool but I think it gives the main
> idea of the project. I would like to have your opinion on whether it
> could be a gsoc project or not. And if it could, what backgrounds it
> would require. I think this is much more about networking and security
> than compiling (though it would require at least a basic knowledge of
> distcc).

You should study what has been done in the past (distcc, icecream etc)
and figure out what they are lacking and why nobody is using them on a
geographic network.

Then you have the problem of building a ring of trust strong enough.

And eventually you have to come to terms with how compilers behave
differently depending on a number of situations.

Looks like quite a good research project but I warn you not to expect quick
or easy results.

lu



* Re: [gentoo-soc] Project Proposal : GenCC for "Gentoo Community Compiling"
  2013-02-19  1:20 ` Luca Barbato
@ 2013-02-19 21:10   ` Antoine Pinsard
  2013-04-01 14:43   ` Donnie Berkholz
  1 sibling, 0 replies; 11+ messages in thread
From: Antoine Pinsard @ 2013-02-19 21:10 UTC
  To: gentoo-soc

On Tue, 2013-02-19 at 02:20 +0100, Luca Barbato wrote:
> You should study what has been done in the past (distcc, icecream etc)
> and figure out what they are lacking and why nobody is using them on a
> geographic network.
> 
> Then you have the problem of building a ring of trust strong enough.
> 
> And eventually you have to come to terms with how compilers behave
> differently depending on a number of situations.
> 
> Looks like quite a good research project but I warn you not to expect quick
> or easy results.
> 
> lu
> 

(Please excuse my English in advance; it might not always express
exactly what I mean to say. I'm actively trying to improve it.)

Judging from the previous messages and my few searches on the Internet,
I think such a project can't be achieved yet. It could be interesting to
spend three months collecting resources, analyzing them, writing a
synthesis and hopefully opening new doors to research. However, I don't
believe this is the goal of the Summer of Code. From what I understood,
it is more about getting involved in an open-source project and
delivering an end product or improving an existing one.

I will keep following this research subject, though, and learn more
about distcc, icecream and similar tools in my spare time.

Regards,

Antoine Pinsard


* Re: [gentoo-soc] Project Proposal : GenCC for "Gentoo Community Compiling"
  2013-02-19  1:20 ` Luca Barbato
  2013-02-19 21:10   ` Antoine Pinsard
@ 2013-04-01 14:43   ` Donnie Berkholz
  2013-04-01 15:30     ` Rich Freeman
  1 sibling, 1 reply; 11+ messages in thread
From: Donnie Berkholz @ 2013-04-01 14:43 UTC
  To: gentoo-soc

On 02:20 Tue 19 Feb     , Luca Barbato wrote:
> On 18/02/13 14:29, Antoine Pinsard wrote:
> > This is a very basic approach of the tool but I think it gives the main
> > idea of the project. I would like to have your opinion on whether it
> > could be a gsoc project or not. And if it could, what backgrounds it
> > would require. I think this is much more about networking and security
> > than compiling (though it would require at least a basic knowledge of
> > distcc).
> 
> You should study what has been done in the past (distcc, icecream etc)
> and figure out what they are lacking and why nobody is using them on a
> geographic network.
> 
> Then you have the problem of building a ring of trust strong enough.
> 
> And eventually you have to come to terms with how compilers behave
> differently depending on a number of situations.
> 
> Looks like quite a good research project but I warn you not to expect quick
> or easy results.

I had a very similar idea around 10 years ago and spent a while looking 
into the feasibility of it, along with a former Gentoo developer. We 
called the idea p2pcc and created a Sourceforge project 
<http://sourceforge.net/projects/p2pcc/> but the research showed that it 
wasn't worthwhile at the time to write any code.

The problem in the end is that it wasn't really feasible on anything 
besides a 100 MBit *minimum* connection with low latency, otherwise you 
spent more time transferring files around than compiling them.
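
The trade-off can be written down as a simple break-even check: sending
a job away only pays off when the transfer overhead plus the remote
compile time is less than the local compile time. The model below is a
deliberately crude assumption (one round trip per transfer, no
compression, no queueing), and all names and example figures are
illustrative rather than measurements.

```python
def transfer_seconds(size_bytes, bandwidth_mbit, rtt_seconds):
    """Time to move one file: one round trip plus the raw transfer."""
    return rtt_seconds + size_bytes * 8 / (bandwidth_mbit * 1_000_000)


def remote_is_faster(source_bytes, object_bytes, bandwidth_mbit, rtt_seconds,
                     local_compile_seconds, remote_compile_seconds):
    """True if shipping the job out beats compiling it locally."""
    overhead = (transfer_seconds(source_bytes, bandwidth_mbit, rtt_seconds)
                + transfer_seconds(object_bytes, bandwidth_mbit, rtt_seconds))
    return overhead + remote_compile_seconds < local_compile_seconds


if __name__ == "__main__":
    # Assumed job: 1 MB of preprocessed source, 200 kB object file,
    # 2 s to compile locally, 1 s on the remote host.
    for mbit, rtt in ((100, 0.001), (1, 0.05)):
        print(mbit, remote_is_faster(1_000_000, 200_000, mbit, rtt, 2.0, 1.0))
```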

Today you could imagine it being potentially interesting if you think 
about it in the context of public (or private) cloud living in shared 
datacenters.

-- 
Thanks,
Donnie

Donnie Berkholz
Summer of Code Admin, Gentoo Linux <http://dberkholz.com>
Council Member / Sr. Developer, Gentoo Linux <http://dberkholz.com>
Analyst, RedMonk <http://redmonk.com/dberkholz/>


* Re: [gentoo-soc] Project Proposal : GenCC for "Gentoo Community Compiling"
  2013-04-01 14:43   ` Donnie Berkholz
@ 2013-04-01 15:30     ` Rich Freeman
  0 siblings, 0 replies; 11+ messages in thread
From: Rich Freeman @ 2013-04-01 15:30 UTC
  To: gentoo-soc

On Mon, Apr 1, 2013 at 10:43 AM, Donnie Berkholz <dberkholz@gentoo.org> wrote:
> The problem in the end is that it wasn't really feasible on anything
> besides a 100 MBit *minimum* connection with low latency, otherwise you
> spent more time transferring files around than compiling them.
>
> Today you could imagine it being potentially interesting if you think
> about it in the context of public (or private) cloud living in shared
> datacenters.

I messed around with distcc in combination with EC2 thinking I might
make my life easier when my previous box was taking eons to compile
things like chromium.  The goal was to have my box at home just send
out work and do little to no local compilation.  It was as much an
experiment as anything.

I found that the latency of the connection REALLY slowed things down
unless you had a very high level of parallelism (think 32-64 files in
transit at any time).  If I got the number of jobs high enough it
would perform better, and I'd have many more jobs than hosts because
the reality is that 75% of the jobs were just travelling over the
network and not running at any time.
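
A simple steady-state estimate makes the relationship explicit: if each
job spends far longer in transit than on a CPU, the number of jobs in
flight has to exceed the number of remote compile slots by the same
ratio. The function name and the example figures are assumptions; with
75% of each job's time spent on the network, transit time is three
times compile time.

```python
import math


def jobs_needed(remote_slots, transit_seconds, compile_seconds):
    """Jobs that must be in flight to keep every remote slot busy."""
    per_job_total = transit_seconds + compile_seconds
    return math.ceil(remote_slots * per_job_total / compile_seconds)


if __name__ == "__main__":
    # 16 remote slots, 3 s in transit for every 1 s of compilation.
    print(jobs_needed(16, 3.0, 1.0))
```

Under these assumed numbers, 16 remote slots need 64 parallel jobs,
which is the kind of `-j` value most build systems cannot sustain.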

The problems were:
1.  Few build systems could actually get that many jobs running in parallel.
2.  distcc only manages a queue for gcc jobs run in parallel, but make
-j64 runs everything in parallel.
3.  distcc still requires dependencies to be present on all of the
build systems.

#2 meant that when some java program built I'd end up with 64 java VMs
just killing my system.  Even shell or python scripts would just kill
the thing, and many build systems are a combination of C and
scripts/etc.

It might do marginally better if you could improve the build systems,
but for the most part it is local-only.  It might make more sense in
more dedicated roles like a tinderbox that only builds a single
package as it could be optimized to overcome those issues.

Rich

