[gentoo-portage-dev] Multiple language use, `component-based' design, and other issues

public inbox for gentoo-portage-dev@lists.gentoo.org

* [gentoo-portage-dev] Multiple language use, `component-based' design, and other issues
@ 2004-01-12  5:14 Jeremy Maitin-Shepard
From: Jeremy Maitin-Shepard @ 2004-01-12  5:14 UTC
  To: gentoo-portage-dev


It has been mentioned that portage-ng will be written using multiple
programming languages.  I see a number of compelling reasons to avoid
using multiple languages, while I see no particular compelling advantage.

First, supporting multiple languages will mean that significant time and
energy will need to be spent developing and maintaining the
inter-language interfaces.  The result will be a system that is less
efficient (inter-language interfaces tend to be less efficient than
intra-language ones) and more difficult to maintain and extend (because
extending will mean that these inter-language interfaces will need to be
extended correspondingly).  Furthermore, inter-language interfaces tend
to be less elegant syntactically than intra-language ones, meaning that
the resulting code will generally be less elegant (and thus more
difficult to follow).

The Python-Bash interoperation in the current portage is a clear
example of the inefficiency involved in language interoperability
(admittedly, most interfaces will not be as inefficient as the one with
Bash).  This problem can be avoided in portage-ng in a number of ways,
including stricter rules on how metadata variables may be set in ebuild
files, splitting ebuilds into multiple files, or developing a new
bash-like language specific to portage, along with a parser and
interpreter for it that ship with portage.  (Before development of
portage-ng begins, it would be useful to decide on issues like this.)
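As a minimal sketch of the first option, assuming a hypothetical restricted format in which metadata variables may only be set by literal, single-line, double-quoted assignments, the metadata could be read directly in Python without invoking bash at all. The format restriction and the function name `parse_metadata` are assumptions for illustration, not existing portage behaviour:

```python
import re

# Assumed, hypothetical restricted ebuild syntax: metadata variables may only
# be assigned literal, single-line, double-quoted strings, e.g.
#   DEPEND="x11-libs/qt"
# Variable expansion, command substitution, escapes and multi-line values
# are rejected instead of being handed to bash for evaluation.
METADATA_KEYS = {"DESCRIPTION", "HOMEPAGE", "SRC_URI", "LICENSE",
                 "SLOT", "KEYWORDS", "IUSE", "DEPEND", "RDEPEND"}
ASSIGNMENT = re.compile(r'^([A-Z_]+)="([^"$`\\]*)"\s*$')


def parse_metadata(text: str) -> dict:
    """Read metadata assignments without running the ebuild through bash."""
    metadata = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Only lines that assign one of the metadata keys are inspected;
        # function bodies and other shell code are ignored.
        key = stripped.split("=", 1)[0]
        if key not in METADATA_KEYS:
            continue
        match = ASSIGNMENT.match(stripped)
        if match is None:
            raise ValueError(f"line {lineno}: {key} must be a literal string")
        metadata[key] = match.group(2)
    return metadata
```

For example, an ebuild containing `IUSE="kde"` and `DEPEND="x11-libs/qt"` yields `{'IUSE': 'kde', 'DEPEND': 'x11-libs/qt'}`, while `DEPEND="${RDEPEND}"` raises `ValueError` instead of being expanded.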

A second disadvantage to multiple language use is that it will increase
the compile-time dependencies (and depending on the language, possibly
the runtime dependencies) of portage unnecessarily.  If portage is
written using, for example, Python, Prolog, and Ruby, then _EVERY_
Gentoo system that compiles portage will need Python, Prolog, and Ruby
installed, and may need them at runtime as well.  Python is less
of a problem because it is relatively common, but Prolog and Ruby are
often not found on systems.  Additional dependencies further complicate
the handling in portage of those dependency packages (as is the case
currently with Python), especially if they are runtime dependencies.
Additional dependencies also bloat the size of the stage tarballs.

A third issue is that while we all have our pet language, and in
principle it might seem useful to support `all' languages, so that we
can each use our language of choice in writing portage, in practice
support for each language will likely require significant effort to
write the necessary interface code.  Furthermore, and perhaps a greater
problem, in practice it will be necessary for portage developers to
learn all of the languages used in the various parts of portage.  Thus,
instead of some people having to learn and use a single language that
they might prefer to avoid, all developers would need to learn and use
multiple languages which they would prefer to avoid.

The main advantage cited for using multiple languages has been that
certain languages are `better' for doing certain types of things.  For
example, it has been argued that Prolog should be used for dependency
calculation because programs written in it can be proven correct.  I
fail to understand why it is particularly important that the portage
dependency checker be ``provably correct.''  To my knowledge, bugs in
the current dependency checker have not caused significant problems.
It is also useful to note that there are tens of
thousands or more software programs, far more critical than a portage
dependency checker, which are not written in languages in which they can
be proven correct, but which operate quite adequately.  It is perhaps
useful for the software on the Spirit Mars rover to be provably correct,
but that is simply not a useful guideline for the portage dependency
checker.  It has also been argued that it is more convenient to write a
dependency checker in Prolog, compared to other languages.  I can
assure the reader that implementing a topological sort algorithm in any
language is not overly complex; certainly not sufficiently complex to
justify doing it in another language and dealing with language
interoperability problems and adding an additional compile-time and
possibly runtime dependency.
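To illustrate the claim, here is a minimal Python sketch of the ordering step (Kahn's algorithm), assuming dependencies are given as a plain mapping from each package to the packages it depends on. The package names and the function name `merge_order` are illustrative only:

```python
from collections import deque


def merge_order(deps: dict) -> list:
    """Return packages so that every dependency precedes its dependents.

    `deps` maps each package to the packages it depends on.
    Raises ValueError if the graph contains a cycle.
    """
    # Collect every package, including ones that only appear as dependencies.
    nodes = set(deps)
    for targets in deps.values():
        nodes.update(targets)

    # remaining[p]: number of p's dependencies not yet emitted.
    # dependents[d]: packages waiting on d.
    remaining = {p: 0 for p in nodes}
    dependents = {p: [] for p in nodes}
    for pkg, targets in deps.items():
        for dep in set(targets):
            remaining[pkg] += 1
            dependents[dep].append(pkg)

    ready = deque(sorted(p for p in nodes if remaining[p] == 0))
    order = []
    while ready:
        pkg = ready.popleft()
        order.append(pkg)
        for waiting in sorted(dependents[pkg]):
            remaining[waiting] -= 1
            if remaining[waiting] == 0:
                ready.append(waiting)

    # Anything left over is part of (or blocked by) a cycle.
    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected")
    return order
```

Sorting only makes the output deterministic; any order in which each package follows its dependencies is valid.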

It has been mentioned that portage-ng will have a `component-based'
design.  It is not clear what the term `component-based' means exactly,
and so it would be useful to get some clarification.  To me that term
suggests that portage-ng will use some sort of complex runtime dynamic
linking model.  I would argue that it may be simpler to handle
all optional functionality, if there is any, through the use of USE
flags, rather than going to the trouble of supporting such a complex
model.  Clearly, any form of static linking could only be supported if
portage-ng did not depend on a runtime dynamic linking model.
Advantages of allowing static linking include greater robustness when
various shared libraries fail, and the dependencies of portage
will not need to be handled as carefully, because there will be no risk
of breaking a statically-linked portage.

-- 
Jeremy Maitin-Shepard



* Re: [gentoo-portage-dev] Multiple language use, `component-based' design, and other issues
@ 2004-01-12  9:15 Pieter Van den Abeele
From: Pieter Van den Abeele @ 2004-01-12  9:15 UTC
  To: gentoo-portage-dev

Hi,

On 12 Jan 2004, at 06:14, Jeremy Maitin-Shepard wrote:

> The main advantage cited for using multiple languages has been that
> certain languages are `better' for doing certain types of things.  For
> example, it has been argued that Prolog should be used for dependency
> calculation because programs written in it can be proven correct.  I
> fail to understand why it is particularly important that the portage
> dependency checker be ``provably correct.''  To my knowledge, there have
> been no significant problems with there being bugs in the current
> dependency checker.  It is also useful to note that there are tens of
> thousands or more software programs, far more critical than a portage
> dependency checker, which are not written in languages in which they can
> be proven correct, but which operate quite adequately.  It is perhaps
> useful for the software on the Spirit Mars rover to be provably correct,
> but that is simply not a useful guideline for the portage dependency
> checker.  It has also been argued that it is more convenient to write a
> dependency checker in Prolog, compared to other languages.  I can
> assure the reader that implementing a topological sort algorithm in any
> language is not overly complex; certainly not sufficiently complex to
> justify doing it in another language and dealing with language
> interoperability problems and adding an additional compile-time and
> possibly runtime dependency.

Ah, this is a common misconception. It is true that a topological sort
can be implemented in any language, and there is certainly no clear
advantage to using Prolog for that. However, what happens -before-
that ordering is applied, and what is often forgotten, is 'computing a
configuration' (or 'computing a model', if you prefer). Let me explain
with a simplified example.

Right now, portage performs a walk through the dependency graph.
Please consider the following simplified example:

kde -> linux.
kde -> bsd.

This means that kde depends on either linux or bsd. So if a user told
portage he/she wanted kde, portage could suggest either linux or bsd.
However, if the user also told portage he/she wanted component 'foo', and
it is known that

'foo cannot be installed on bsd'

then the bsd option for kde is no longer relevant, because it is
excluded by a constraint introduced by adding foo to the configuration.
Portage currently does not support such constraints (which are quite
similar to the 'Conflicts' field used in Debian packages), and I think it
would be hard to add similar behaviour to it. Also, this is a simplified
example. Here is one from real life:
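A minimal Python sketch of 'computing a configuration' for the kde/linux/bsd/foo example, assuming a hypothetical encoding in which each dependency is a group of alternatives and conflicts are unordered pairs. It is a plain backtracking search, not Prolog, and not how portage or portage-ng actually resolve dependencies:

```python
# Hypothetical toy repository.  Each package maps to a list of dependency
# groups; at least one member of every group must be installed.
DEPENDS = {
    "kde": [["bsd", "linux"]],  # kde -> linux.  kde -> bsd.  (bsd tried first)
    "foo": [],
    "linux": [],
    "bsd": [],
}
# Pairs of packages that may not be installed together
# ('foo cannot be installed on bsd').
CONFLICTS = {frozenset({"foo", "bsd"})}


def conflicts_with(pkg, chosen):
    """True if installing pkg would clash with an already chosen package."""
    return any(frozenset({pkg, other}) in CONFLICTS for other in chosen)


def solve(goals, chosen=frozenset()):
    """Return a set of packages satisfying every goal (a list), or None.

    Depth-first search that backtracks over the alternatives.
    """
    if not goals:
        return chosen
    pkg, rest = goals[0], goals[1:]
    if pkg in chosen:
        return solve(rest, chosen)
    if conflicts_with(pkg, chosen):
        return None
    return pick(DEPENDS[pkg], rest, chosen | {pkg})


def pick(groups, rest, chosen):
    """Choose one alternative from each group, then continue with rest."""
    if not groups:
        return solve(rest, chosen)
    for alternative in groups[0]:
        result = pick(groups[1:], rest + [alternative], chosen)
        if result is not None:
            return result
    return None
```

Here `solve(["kde"])` settles on bsd, but `solve(["kde", "foo"])` rejects bsd because of the conflict and backtracks to linux.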

kde -> qt(use kde)

So if you want to install kde, qt should be installed with option 'kde'
enabled. Similarly, a component should be able to constrain the USE
flags of another package. Or a combination of components could be
constrained, ...

Reasoning about these things is much, much easier in something like
Prolog. But the real advantage here is that Prolog can be regarded as a
proof engine, and is thus able to provide a proof of why a component with a
given specification is in a configuration. It is important to ensure
that the configuration returned is correct and satisfies the user's
constraints/requirements. So in that sense, it will be 'provable'. But
again, I don't see Prolog as an absolute requirement; anything that
provides the equivalent of a Turing machine can be used to
implement what I'm doing now in Prolog. For this problem, the set of
features that can be reused in Prolog happens to be bigger than in, for
instance, C++ or anything else, which speeds up development of an
explanatory prototype.
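The idea of a proof of why a package is in a configuration can be approximated outside Prolog too. The following Python sketch (hypothetical package data; plain requirements only, no alternatives or constraints) records, for each package, the requirement that first pulled it in, giving a chain of reasons back to the user's request. This is a much weaker notion than a Prolog derivation over the full set of constraints:

```python
# Hypothetical toy repository with plain (non-alternative) dependencies.
REQUIRES = {
    "kde": ["qt", "arts"],
    "arts": ["qt"],
    "qt": ["xfree"],
    "xfree": [],
}


def explain(goals):
    """Map every needed package to the package that first required it.

    User-requested goals map to None.  The result doubles as a simple
    justification for why each package is in the configuration.
    """
    parent = {}
    stack = [(goal, None) for goal in reversed(goals)]
    while stack:
        pkg, why = stack.pop()
        if pkg in parent:
            continue
        parent[pkg] = why
        for dep in reversed(REQUIRES[pkg]):
            stack.append((dep, pkg))
    return parent


def proof(pkg, parent):
    """Follow the justifications from pkg back to a user request."""
    chain = [pkg]
    while parent[pkg] is not None:
        pkg = parent[pkg]
        chain.append(pkg)
    return chain
```

For example, `proof("xfree", explain(["kde"]))` gives `["xfree", "qt", "kde"]`: xfree is present because qt needs it, and qt because the requested kde needs it.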

> It has been mentioned that portage-ng will have a `component-based'
> design.  It is not clear what the term `component-based' means exactly,
> and so it would be useful to get some clarification.  To me that term
> suggests that portage-ng will use some sort of complex runtime dynamic
> linking model.  I would argue that it may be simpler to simply handle
> all optional functionality, if there is any, through the use of USE
> flags, rather than going to the trouble of supporting such a complex
> model.  Clearly, some type of static linking could only be supported if
> portage-ng did not depend on a runtime dynamic linking model.
> Advantages to allow static linking include greater robustness in cases
> of failure of various shared libraries, and the dependencies of portage
> will not need to be handled as carefully, because there will be no risk
> of breaking a statically-linked portage.

You make some valid points in this last section. My interpretation of
component-based design is that we'll allow users to create their own
components (which implement a specific interface) and link them into the
program. Whether that linking happens at runtime or compile time has not
been decided yet, and I wouldn't exclude either.

Examples of such components that were given in the past include a
'logging strategy', a 'compression strategy', a 'parallel compilation
strategy', an 'installation strategy', ... These are the bottleneck areas
in current portage. What we don't want portage-ng to be is one huge file
with over 10 thousand lines of code. 'Components' is a generic term
we've introduced that covers 'patterns', 'objects', 'modules',
'tiers', 'layers': all concepts that facilitate parallel and
community development, without focusing explicitly on one paradigm.

Pieter

--
gentoo-portage-dev@gentoo.org mailing list


* Re: [gentoo-portage-dev] Multiple language use, `component-based' design, and other issues
@ 2004-01-12 15:15 Jeremy Maitin-Shepard
From: Jeremy Maitin-Shepard @ 2004-01-12 15:15 UTC
  To: gentoo-portage-dev

Pieter Van den Abeele <pvdabeel@gentoo.org> writes:

> [snip: constraints]

> So if you want to install kde, qt should be installed with option 'kde'
> enabled. But similarily, a component should be able to constrain use flags of
> another package. Or a combination of components could be constrained,
> ...

Ah okay.  I did not consider the addition of such constraints.  If
optional constraints are supported (e.g. pkg1 -> qt(use whatever) OR
something-else(not USE=xaw3d)), the problem becomes quite complex; it
seems it will be NP-hard, so in the worst case the running time may grow
exponentially with the number of packages in the relevant subset of the
dependency graph.
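The complexity remark can be made concrete with the standard reduction from Boolean satisfiability: with alternatives and conflicts, any CNF formula can be encoded as a repository whose valid configurations correspond to satisfying assignments, so finding a configuration is at least as hard as SAT, while checking a proposed one is cheap (which is what places the problem in NP). A Python sketch of the encoding and the checker, assuming a hypothetical repository layout with packages named `v=1`, `v=0`, `clauseN` and `root`:

```python
# Hypothetical encoding of a CNF formula as a package repository.  Each
# clause is a list of (variable, value) literals, so [("x1", True),
# ("x2", False)] means "x1 OR NOT x2".
def cnf_to_repo(clauses):
    """Build (depends, conflicts) whose valid configurations encode models.

    Variable v becomes two conflicting packages "v=1" and "v=0"; clause i
    becomes a virtual package whose single dependency group is the OR of its
    literals; "root" depends on every clause.
    """
    depends = {"root": [[f"clause{i}"] for i in range(len(clauses))]}
    conflicts = set()
    for i, clause in enumerate(clauses):
        depends[f"clause{i}"] = [[f"{v}={int(val)}" for v, val in clause]]
        for v, _ in clause:
            depends.setdefault(f"{v}=1", [])
            depends.setdefault(f"{v}=0", [])
            conflicts.add(frozenset({f"{v}=1", f"{v}=0"}))
    return depends, conflicts


def is_valid(installed, depends, conflicts):
    """Check a proposed configuration in time polynomial in its size.

    Every installed package needs some installed member in each of its
    dependency groups, and no conflicting pair may be installed together.
    """
    if "root" not in installed:
        return False
    for pkg in installed:
        for group in depends[pkg]:
            if not any(alt in installed for alt in group):
                return False
    return not any(pair <= installed for pair in conflicts)
```

Verifying is a single pass over the proposed set; the hard part is finding a set that passes, which in general may require exploring exponentially many combinations of alternatives.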

> Reasoning about these things is much much easier in something like prolog. But
> the real advantage here is that prolog can be regarded as a proof engine and
> thus is able to provide a proof why a component with a given specification is in
> a configuration. It is important to ensure that the configuration returned is
> correct and satisfies user constraints/requirements. So in that sense, it will
> be 'provable'. But again, I don't see prolog as an absolute requirement;
> everything which provides the equivalent of a Turing Machine should can be used
> to implement what I'm doing now in Prolog. The set of features that can be
> reused in prolog happens to be bigger than for instance C++ or anything else for
> this problem, which speeds up development of an explicatory prototype.

I see your point here.

-- 
Jeremy Maitin-Shepard
