public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] LTO use in the tree
@ 2014-04-21  3:14 Ryan Hill
  2014-04-21  4:02 ` [gentoo-dev] " Ryan Hill
  2014-04-21  6:53 ` [gentoo-dev] " Michał Górny
  0 siblings, 2 replies; 32+ messages in thread
From: Ryan Hill @ 2014-04-21  3:14 UTC
  To: gentoo-dev

Hey all,

As more and more packages are starting to add LTO flags automatically through
their build systems, I thought I'd point out a couple things:

- LTO utterly destroys debug info.  Flags like -g are incompatible with LTO.

- LTO causes .GCC.command.line sections to be discarded, which means your
  package will always be QA flagged as ignoring CFLAGS.

- LTO takes a _lot_ of memory.  That memory is required on the host arch.
  Distcc doesn't help things here, because linking happens locally.  Consider
  all the archs your package is built on, and if they all routinely have
  multiple GBs of memory installed.

- LTO in 4.7 is still fairly buggy.  There are no plans to fix it.  4.8 is
  better, but 4.9 moves to a different model, so bugs in 4.8 probably won't be
  fixed, especially regarding memory usage.

- I'm happy to backport patches to fix LTO problems if they're available, but
  you'll generally have to do the legwork.  And like I said, most aren't going
  to be backportable.
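
For the memory point above, a build script could check installed RAM before
opting in to LTO.  This is only a rough sketch: the function name
enough_ram_for_lto and the 4 GiB default are illustrative assumptions, not
figures from GCC or Gentoo, and it reads /proc/meminfo, so it assumes Linux.

```shell
# Hypothetical pre-flight check: is there enough RAM to attempt an LTO link?
# Arguments: total memory in kB, optional minimum in GiB (default 4, a guess).
enough_ram_for_lto() {
    mem_kb=$1
    min_gib=${2:-4}
    [ "$mem_kb" -ge $((min_gib * 1024 * 1024)) ]
}

# MemTotal is reported in kB on Linux; fall back to 0 if it cannot be read.
mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo 2>/dev/null)
mem_kb=${mem_kb:-0}

if enough_ram_for_lto "$mem_kb"; then
    echo "RAM looks sufficient; keeping -flto"
else
    echo "RAM below threshold; consider dropping -flto for this build"
fi
```

Since linking happens on the build host, the check has to run there, not on
a distcc helper.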

Please take these things into consideration when deciding whether or not this
feature is worth it.

Thanks.


-- 
Ryan Hill                        psn: dirtyepic_sk
   gcc-porting/toolchain/wxwidgets @ gentoo.org

47C3 6D62 4864 0E49 8E9E  7F92 ED38 BD49 957A 8463


* [gentoo-dev] Re: LTO use in the tree
  2014-04-21  3:14 [gentoo-dev] LTO use in the tree Ryan Hill
@ 2014-04-21  4:02 ` Ryan Hill
  2014-04-22  8:45   ` Martin Vaeth
  2014-04-22 18:10   ` Matt Turner
  2014-04-21  6:53 ` [gentoo-dev] " Michał Górny
  1 sibling, 2 replies; 32+ messages in thread
From: Ryan Hill @ 2014-04-21  4:02 UTC
  To: gentoo-dev

On Sun, 20 Apr 2014 21:14:51 -0600
Ryan Hill <rhill@gentoo.org> wrote:

> Hey all,
> 
> As more and more packages are starting to add LTO flags automatically through
> their build systems, I thought I'd point out a couple things:
> 
> - LTO utterly destroys debug info.  Flags like -g are incompatible with LTO.
> 
> - LTO causes .GCC.command.line sections to be discarded, which means your
>   package will always be QA flagged as ignoring CFLAGS.
> 
> - LTO takes a _lot_ of memory.  That memory is required on the host arch.
>   Distcc doesn't help things here, because linking happens locally.  Consider
>   all the archs your package is built on, and if they all routinely have
>   multiple GBs of memory installed.
> 
> - LTO in 4.7 is still fairly buggy.  There are no plans to fix it.  4.8 is
>   better, but 4.9 moves to a different model, so bugs in 4.8 probably won't be
>   fixed, especially regarding memory usage.
> 
> - I'm happy to backport patches to fix LTO problems if they're available, but
>   you'll generally have to do the legwork.  And like I said, most aren't going
>   to be backportable.
> 
> Please take these things into consideration when deciding whether or not this
> feature is worth it.
> 
> Thanks.
> 
> 

One thing I forgot to mention - LTO can also have a detrimental effect on certain
architectures.  On some (e.g. ppc), performance can actually be degraded due to
increased register pressure.  On others like alpha it's questionable if it'll
even work at all...


-- 
Ryan Hill                        psn: dirtyepic_sk
   gcc-porting/toolchain/wxwidgets @ gentoo.org

47C3 6D62 4864 0E49 8E9E  7F92 ED38 BD49 957A 8463


* Re: [gentoo-dev] LTO use in the tree
  2014-04-21  3:14 [gentoo-dev] LTO use in the tree Ryan Hill
  2014-04-21  4:02 ` [gentoo-dev] " Ryan Hill
@ 2014-04-21  6:53 ` Michał Górny
  1 sibling, 0 replies; 32+ messages in thread
From: Michał Górny @ 2014-04-21  6:53 UTC
  To: gentoo-dev; +Cc: rhill

On 2014-04-20, at 21:14:51
Ryan Hill <rhill@gentoo.org> wrote:

> As more and more packages are starting to add LTO flags automatically through
> their build systems, I thought I'd point out a couple things:
> 
> - LTO utterly destroys debug info.  Flags like -g are incompatible with LTO.
> 
> - LTO causes .GCC.command.line sections to be discarded, which means your
>   package will always be QA flagged as ignoring CFLAGS.
> 
> - LTO takes a _lot_ of memory.  That memory is required on the host arch.
>   Distcc doesn't help things here, because linking happens locally.  Consider
>   all the archs your package is built on, and if they all routinely have
>   multiple GBs of memory installed.

Those are the reasons why I have disabled the default -flto in systemd.
For other packages based on systemd configure macros, it can be
disabled via the following configure hack:

  cc_cv_CFLAGS__flto=no
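
The hack works because configure-style checks normally consult a cache
variable before probing the compiler.  A minimal stand-in for such a check,
assuming the usual autoconf caching behaviour (the function check_cflag_flto
is hypothetical and is not systemd's actual macro):

```shell
# Hypothetical stand-in for a cached configure check; a real macro would
# try compiling a test program instead of assuming "yes".
check_cflag_flto() {
    if [ -n "${cc_cv_CFLAGS__flto+set}" ]; then
        echo "checking if cc supports -flto... (cached) $cc_cv_CFLAGS__flto"
    else
        cc_cv_CFLAGS__flto=yes
        echo "checking if cc supports -flto... $cc_cv_CFLAGS__flto"
    fi
    # Only append the flag when the (possibly pre-seeded) answer is "yes".
    if [ "$cc_cv_CFLAGS__flto" = yes ]; then
        CFLAGS="${CFLAGS:+$CFLAGS }-flto"
    fi
}

# Pre-seed the cache variable, as the configure hack does.
CFLAGS="-O2 -pipe"
cc_cv_CFLAGS__flto=no
check_cflag_flto
echo "CFLAGS=$CFLAGS"
```

In an ebuild the variable would typically be exported, or passed as an
argument to configure, before the configure step runs.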

-- 
Best regards,
Michał Górny


* [gentoo-dev] Re: LTO use in the tree
  2014-04-21  4:02 ` [gentoo-dev] " Ryan Hill
@ 2014-04-22  8:45   ` Martin Vaeth
  2014-04-26 10:23     ` Michał Górny
  2014-05-03  0:24     ` Ryan Hill
  2014-04-22 18:10   ` Matt Turner
  1 sibling, 2 replies; 32+ messages in thread
From: Martin Vaeth @ 2014-04-22  8:45 UTC
  To: gentoo-dev

Ryan Hill <rhill@gentoo.org> wrote:
>
> One thing I forgot to mention - LTO can also have a detrimental effect on
> certain architectures.  On some (e.g. ppc), performance can actually
> be degraded due to increased register pressure.

If this really is the case, it is not a problem of LTO but
of the optimizer: If the optimizer really produces *worse*
code when it *can* see the full program instead of only parts of it,
something is severely broken in the optimizer. Only decreasing the
possibilities of the optimizer by removing LTO would be the wrong way
to "solve" this problem.

Of course, this does not touch the validity of your other arguments.

On the other hand, if upstream tests and supports LTO, it should
be communicated to the user somehow that this is the case.
The same dilemma applies to some other CFLAGS which should not be
used in general but only if the code is written for them:
Is it really a good idea in such cases to produce *by default* code
which is less optimized than what upstream supports, without the user
even being informed about this change?




* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-21  4:02 ` [gentoo-dev] " Ryan Hill
  2014-04-22  8:45   ` Martin Vaeth
@ 2014-04-22 18:10   ` Matt Turner
  2014-05-02 23:55     ` Ryan Hill
  1 sibling, 1 reply; 32+ messages in thread
From: Matt Turner @ 2014-04-22 18:10 UTC
  To: gentoo-dev

> One thing I forgot to mention - LTO can also have a detrimental effect on certain
> architectures.  On some (e.g. ppc), performance can actually be degraded due to
> increased register pressure.  On others like alpha it's questionable if it'll
> even work at all...

Worked for me on alpha, at least for what I tried. It cut eix's binary
from 2 to 1.3 MB as well.



* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-22  8:45   ` Martin Vaeth
@ 2014-04-26 10:23     ` Michał Górny
  2014-04-26 11:15       ` Rich Freeman
  2014-04-26 14:35       ` Martin Vaeth
  2014-05-03  0:24     ` Ryan Hill
  1 sibling, 2 replies; 32+ messages in thread
From: Michał Górny @ 2014-04-26 10:23 UTC
  To: gentoo-dev; +Cc: martin

On 2014-04-22, at 08:45:31
Martin Vaeth <martin@mvath.de> wrote:

> On the other hand, if upstream tests and supports LTO, it should
> be communicated to the user somehow that this is the case.
> The same dilemma applies to some other CFLAGS which should not be
> used in general but only if the code is written for them.

Why do you believe that LTO 'should not be used in general'?

As far as I understand, the LTO concept is suited well for most
programs, though the results can vary. I agree that in the early stage
many packages may be unhappy about it but as far as I understand, once
it is more widespread only a few corner cases would be unsuited for LTO
(+ the usual limitations like memory).

That being the case, I feel it would be more correct for LTO to be disabled
by default and enabled via CFLAGS+LDFLAGS, with packages not supporting
LTO using flag-o-matic to filter the flags out.
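
The filtering suggested here might look roughly like the following.  In a
real ebuild, flag-o-matic's filter-flags would do this job; the standalone
function strip_lto_flags below is a hypothetical stand-in so the idea can be
run outside Portage, and the exact set of flags it drops is an assumption.

```shell
# Hypothetical helper: drop LTO-related flags from a whitespace-separated
# flag string and print what is left.
strip_lto_flags() {
    out=""
    for flag in $1; do
        case "$flag" in
            -flto|-flto=*|-fuse-linker-plugin|-fwhole-program)
                ;;  # drop LTO-related flags
            *)
                out="${out:+$out }$flag" ;;
        esac
    done
    printf '%s\n' "$out"
}

# The user enables LTO globally in both CFLAGS and LDFLAGS ...
CFLAGS="-O2 -pipe -flto=5 -fuse-linker-plugin"
LDFLAGS="-Wl,-O1 -flto=5"

# ... and a package that cannot cope with it strips the flags again.
CFLAGS=$(strip_lto_flags "$CFLAGS")
LDFLAGS=$(strip_lto_flags "$LDFLAGS")
echo "CFLAGS=$CFLAGS"
echo "LDFLAGS=$LDFLAGS"
```

Both variables need the same treatment, since with LTO the flag has to be
present at link time as well as at compile time.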

Although I should note that my understanding of LTO is pretty much
limited to clang's angle. I don't know whether gcc behaves differently.

-- 
Best regards,
Michał Górny


* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-26 10:23     ` Michał Górny
@ 2014-04-26 11:15       ` Rich Freeman
  2014-04-26 15:00         ` Martin Vaeth
  2014-04-26 14:35       ` Martin Vaeth
  1 sibling, 1 reply; 32+ messages in thread
From: Rich Freeman @ 2014-04-26 11:15 UTC
  To: gentoo-dev; +Cc: martin

On Sat, Apr 26, 2014 at 6:23 AM, Michał Górny <mgorny@gentoo.org> wrote:
>
> As far as I understand, the LTO concept is suited well for most
> programs, though the results can vary. I agree that in the early stage
> many packages may be unhappy about it but as far as I understand, once
> it is more widespread only a few corner cases would be unsuited for LTO
> (+ the usual limitations like memory).

I tend to agree.  I've been using stable gcc with -flto in my CFLAGS
for a while now with only isolated problems.  When I run into a
problem, I disable it for that package alone.  So far I've only done
it for 26 packages.  I wouldn't be surprised if some of them now work.

I wouldn't really put LTO in the same category as fast-math.

Anybody who is using it should be prepared to run into the odd
breakage.  It does make sense to filter the flag when it is known to
not work.
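
Disabling LTO for single packages, as described above, can be done with
Portage's package.env mechanism.  The sketch below writes example files into
a scratch directory rather than /etc/portage.  The file name no-lto.conf is
made up, the package names are only examples, and it assumes both that env
files may reference the global ${CFLAGS} and that a trailing -fno-lto
overrides an earlier -flto.

```shell
# Write the example files under a scratch directory; they would have to be
# copied to /etc/portage by hand to take effect.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/etc/portage/env"

# Per-package override: append -fno-lto to whatever the global flags are.
cat > "$demo_root/etc/portage/env/no-lto.conf" <<'EOF'
CFLAGS="${CFLAGS} -fno-lto"
CXXFLAGS="${CXXFLAGS} -fno-lto"
LDFLAGS="${LDFLAGS} -fno-lto"
EOF

# Map the packages that break with LTO to that override file.
cat > "$demo_root/etc/portage/package.env" <<'EOF'
media-video/mplayer no-lto.conf
sys-apps/kmod no-lto.conf
EOF

echo "example files written below $demo_root/etc/portage"
```

When a newer gcc fixes a package, removing its line from package.env is
enough to re-enable LTO for it.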

Rich



* [gentoo-dev] Re: LTO use in the tree
  2014-04-26 10:23     ` Michał Górny
  2014-04-26 11:15       ` Rich Freeman
@ 2014-04-26 14:35       ` Martin Vaeth
  1 sibling, 0 replies; 32+ messages in thread
From: Martin Vaeth @ 2014-04-26 14:35 UTC
  To: gentoo-dev

Michał Górny <mgorny@gentoo.org> wrote:
>
> On 2014-04-22, at 08:45:31
> Martin Vaeth <martin@mvath.de> wrote:
>
>> On the other hand, if upstream tests and supports LTO, it should
>> be communicated to the user somehow that this is the case.
>> The same dilemma applies to some other CFLAGS which should not be
>> used in general but only if the code is written for them.
>
> Why do you believe that LTO 'should not be used in general'?

In the last sentence, I did not have LTO in mind but things
like -fmerge-all-constants -fnothrow-opt -fno-enforce-eh-specs
or -fno-common. The latter is perhaps again a bit similar to LTO:

> As far as I understand, the LTO concept is suited well for most
> programs

For programs yes, but if libraries are involved often not:
Experience shows that most packages which provide or use
(internally) libraries break with LTO if upstream does
not care about it.
In particular, unless the gold linker is expected (which unfortunately
cannot be so easily changed locally for a package and which
causes other problems if used system-wide), it can make sense
to compile only certain parts of the package with LTO
(e.g. the main binary).
That's why it can make sense to let the package take care of
the -flto CFLAG.

> (+ the usual limitations like memory).

Not to forget compilation time, which increases by
a remarkable factor of at least 2 (unless optimized for LTO
with e.g. -fno-fat-lto-objects).




* [gentoo-dev] Re: LTO use in the tree
  2014-04-26 11:15       ` Rich Freeman
@ 2014-04-26 15:00         ` Martin Vaeth
  2014-04-26 16:34           ` Rich Freeman
  0 siblings, 1 reply; 32+ messages in thread
From: Martin Vaeth @ 2014-04-26 15:00 UTC
  To: gentoo-dev

Rich Freeman <rich0@gentoo.org> wrote:
>
> I tend to agree.  I've been using stable gcc with -flto in my CFLAGS
> for a while now with only isolated problems.

I wouldn't call these problems isolated:
My current exception file has 340 lines, some of them containing
wildcards, and it has a tendency to grow.
For comparison: I have ~1400 packages installed.
Maybe our mileage varies because I have many multimedia packages
which use lots of libraries - these usually break.

> I wouldn't be surprised if some of them now work.

Not much change between gcc versions up to 4.8:
Some packages worked with newer versions, while others broke instead.
I have no experience yet with 4.9.

> Anybody who is using it should be prepared to run into the odd
> breakage.

That's why it is wise that Gentoo does not recommend using LTO
on a global scale.
However, for packages which are tested by upstream with LTO...?

> It does make sense to filter the flag when it is known to
> not work.

This would be the best solution of course: Recommend LTO and
filter every occasion which breaks. But currently this is
not realistic, because too many ebuilds would need to be tested
and checked. Moreover, sometimes it depends on the gcc version
whether filtering is necessary (although, as mentioned, these
cases are relatively rare with <gcc-4.9).
So, one should not expect this in any near future.




* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-26 15:00         ` Martin Vaeth
@ 2014-04-26 16:34           ` Rich Freeman
  2014-04-26 19:58             ` Martin Vaeth
  0 siblings, 1 reply; 32+ messages in thread
From: Rich Freeman @ 2014-04-26 16:34 UTC
  To: gentoo-dev

On Sat, Apr 26, 2014 at 11:00 AM, Martin Vaeth <martin@mvath.de> wrote:
> Rich Freeman <rich0@gentoo.org> wrote:
>> It does make sense to filter the flag when it is known to
>> not work.
>
> This would be the best solution of course: Recommend LTO and
> filter every occasion which breaks. But currently this is
> not realistic, because too many ebuilds would need to be tested
> and checked. Moreover, sometimes it depends on the gcc version
> whether filtering is necessary (although, as mentioned, these
> cases are relatively rare with <gcc-4.9).
> So, one should not expect this in any near future.

Sounds like many are already using it, and thus much of this testing
has already been done.  The results simply need to be collected.

FWIW the list of packages I have issues with includes:
app-antivirus/clamav
app-backup/bacula
app-office/libreoffice
dev-libs/elfutils
dev-libs/libaio
dev-libs/libpcre
dev-python/pygtkglext
dev-python/wxpython
dev-qt/qt-creator
dev-tex/luatex
dev-util/kdevelop
dev-util/kdevplatform
kde-base/gwenview
kde-base/kdelibs
kde-base/korganizer
media-gfx/inkscape
media-sound/amarok
media-video/mplayer
net-libs/webkit-gtk
net-misc/nx
net-wireless/gnuradio
sys-apps/kmod
sys-devel/llvm
sys-power/upower
x11-libs/gtkglext
x11-libs/wxGTK

There are also other packages which I build with very simplified
CFLAGS (just -O2 basically) which may or may not have LTO issues.
Also, some packages above like libreoffice may very well build just
fine if you have sufficient RAM.

I do run a modest number of multimedia packages, and only the ones
above have caused issues (though if any haven't been updated in a few
years by some stroke of luck and being abandoned then they wouldn't be
tested with LTO).

Rich



* [gentoo-dev] Re: LTO use in the tree
  2014-04-26 16:34           ` Rich Freeman
@ 2014-04-26 19:58             ` Martin Vaeth
  2014-04-27  0:34               ` "C. Bergström"
  2014-04-27  0:49               ` Rich Freeman
  0 siblings, 2 replies; 32+ messages in thread
From: Martin Vaeth @ 2014-04-26 19:58 UTC
  To: gentoo-dev

Rich Freeman <rich0@gentoo.org> wrote:
>
> FWIW the list of packages I have issues with includes:

Not sure whether this is the right place to post it.

Anyway, here is my filter file.
I remember now that almost everything using cmake and
related tools failed (e.g. practically all of kde-base, dev-qt).
Meanwhile I have uninstalled kde and thus not tested for
many months...

"legend":
+flto* is a shortcut for filtering all of
  -flto -fuse-linker-plugin -femit-llvm -fwhole-program
+fwhole-program only filters -fwhole-program

I have not always tested whether filtering -fwhole-program
alone would be sufficient, but in many cases I did, and
usually it was not sufficient.

Note that language plugins (e.g. perl-core/* dev-python/*)
usually failed if they contain C code, so for simplicity I
filtered the whole category (although in some cases perhaps
filtering -fwhole-program alone might be sufficient).

*-libs/* +fwhole-program
*-plugins/* +fwhole-program
app-admin/sudo +flto*
app-arch/bzip2 +flto*
app-arch/cpio +flto*
app-arch/p7zip +flto*
app-arch/par2cmdline +flto*
app-arch/sharutils +fwhole-program
app-arch/star +flto*
app-arch/tar +fwhole-program
app-arch/unrar +flto*
app-arch/unzip +fwhole-program
app-arch/zpaq +fwhole-program
app-backup/dar +flto*
app-cdr/cdrtools +flto*
app-cdr/k3b +flto*
app-crypt/gnupg +fwhole-program
app-crypt/pinentry +fwhole-program
app-crypt/qca +flto*
app-editors/jed +flto*
app-editors/kile +flto*
app-editors/vim +flto*
app-emulation/bochs +flto*
app-emulation/dosemu +flto*
app-emulation/vice +flto*
app-emulation/wine +flto*
app-misc/lirc +fwhole-program
app-misc/mc +fwhole-program
app-misc/strigi +flto*
app-misc/tmux +flto*
app-office/calligra +flto*
app-office/libreoffice +flto*
app-shells/bash +flto*
app-shells/zsh +flto*
app-text/a2ps +fwhole-program
app-text/aspell +flto*
app-text/convertlit +fwhole-program
app-text/djvu +flto*
app-text/dvipsk +flto*
app-text/ebook-tools +fwhole-program
app-text/fbreader +flto*
app-text/ghostscript-gpl +flto*
app-text/gocr +fwhole-program
app-text/hunspell +flto*
app-text/mupdf +flto*
app-text/poppler +flto*
app-text/ps2pkm +flto*
app-text/psutils +flto*
app-text/rarian +flto*
app-text/recode +fwhole-program
app-text/sablotron +flto*
app-text/stardict +flto*
app-text/teckit +flto*
app-text/texlive-core +flto*
app-text/unpaper +flto*
app-text/wdiff +fwhole-program
app-text/qpdfview +fwhole-program
app-text/xdvik +flto*
app-text/zathura* +fwhole-program
dev-cpp/atkmm +flto*
dev-cpp/cairomm +flto*
dev-cpp/clucene +flto*
dev-cpp/glibmm +flto*
dev-cpp/gtkmm +flto*
dev-cpp/libxmlpp +flto*
dev-cpp/pangomm +flto*
dev-db/sqlite +flto*
dev-games/flatzebra +flto*
dev-java/icedtea* +flto*
dev-lang/lua +flto*
dev-lang/orc +flto*
dev-lang/perl +flto*
dev-lang/python +flto*
dev-lang/ruby +flto*
dev-lang/spidermonkey +flto*
dev-lang/tcl +flto*
dev-lang/tk +flto*
dev-libs/boost +flto*
dev-libs/dbus-glib +flto*
dev-libs/elfutils +flto*
dev-libs/glib +flto*
dev-libs/gmp +flto*
dev-libs/libcdio +flto*
dev-libs/libpcre +flto*
dev-libs/libsigc++ +flto*
dev-libs/libusb +flto*
dev-libs/nspr +flto*
dev-libs/openssl +flto*
dev-libs/ppl +flto*
dev-libs/rlog +flto*
dev-libs/skalibs +flto*
dev-libs/xerces-c +flto*
dev-libs/zziplib +flto*
dev-lisp/clisp +flto*
dev-perl/* +flto*
dev-python/* +flto*
dev-qt/qt* +flto*
dev-scheme/guile +flto*
dev-tcltk/expect +flto*
dev-tex/luatex +flto*
dev-util/android-tools +flto*
dev-util/bdelta +flto*
dev-util/cmake +flto*
dev-util/dialog +flto*
dev-util/ltrace +flto*
dev-util/schroot +flto*
dev-util/valgrind +fwhole-program
dev-vcs/cvs +fwhole-program
dev-vcs/git +flto*
dev-vcs/mercurial +flto*
dev-vcs/monotone +flto*
dev-vcs/subversion +flto*
games-action/gltron +flto*
games-arcade/kobodeluxe +fwhole-program
games-arcade/lbreakout +flto*
games-arcade/rocksndiamonds +fwhole-program
games-arcade/xgalaga +fwhole-program
games-board/xboard +flto*
games-emulation/advancemame +fwhole-program
games-emulation/dosbox +flto*
games-emulation/sdlmame +flto*
games-emulation/xmame +flto*
games-emulation/xmess +flto*
games-engines/scummvm +flto*
games-engines/scummvm-tools +flto*
games-fps/doomsday +flto*
games-fps/prboom +flto*
games-puzzle/enigma +flto*
games-rpg/freedroidrpg +flto*
gnome-base/libglade +flto*
kde-base/* +flto*
mail-client/claws-mail +flto*
mail-filter/maildrop +fwhole-program
media-gfx/exiv2 +flto*
media-gfx/gimp +flto*
media-gfx/graphicsmagick +flto*
media-gfx/graphite2 +flto*
media-gfx/graphviz +flto*
media-gfx/imagemagick +flto*
media-gfx/pstoedit +flto*
media-gfx/sam2p +flto*
media-gfx/sane-backends +flto*
media-gfx/transfig +flto*
media-gfx/xv +flto*
media-libs/alsa-lib +flto*
media-libs/avidemux* +flto*
media-libs/flac +flto*
media-libs/freetype +flto*
media-libs/giflib +flto*
media-libs/gstreamer +flto*
media-libs/jbigkit +flto*
media-libs/jpeg +flto*
media-libs/libcaca +flto*
media-libs/libdvbpsi +flto*
media-libs/libdvdnav +flto*
media-libs/libdvdread +flto*
media-libs/liblastfm +flto*
media-libs/libmimic +flto*
media-libs/libmodplug +flto*
media-libs/libmp4v2 +flto*
media-libs/libpng +flto*
media-libs/libpostproc +flto*
media-libs/libsidplay +flto*
media-libs/libsndfile +flto*
media-libs/libv4l +flto*
media-libs/libvpx +flto*
media-libs/mediastreamer +flto*
media-libs/mesa +flto*
media-libs/musicbrainz +flto*
media-libs/netpbm +flto*
media-libs/opencore-amr +flto*
media-libs/openjpeg +flto*
media-libs/phonon +flto*
media-libs/plotutils +flto*
media-libs/raptor +flto*
media-libs/schroedinger +flto*
media-libs/silgraphite +flto*
media-libs/smpeg +flto*
media-libs/t1lib +flto*
media-libs/tiff +flto*
media-libs/x264 +flto*
media-libs/zvbi +flto*
media-plugins/live +flto*
media-sound/audacity +flto*
media-sound/audex +flto*
media-sound/cdparanoia +flto*
media-sound/gsm +flto*
media-sound/kradio +flto*
media-sound/kstreamripper +flto*
media-sound/lilypond +flto*
media-sound/musepack-tools +fwhole-program
media-sound/normalize +flto*
media-sound/qmmp +fwhole-program
media-sound/timidity++ +flto*
media-sound/vorbis-tools +flto*
media-sound/wavpack +flto*
media-sound/xmms2 +flto*
media-tv/kdetv +flto*
media-tv/linuxtv-dvb-apps +flto*
media-tv/v4l-utils +flto*
media-tv/xawtv +flto*
media-video/avidemux +flto*
media-video/cclive +flto*
media-video/dirac +flto*
media-video/ffmpeg +flto*
media-video/ffmpegthumbnailer +flto*
media-video/gnome-mplayer +fwhole-program
media-video/kaffeine +flto*
media-video/libav +flto*
media-video/mjpegtools +flto*
media-video/mplayer* +flto*
media-video/nvidia-settings +flto*
media-video/rtmpdump +fwhole-program
media-video/transcode +flto*
media-video/vlc +flto*
media-video/xine-ui +flto*
net-analyzer/wireshark +flto*
net-dialup/ppp +flto*
net-dns/libidn +fwhole-program
net-dns/pdnsd +flto*
net-firewall/ipsec-tools +fwhole-program
net-firewall/iptables +flto*
net-fs/autofs +flto*
net-ftp/lftp +flto*
net-libs/gnutls +flto*
net-libs/libetpan +flto*
net-libs/libpcap +flto*
net-libs/libsrtp +flto*
net-libs/opal +flto*
net-libs/ptlib +flto*
net-libs/wvstreams +flto*
net-mail/uw-mailutils +fwhole-program
net-misc/curl +flto*
net-misc/iputils +flto*
net-misc/nx +flto*
net-misc/nxcl +flto*
net-misc/openssh +flto*
net-misc/tor +flto*
net-p2p/ktorrent +flto*
net-print/cups +flto*
net-print/foo2zjs +flto*
net-voip/ekiga +flto*
net-voip/yate +flto*
net-wireless/wireless-tools +flto*
perl-core/* +flto*
sci-libs/cln +flto*
sci-libs/gdal +flto*
sci-libs/libgeotiff +flto*
sci-libs/qrupdate +flto*
sci-mathematics/axiom +flto*
sci-mathematics/ginac +flto*
sci-mathematics/glpk +flto*
sci-mathematics/octave +flto*
sci-mathematics/pari +flto*
sci-mathematics/scilab +flto*
sci-visualization/gnuplot +flto*
sys-apps/busybox +flto*
sys-apps/coreutils +flto*
sys-apps/dbus +flto*
sys-apps/fakeroot-ng +fwhole-program
sys-apps/findutils +flto*
sys-apps/gawk +flto*
sys-apps/grep +fwhole-program
sys-apps/groff +fwhole-program
sys-apps/hdparm +flto*
sys-apps/iproute2 +fwhole-program
sys-apps/kmod +flto*
sys-apps/less* +flto*
sys-apps/openrc +flto*
sys-apps/pciutils +flto*
sys-apps/sandbox +flto*
sys-apps/shadow +flto*
sys-apps/sysvinit +flto*
sys-apps/tcp-wrappers +fwhole-program
sys-apps/util-linux +flto*
sys-apps/which +flto*
sys-auth/polkit +flto*
sys-auth/polkit-qt +flto*
sys-devel/bc +fwhole-program
sys-devel/clang +flto* # -flto needs >3GB here
sys-devel/flex +flto*
sys-devel/gettext +flto*
sys-devel/libtool +flto*
sys-devel/llvm +flto*
sys-fs/ddrescue +flto*
sys-fs/dosfstools +flto*
sys-fs/e2fsprogs +flto*
sys-fs/encfs +flto*
sys-fs/ext4magic +flto*
sys-fs/lvm2 +flto*
sys-fs/mtools +flto*
sys-fs/squashfs-tools +flto*
sys-fs/udev +flto*
sys-fs/udftools +flto*
sys-fs/udisks +flto*
sys-fs/unionfs-fuse +flto*
sys-kernel/kccmp +flto*
sys-libs/e2fsprogs-libs +flto*
sys-libs/glibc +flto*
sys-libs/gpm +flto*
sys-libs/libcap +flto*
sys-libs/ncurses +flto*
sys-libs/slang +flto*
sys-libs/zlib +flto*
sys-power/iasl +flto*
sys-power/hibernate-script +fwhole-program
sys-power/suspend +flto*
sys-power/upower +flto*
sys-process/lsof +fwhole-program
sys-process/numactl +flto*
sys-process/procps +flto*
www-client/dillo +flto*
www-client/firefox +flto* # -flto needs too much memory/time
www-client/lynx +fwhole-program
www-plugins/gnash +flto*
www-plugins/mozplugger +flto*
x11-apps/xrandr +flto*
x11-base/xorg-server +flto*
x11-drivers/xf86-video-intel +flto*
x11-libs/fltk +flto*
x11-libs/gdk-pixbuf +flto*
x11-libs/gtkglext +flto*
x11-libs/libX11 +flto*
x11-libs/libXaw3d +flto*
x11-libs/libvdpau +flto*
x11-libs/libwnck +flto*
x11-libs/motif +flto*
x11-libs/pango +flto*
x11-libs/wxGTK +flto*
x11-misc/slim +flto*
x11-misc/xfractint +flto*
x11-misc/xscreensaver +flto*
x11-wm/fvwm +flto*
xfce-base/garcon +flto*
xfce-base/libxfce4ui +flto*
xfce-base/libxfce4util +flto*
xfce-base/libxfcegui4 +flto*
xfce-base/thunar +flto*
xfce-base/xfconf +flto*
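
A filter file in this format could be applied along the following lines.
This is a sketch only: the function flags_to_filter is hypothetical and is
not the tool actually used to process the file, the example package names
ending in "example" are invented, and since the legend does not say how
overlapping rules combine, the sketch simply lets +flto* win over
+fwhole-program.

```shell
# Hypothetical reader for a filter file in the format above.
# Usage: flags_to_filter CATEGORY/NAME RULES_TEXT
flags_to_filter() {
    pkg=$1
    rules=$2
    result=""
    while read -r pattern action rest; do
        # Skip blank lines and comment-only lines.
        case "$pattern" in ''|'#'*) continue ;; esac
        case "$pkg" in
            $pattern)   # unquoted on purpose: the pattern is a glob
                case "$action" in
                    '+flto*')
                        result="-flto -fuse-linker-plugin -femit-llvm -fwhole-program" ;;
                    '+fwhole-program')
                        # Keep the larger set if an earlier rule already chose it.
                        [ -n "$result" ] || result="-fwhole-program" ;;
                esac ;;
        esac
    done <<EOF
$rules
EOF
    printf '%s\n' "$result"
}

# A few rules copied from the list above.
rules='*-libs/* +fwhole-program
media-libs/flac +flto*
app-arch/tar +fwhole-program
sys-devel/clang +flto* # -flto needs >3GB here'

for p in media-libs/flac media-libs/example app-arch/tar app-misc/example; do
    echo "$p: $(flags_to_filter "$p" "$rules")"
done
```

The flags printed for a package are the ones that would then be removed from
CFLAGS, CXXFLAGS and LDFLAGS before building it.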




* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-26 19:58             ` Martin Vaeth
@ 2014-04-27  0:34               ` "C. Bergström"
  2014-04-27  2:14                 ` Alex Xu
                                   ` (2 more replies)
  2014-04-27  0:49               ` Rich Freeman
  1 sibling, 3 replies; 32+ messages in thread
From: "C. Bergström" @ 2014-04-27  0:34 UTC
  To: gentoo-dev; +Cc: Martin Vaeth

On 04/27/14 02:58 AM, Martin Vaeth wrote:
> Rich Freeman <rich0@gentoo.org> wrote:
>> FWIW the list of packages I have issues with includes:
> Not sure whether this is the right place to post it.
It's interesting to see that rather lengthy list. From a compiler
engineer's perspective I'd like to toss in my opinion
---------------------
Compiler flags are typically meant to do one of two things: improve
performance or reduce binary (code) size. Pragmatically nobody gives a
f* if grep has been optimized to the max since it's usually not the
bottleneck. Having LTO and whole program optimizations turned on for
every package will probably not give you a noticeably faster system, but
will certainly slow your build down (due to rather long link times).

The packages for which it really *should* be turned on are anything which
is computationally complex, Fortran, and code where performance matters.
I don't know gcc's LTO or what it's capable of, but in our compiler it
would also potentially improve large C++ applications a lot (it should
help inline more aggressively and remove C++ layer overhead).  In
practice, though, the most important C++ applications tend to be too
huge and end up hitting bugs.  They will also typically have very long
link times (I've seen 30+ minutes, system specs and all things being
relative, of course).

Go ping the gentoo-science guys and get their feedback - they may have 
the most experience with this...
----------------------

Not to be a smart-ass, but will someone start a thread on global PGO 
(profile guided optimizations) next? imho it would be interesting and 
great to have some general training data already contributed next to the 
ebuilds. For the science stuff I wouldn't recommend it, but who knows..





* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-26 19:58             ` Martin Vaeth
  2014-04-27  0:34               ` "C. Bergström"
@ 2014-04-27  0:49               ` Rich Freeman
  1 sibling, 0 replies; 32+ messages in thread
From: Rich Freeman @ 2014-04-27  0:49 UTC
  To: gentoo-dev

On Sat, Apr 26, 2014 at 3:58 PM, Martin Vaeth <martin@mvath.de> wrote:
> I have not always tested whether filtering -fwhole-program
> alone would be sufficient, but in many cases I did, and
> usually it was not sufficient.

Well, there is certainly something going on here, because...

> app-arch/bzip2 +flto*

This at least builds just fine for me with:
CFLAGS="-march=amdfam10 -Os -pipe -frename-registers -fweb
-freorder-blocks -freorder-blocks-and-partition -flto=5
-funit-at-a-time -ftree-pre -fgcse-sm -fgcse-las -fgcse-after-reload
-fmerge-all-constants -ftree-vectorize -ftree-parallelize-loops=4
-mabm -msse4a -fstack-protector"

It is possible that you might get different behavior on a different
version of gcc, or with some other combination of CFLAGS including the
ones you're using.

Rich



* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-27  0:34               ` "C. Bergström"
@ 2014-04-27  2:14                 ` Alex Xu
  2014-04-27  2:37                   ` "C. Bergström"
  2014-04-27 22:57                 ` Joshua Kinard
  2014-04-28 21:08                 ` Andrew Savchenko
  2 siblings, 1 reply; 32+ messages in thread
From: Alex Xu @ 2014-04-27  2:14 UTC
  To: gentoo-dev

On 26/04/14 08:34 PM, "C. Bergström" wrote:
> Pragmatically nobody gives a f* if grep has been optimized to the max
> since it's usually not the bottleneck.

http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html



* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-27  2:14                 ` Alex Xu
@ 2014-04-27  2:37                   ` "C. Bergström"
  2014-04-27 11:23                     ` Rich Freeman
  0 siblings, 1 reply; 32+ messages in thread
From: "C. Bergström" @ 2014-04-27  2:37 UTC
  To: gentoo-dev; +Cc: Alex Xu

On 04/27/14 09:14 AM, Alex Xu wrote:
> On 26/04/14 08:34 PM, "C. Bergström" wrote:
>> Pragmatically nobody gives a f* if grep has been optimized to the max
>> since it's usually not the bottleneck.
> http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html
My point about grep + LTO still stands

#1 This isn't gentoo FreeBSD so it's probably irrelevant from the start 
- (the comparison is gnu vs bsd grep. Further - LTO won't save your butt 
from poor programming practices or magically turn things into efficient 
syscalls)

#2 The only reference to anything which the compiler could impact is
"Use Boyer-Moore (and unroll its inner loop a few times)." Finding out 
which flag controls that for ${CC} would have some importance. It's 
almost certainly combined with -O3 and/or some standalone loop-related
optimization (nothing depending on LTO).  If they were really clever or
determined, there are probably a few GCC or other pragmas which could
give a hint about unrolling.
-----------
The color of my bikeshed is __________________




* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-27  2:37                   ` "C. Bergström"
@ 2014-04-27 11:23                     ` Rich Freeman
  2014-04-27 11:41                       ` "C. Bergström"
                                         ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Rich Freeman @ 2014-04-27 11:23 UTC (permalink / raw
  To: gentoo-dev; +Cc: Alex Xu

On Sat, Apr 26, 2014 at 10:37 PM, "C. Bergström"
<cbergstrom@pathscale.com> wrote:
> #2 The only reference to anything which the compiler could impact is
> "Use Boyer-Moore (and unroll its inner loop a few times)." Finding out which
> flag controls that for ${CC} would have some importance. It's almost
> certainly combined with -O3 and or some standalone loop related
> optimization. (Nothing depending on LTO). If they were really clever or
> determined  - there's probably a few GCC or other pragma which could give a
> hint about unrolling.

So, I'll certainly agree that package-specific CFLAG tuning will
always be superior to just setting some flag at the system level and
walking away.

And yet, in the same paragraph you mention -O3, which is tantamount to
just setting a flag and walking away.  That turns on 14 things you
probably don't really need.

I run -flto at the system level since in my experience it only causes
problems with a handful of packages, and when it does provide a
benefit I get it.  For the most part it just means my compiles at 2AM
take longer, and a bit more RAM, neither of which are a concern.  If I
do run into a bug, that is just an opportunity to log it and
contribute (though to date I haven't been submitting -flto issues as
bugs as it is still a bit new).

I think LTO is becoming mainstream-enough that we should consider it
supported in the sense that packages should filter it if it is known
not to work.  We certainly do that with things like -O2/3/s if they
don't work.  However, it still should be considered a somewhat
experimental flag and enabling it will involve bumps.  Also, it will
always involve a RAM tradeoff, so there may be cases where it isn't
filtered because it does work just fine, but it won't work for your
system with 4GB of RAM (or 8, or 16 even).  If maintainers want to add
logic to test before building (as is sometimes done for /var/tmp with
very large packages) they are welcome to do so, but I think that is
going above-and-beyond.
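
The "test before building" logic mentioned above could look roughly like the
sketch below. This is only an illustration: the function names, the sample
meminfo text and the 4 GiB threshold are all made up for the example, and no
eclass or Portage API is implied.

```python
# Hypothetical sketch: drop -flto from CFLAGS when the build host has
# less RAM than a package-specific threshold. Names and numbers are
# assumptions for illustration only.

SAMPLE_MEMINFO = """MemTotal:         524288 kB
MemFree:          123456 kB
"""


def mem_total_kib(meminfo_text):
    """Return the MemTotal value (in KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def filter_lto(flags):
    """Remove -flto and -flto=N from a list of compiler flags."""
    return [f for f in flags if f != "-flto" and not f.startswith("-flto=")]


def pick_cflags(cflags, meminfo_text, min_kib):
    """Keep cflags as-is on big hosts, strip LTO on small ones."""
    flags = cflags.split()
    if mem_total_kib(meminfo_text) < min_kib:
        flags = filter_lto(flags)
    return " ".join(flags)


if __name__ == "__main__":
    four_gib_in_kib = 4 * 1024 * 1024  # assumed threshold
    print(pick_cflags("-O2 -flto=4 -pipe", SAMPLE_MEMINFO, four_gib_in_kib))
```

On a real system the text would come from /proc/meminfo rather than a sample
string, and the threshold would have to be found per package by experiment.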

Rich



* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-27 11:23                     ` Rich Freeman
@ 2014-04-27 11:41                       ` "C. Bergström"
  2014-04-27 14:52                         ` Rich Freeman
  2014-04-28  8:25                         ` Martin Vaeth
  2014-04-27 22:56                       ` Joshua Kinard
  2014-04-28 21:46                       ` Andrew Savchenko
  2 siblings, 2 replies; 32+ messages in thread
From: "C. Bergström" @ 2014-04-27 11:41 UTC (permalink / raw
  To: gentoo-dev; +Cc: Rich Freeman, Alex Xu

On 04/27/14 06:23 PM, Rich Freeman wrote:
> On Sat, Apr 26, 2014 at 10:37 PM, "C. Bergström"
> <cbergstrom@pathscale.com> wrote:
>> #2 The only reference to anything which the compiler could impact is
>> "Use Boyer-Moore (and unroll its inner loop a few times)." Finding out which
>> flag controls that for ${CC} would have some importance. It's almost
>> certainly combined with -O3 and or some standalone loop related
>> optimization. (Nothing depending on LTO). If they were really clever or
>> determined  - there's probably a few GCC or other pragma which could give a
>> hint about unrolling.
> So, I'll certainly agree that package-specific CFLAG tuning will
> always be superior to just setting some flag at the system level and
> walking away.
>
> And yet, in the same paragraph you mention -O3, which is tantamount to
> just setting a flag and walking away.  That turns on 14 things you
> probably don't really need.
I was trying to give a simplified example... no need to nitpick my reply 
(Every compiler defines -O3 differently and even the flag to unroll 
loops and that threshold may be different.. ...)
>
> I run -flto at the system level since in my experience it only causes
> problems with a handful of packages, and when it does provide a
> benefit I get it.
Can you name a single package that you use which receives a measurable 
benefit from LTO? (Just asking)

I don't disagree about enabling it, filing bug reports or many other 
things. I'm just curious if you have any hard numbers... (You seem 
passionate and sorry if this seems like I'm putting you on the spot)

/*
Side note
IPA (aka whole program and LTO) is by far the hardest set of optimizations
I've ever personally had to debug/engineer/tune in a compiler. Making it
robust needs passionate users who file good reduced test cases. While
for a single source you have creduce or delta - what options are there
for automated reduction of whole program problems?
*/




* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-27 11:41                       ` "C. Bergström"
@ 2014-04-27 14:52                         ` Rich Freeman
  2014-04-28  8:25                         ` Martin Vaeth
  1 sibling, 0 replies; 32+ messages in thread
From: Rich Freeman @ 2014-04-27 14:52 UTC (permalink / raw
  To: C. Bergström; +Cc: gentoo-dev, Alex Xu

On Sun, Apr 27, 2014 at 7:41 AM, "C. Bergström"
<cbergstrom@pathscale.com> wrote:
> On 04/27/14 06:23 PM, Rich Freeman wrote:
>> And yet, in the same paragraph you mention -O3, which is tantamount to
>> just setting a flag and walking away.  That turns on 14 things you
>> probably don't really need.
>
> I was trying to give a simplified example... no need to nitpick my reply
> (Every compiler defines -O3 differently and even the flag to unroll loops
> and that threshold may be different.. ...)

Sorry if it came across aggressively.  I was just pointing out that
the reason one sets CFLAGs generically is to avoid the trouble of
"optimizing the optimizer."  This always comes at a cost - I tend to
use -Os, but no doubt some packages would benefit from a different
global optimization, let alone specific optimizations.

That was just the point I wanted to make about LTO - I think it is of
general usefulness since it has the potential to help, and rarely
hurts.  The only problem with it is that the implementation is
immature.

>
> Can you name a single package that you use which receives a measurable
> benefit from LTO? (Just asking)

Alas, I cannot.  There are some general benchmarks out there, and they
seem to vary from little to no effect to significant.  More
CPU-intensive software seems the most likely to benefit.  No doubt the
benefits of LTO will improve as it matures.

Rich



* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-27 11:23                     ` Rich Freeman
  2014-04-27 11:41                       ` "C. Bergström"
@ 2014-04-27 22:56                       ` Joshua Kinard
  2014-04-27 23:08                         ` Rich Freeman
  2014-04-28 21:46                       ` Andrew Savchenko
  2 siblings, 1 reply; 32+ messages in thread
From: Joshua Kinard @ 2014-04-27 22:56 UTC (permalink / raw
  To: gentoo-dev

On 04/27/2014 07:23, Rich Freeman wrote:
> On Sat, Apr 26, 2014 at 10:37 PM, "C. Bergström"
> <cbergstrom@pathscale.com> wrote:
>> #2 The only reference to anything which the compiler could impact is
>> "Use Boyer-Moore (and unroll its inner loop a few times)." Finding out which
>> flag controls that for ${CC} would have some importance. It's almost
>> certainly combined with -O3 and or some standalone loop related
>> optimization. (Nothing depending on LTO). If they were really clever or
>> determined  - there's probably a few GCC or other pragma which could give a
>> hint about unrolling.
> 
> So, I'll certainly agree that package-specific CFLAG tuning will
> always be superior to just setting some flag at the system level and
> walking away.
> 
> And yet, in the same paragraph you mention -O3, which is tantamount to
> just setting a flag and walking away.  That turns on 14 things you
> probably don't really need.
> 
> I run -flto at the system level since in my experience it only causes
> problems with a handful of packages, and when it does provide a
> benefit I get it.  For the most part it just means my compiles at 2AM
> take longer, and a bit more RAM, neither of which are a concern.  If I
> do run into a bug, that is just an opportunity to log it and
> contribute (though to date I haven't been submitting -flto issues as
> bugs as it is still a bit new).

My curiosity, as I have not attempted LTO yet on any machine, is what are
the RAM requirements?  Is it a hard limit, wherein the compiler simply fails
if there isn't enough RAM, or does it just start hitting swap real hard?
Those of us using older archs where the RAM is limited might have to be more
cautious w/ LTO.  I.e., my SGI O2 maxes right now at 512MB.  It can go to
1GB if the odd memory/PROM issue is ever worked out.  But 512MB is it for
now, so what are my odds of successfully using LTO on that?

Especially if LTO helps to reduce the final binary size, that's less data
being shuffled around main memory and the CPU caches, which, although it means
slower compile times, might make such a machine a bit snappier.  Though, I
dread how long GCC will take to build itself w/ LTO.  The O2 already needs
~18hrs for 4.8.  I haven't tried 4.9 on it yet.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
4096R/D25D95E3 2011-03-28

"The past tempts us, the present confuses us, the future frightens us.  And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic



* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-27  0:34               ` "C. Bergström"
  2014-04-27  2:14                 ` Alex Xu
@ 2014-04-27 22:57                 ` Joshua Kinard
  2014-04-28 21:08                 ` Andrew Savchenko
  2 siblings, 0 replies; 32+ messages in thread
From: Joshua Kinard @ 2014-04-27 22:57 UTC (permalink / raw
  To: gentoo-dev

On 04/26/2014 20:34, "C. Bergström" wrote:
> On 04/27/14 02:58 AM, Martin Vaeth wrote:
>> Rich Freeman <rich0@gentoo.org> wrote:
>>> FWIW the list of packages I have issues with include:
>> Not sure whether this is the right place to post it.
> It's interesting to see that rather lengthy list. From a compiler engineer
> perspective I'd like to toss in my opinion
[snip]

What compiler, out of curiosity?

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
4096R/D25D95E3 2011-03-28

"The past tempts us, the present confuses us, the future frightens us.  And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic



* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-27 22:56                       ` Joshua Kinard
@ 2014-04-27 23:08                         ` Rich Freeman
  2014-04-27 23:14                           ` Joshua Kinard
  0 siblings, 1 reply; 32+ messages in thread
From: Rich Freeman @ 2014-04-27 23:08 UTC (permalink / raw
  To: gentoo-dev

On Sun, Apr 27, 2014 at 6:56 PM, Joshua Kinard <kumba@gentoo.org> wrote:
>
> My curiosity, as I have not attempted LTO yet on any machine, is what are
> the RAM requirements?  Is it a hard limit, wherein the compiler simply fails
> if there isn't enough RAM, or does it just start hitting swap real hard?

It just allocates RAM, and the OS does the rest.  I've seen it invoke
the OOM killer.  That was back when I only had 8GB of RAM.  Now I have
16GB and I only need to disable LTO on the really big packages.

Of course, if you set an appropriate ulimit then the process will just
terminate more gracefully.  I'd highly recommend doing just that if
you have a lot of swap available.
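
A minimal sketch of the ulimit idea, assuming a Linux host: cap the address
space of a child process so a runaway link step fails with an allocation
error instead of dragging the box into swap. The helper name and the 1 GiB
cap are invented for the example; RLIMIT_AS is the limit that `ulimit -v`
sets in the shell.

```python
import resource
import subprocess
import sys


def run_capped(argv, max_bytes, **run_kwargs):
    """Run argv with its address space limited to max_bytes.

    Hypothetical helper: returns the child's exit status.
    """
    def apply_limit():
        # Runs in the child just before exec, so only the child is limited.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(argv, preexec_fn=apply_limit, **run_kwargs).returncode


if __name__ == "__main__":
    one_gib = 1 << 30
    # Stand-in for a memory-hungry link step: try to grab 2 GiB under a 1 GiB cap.
    hungry = [sys.executable, "-c", "x = bytearray(2 << 30)"]
    status = run_capped(hungry, one_gib, stderr=subprocess.DEVNULL)
    print("child exit status:", status)
```

In practice argv would be the build command, and the cap would be chosen a
bit below physical RAM.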

> Those of us using older archs where the RAM is limited might have to be more
> cautious w/ LTO.  I.e., my SGI O2 maxes right now at 512MB.  It can go to
> 1GB if the odd memory/PROM issue is ever worked out.  But 512MB is it for
> now, so what are my odds of successfully using LTO on that?

About zero.  Well, I'm sure it will work fine for hello.c, especially
if you eliminate any function calls inside of it.

>
> Especially if LTO helps to reduce the final binary size, that's less data
> being shuffled around main memory and the CPU caches, which, although it means
> slower compile times, might make such a machine a bit snappier.  Though, I
> dread how long GCC will take to build itself w/ LTO.  The O2 already needs
> ~18hrs for 4.8.  I haven't tried 4.9 on it yet.

Yeah, good luck with that...  :)

I'd be curious as to what you find.  You can always try it out by
picking a small package and doing a CFLAGS=foo emerge bar.  Be sure to
only use -j1 -flto=1 as well.
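
As a rough sketch of such a one-off trial (the helper name is hypothetical,
"bar" is a placeholder package, and adding the flags to LDFLAGS as well is an
assumption on top of what is said above), the environment and command could
be assembled like this without running anything:

```python
import os


def lto_trial(package, base_env, opt_level="-O2"):
    """Build an (env, argv) pair for a single-job LTO test build.

    Hypothetical helper: nothing is executed here.
    """
    env = dict(base_env)
    flags = f"{opt_level} -flto=1"      # one LTO job, as suggested above
    env["CFLAGS"] = flags
    # Assumption: the link step also needs to see the LTO flags.
    env["LDFLAGS"] = (env.get("LDFLAGS", "") + " " + flags).strip()
    env["MAKEOPTS"] = "-j1"             # one make job
    return env, ["emerge", package]


if __name__ == "__main__":
    env, argv = lto_trial("bar", os.environ)
    print("CFLAGS=" + env["CFLAGS"], "MAKEOPTS=" + env["MAKEOPTS"], " ".join(argv))
```

Passing the pair to subprocess.run(argv, env=env) would start the actual
build.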

Rich



* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-27 23:08                         ` Rich Freeman
@ 2014-04-27 23:14                           ` Joshua Kinard
  2014-04-28  0:40                             ` "C. Bergström"
  0 siblings, 1 reply; 32+ messages in thread
From: Joshua Kinard @ 2014-04-27 23:14 UTC (permalink / raw
  To: gentoo-dev

On 04/27/2014 19:08, Rich Freeman wrote:
> On Sun, Apr 27, 2014 at 6:56 PM, Joshua Kinard <kumba@gentoo.org> wrote:
>>
>> My curiosity, as I have not attempted LTO yet on any machine, is what are
>> the RAM requirements?  Is it a hard limit, wherein the compiler simply fails
>> if there isn't enough RAM, or does it just start hitting swap real hard?
> 
> It just allocates RAM, and the OS does the rest.  I've seen it invoke
> the OOM killer.  That was back when I only had 8GB of RAM.  Now I have
> 16GB and I only need to disable LTO on the really big packages.
> 
> Of course, if you set an appropriate ulimit then the process will just
> terminate more gracefully.  I'd highly recommend doing just that if
> you have a lot of swap available.

My favourite, starting long compiles on slow boxen, only to wake up to
discover they failed in the final five minutes of the build over something
as trite as low memory :)


>> Those of us using older archs where the RAM is limited might have to be more
>> cautious w/ LTO.  I.e., my SGI O2 maxes right now at 512MB.  It can go to
>> 1GB if the odd memory/PROM issue is ever worked out.  But 512MB is it for
>> now, so what are my odds of successfully using LTO on that?
> 
> About zero.  Well, I'm sure it will work fine for hello.c, especially
> if you eliminate any function calls inside of it.

About zero?  So, some floating point value infinitely between 0 and 1?  Hmm,
maybe I'll try it once I get my SGI Octane to boot Linux again.


>>
>> Especially if LTO helps to reduce the final binary size, that's less data
>> being shuffled around main memory and the CPU caches, which, although it means
>> slower compile times, might make such a machine a bit snappier.  Though, I
>> dread how long GCC will take to build itself w/ LTO.  The O2 already needs
>> ~18hrs for 4.8.  I haven't tried 4.9 on it yet.
> 
> Yeah, good luck with that...  :)
> 
> I'd be curious as to what you find.  You can always try it out by
> picking a small package and doing a CFLAGS=foo emerge bar.  Be sure to
> only use -j1 -flto=1 as well.

O2 only has one CPU, so it's always -j1.  SMP on my other MIPS machines
doesn't work yet (either Linux isn't supported, or I haven't debugged SMP
code yet).

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
4096R/D25D95E3 2011-03-28

"The past tempts us, the present confuses us, the future frightens us.  And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic



* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-27 23:14                           ` Joshua Kinard
@ 2014-04-28  0:40                             ` "C. Bergström"
  2014-04-28  3:54                               ` Joshua Kinard
  2014-04-28  4:49                               ` Richard Yao
  0 siblings, 2 replies; 32+ messages in thread
From: "C. Bergström" @ 2014-04-28  0:40 UTC (permalink / raw
  To: gentoo-dev; +Cc: Joshua Kinard

On 04/28/14 06:14 AM, Joshua Kinard wrote:
> On 04/27/2014 19:08, Rich Freeman wrote:
>> On Sun, Apr 27, 2014 at 6:56 PM, Joshua Kinard <kumba@gentoo.org> wrote:
>>> My curiosity, as I have not attempted LTO yet on any machine, is what are
>>> the RAM requirements?  Is it a hard limit, wherein the compiler simply fails
>>> if there isn't enough RAM, or does it just start hitting swap real hard?
>> It just allocates RAM, and the OS does the rest.  I've seen it invoke
>> the OOM killer.  That was back when I only had 8GB of RAM.  Now I have
>> 16GB and I only need to disable LTO on the really big packages.
>>
>> Of course, if you set an appropriate ulimit then the process will just
>> terminate more gracefully.  I'd highly recommend doing just that if
>> you have a lot of swap available.
> My favourite, starting long compiles on slow boxen, only to wake up to
> discover they failed in the final five minutes of the build over something
> as trite as low memory :)
>
>
>>> Those of us using older archs where the RAM is limited might have to be more
>>> cautious w/ LTO.  I.e., my SGI O2 maxes right now at 512MB.  It can go to
>>> 1GB if the odd memory/PROM issue is ever worked out.  But 512MB is it for
>>> now, so what are my odds of successfully using LTO on that?
>> About zero.  Well, I'm sure it will work fine for hello.c, especially
>> if you eliminate any function calls inside of it.
> About zero?  So, some floating point value infinitely between 0 and 1?  Hmm,
> maybe I'll try it once I get my SGI Octane to boot Linux again.
>
>
>>> Especially if LTO helps to reduce the final binary size, that's less data
>>> being shuffled around main memory and the CPU caches, which, although it means
>>> slower compile times, might make such a machine a bit snappier.  Though, I
>>> dread how long GCC will take to build itself w/ LTO.  The O2 already needs
>>> ~18hrs for 4.8.  I haven't tried 4.9 on it yet.
>> Yeah, good luck with that...  :)
>>
>> I'd be curious as to what you find.  You can always try it out by
>> picking a small package and doing a CFLAGS=foo emerge bar.  Be sure to
>> only use -j1 -flto=1 as well.
> O2 only has one CPU, so it's always -j1.  SMP on my other MIPS machines
> doesn't work yet (either Linux isn't supported, or I haven't debugged SMP
> code yet).
On those old SGI MIPS machines use MIPSPro. It had better (LTO/whole
program) optimizations than GCC more than 10 years ago (imho, and gcc may
have caught up now in 4.9). Just add the -ipa flag and test. In fairness
there are primarily 3 limitations with MIPSPro IPA

1) It sets a rather low (by modern standards) cutoff for when IPA
wouldn't be really turned on (like 1MLOC or something)
2) GCC and others will do IPA on a single file (Module) - whereas
MIPSPro required whole program to do some similar optimizations
3) MIPSPro is/was also a very very slow compiler (I do not recommend
this for large C++ codes)

Most "sane" compilers that deal with IPA will have some cutoff in
order to avoid insane levels of memory usage. The compiler will bail
internally on the problem, but the compilation won't fail. It's also in
theory possible to segment the problem and work on chunks of it. This
would also in theory lead to longer compile times, but lower memory
requirements. "We" aren't doing this yet and I don't know anyone who is,
but I'm possibly just uninformed.

In terms of general performance gains using LTO - The #1 candidate would 
be the linux kernel actually. See if anyone can get that to work ;)

While this thread is fun - I should exit() here since it doesn't seem 
productive to discuss further..

Thanks






* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-28  0:40                             ` "C. Bergström"
@ 2014-04-28  3:54                               ` Joshua Kinard
  2014-04-28  4:49                               ` Richard Yao
  1 sibling, 0 replies; 32+ messages in thread
From: Joshua Kinard @ 2014-04-28  3:54 UTC (permalink / raw
  To: gentoo-dev

On 04/27/2014 20:40, "C. Bergström" wrote:

> On those old SGI MIPS machines use MIPSPro. It had better (LTO/whole
> program) optimizations than GCC more than 10 years ago (imho and gcc may
> have caught up now in 4.9). Just add the -ipa flag and test. In fairness
> there are primarily 3 limitations with MIPSPro IPA

[snip]

That's if they ran IRIX.  They run Linux :)

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
4096R/D25D95E3 2011-03-28

"The past tempts us, the present confuses us, the future frightens us.  And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic



* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-28  0:40                             ` "C. Bergström"
  2014-04-28  3:54                               ` Joshua Kinard
@ 2014-04-28  4:49                               ` Richard Yao
  1 sibling, 0 replies; 32+ messages in thread
From: Richard Yao @ 2014-04-28  4:49 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 312 bytes --]

On Sun 27 Apr 2014 08:40:08 PM EDT, "C. Bergström" wrote:
> In terms of general performance gains using LTO - The #1 candidate
> would be the linux kernel actually. See if anyone can get that to work ;)

Intel's Andi Kleen is working on it:

http://lkml.iu.edu/hypermail/linux/kernel/1404.0/03450.html


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]


* [gentoo-dev] Re: LTO use in the tree
  2014-04-27 11:41                       ` "C. Bergström"
  2014-04-27 14:52                         ` Rich Freeman
@ 2014-04-28  8:25                         ` Martin Vaeth
  2014-04-28  8:53                           ` Tomáš Pružina
  1 sibling, 1 reply; 32+ messages in thread
From: Martin Vaeth @ 2014-04-28  8:25 UTC (permalink / raw
  To: gentoo-dev

C. Bergström <cbergstrom@pathscale.com> wrote:
> Can you name a single package that you use which receives a measurable
> benefit from LTO? (Just asking)

Like for every optimization flag, it is easy to construct particular
examples: It can help a lot if e.g. a user's string-helper library
is inlined. Concerning memory, it can help a lot if duplicate data
(e.g. macros containing paths) from different compilation units
can be merged.
I guess (though I did no benchmarks) this is why eix profits so much
from LTO: it was already mentioned that eix's size is *considerably*
smaller with LTO.
Surprisingly, eix hardly profits from clang's LTO.
I guess it is not the different implementation of LTO but of
the remaining optimizers which make the difference here.
Again: These are just guesses, I never tried to analyze.

I use it globally, because LTO *can* help a lot and should never
hurt performance if the remaining optimizers are good and
should not cause any issues (provided compilation goes through).
The price is clear: More than doubled compilation time (which takes
place in the linking phase and thus cannot be ccache'd) and for some
packages insane memory requirements which forbid its usage on some
systems.

> IPA (aka whole program and LTO) is by far the hardest optimizations
> I've ever personally had to debug/engineer/tune in a compiler.
> Making it robust [...

I guess it is not a problem by itself: It just triggers cases
which "in practice" do not occur otherwise, since most developers
will typically write relatively small compilation units.
So only now do you see the bugs hidden in the algorithms which
were never found before...
OTOH, there are already projects like sqlite which have essentially
only one compilation unit, anyway. (I am guessing this only from
the output shown during compilation, so I might be wrong.)




* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-28  8:25                         ` Martin Vaeth
@ 2014-04-28  8:53                           ` Tomáš Pružina
  0 siblings, 0 replies; 32+ messages in thread
From: Tomáš Pružina @ 2014-04-28  8:53 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 2876 bytes --]

>should not cause any issues (provided compilation goes through).

There are a few packages which compile fine but break something (I remember
some x11-library from bugzilla that broke xorg-server), but generally I
agree with you.
One annoying package is 64bit firefox, which can easily eat up to 15GB of
memory (!!!), at least with gcc 4.8.
Newer 4.9 branches are said to have fixed this, but since it required a
complete rework of LTO, new bugs were inevitably introduced.

> OTOH, there are already projects like sqlite which have essentially only
one compilation unit, anyway.

That's absolutely correct, there is one sqlite.c file which is split into
logical parts for easier code hacking, but it's one file.
Interestingly, even sqlite seems to benefit from LTO: the binary is 5%
smaller on my system.


On Mon, Apr 28, 2014 at 10:25 AM, Martin Vaeth <martin@mvath.de> wrote:

> C. Bergström <cbergstrom@pathscale.com> wrote:
> > Can you name a single package that you use which receives a measurable
> > benefit from LTO? (Just asking)
>
> Like for every optimization flag, it is easy to construct particular
> examples: It can help a lot if e.g. a user's string-helper library
> is inlined. Concerning memory, it can help a lot if duplicate data
> (e.g. macros containing paths) from different compilation units
> can be merged.
> I guess (though I did no benchmarks) this is why eix profits so much
> from LTO: it was already mentioned that eix's size is *considerably*
> smaller with LTO.
> Surprisingly, eix hardly profits from clang's LTO.
> I guess it is not the different implementation of LTO but of
> the remaining optimizers which make the difference here.
> Again: These are just guesses, I never tried to analyze.
>
> I use it globally, because LTO *can* help a lot and should never
> hurt performance if the remaining optimizers are good and
> should not cause any issues (provided compilation goes through).
> The price is clear: More than doubled compilation time (which takes
> place in the linking phase and thus cannot be ccache'd) and for some
> packages insane memory requirements which forbid its usage on some
> systems.
>
> > IPA (aka whole program and LTO) is by far the hardest optimizations
> > I've ever personally had to debug/engineer/tune in a compiler.
> > Making it robust [...
>
> I guess it is not a problem by itself: It just triggers cases
> which "in practice" do not occur otherwise, since most developers
> will typically write relatively small compilation units.
> So only now do you see the bugs hidden in the algorithms which
> were never found before...
> OTOH, there are already projects like sqlite which have essentially
> only one compilation unit, anyway. (I am guessing this only from
> the output shown during compilation, so I might be wrong.)
>
>
>

[-- Attachment #2: Type: text/html, Size: 3464 bytes --]


* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-27  0:34               ` "C. Bergström"
  2014-04-27  2:14                 ` Alex Xu
  2014-04-27 22:57                 ` Joshua Kinard
@ 2014-04-28 21:08                 ` Andrew Savchenko
  2 siblings, 0 replies; 32+ messages in thread
From: Andrew Savchenko @ 2014-04-28 21:08 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1588 bytes --]

Hello,

On Sun, 27 Apr 2014 07:34:05 +0700 C. Bergström wrote:
[...]
> Not to be a smart-ass, but will someone start a thread on global
> PGO (profile guided optimizations) next? imho it would be
> interesting and great to have some general training data already
> contributed next to the ebuilds. For the science stuff I wouldn't
> recommend it, but who knows..

Global PGO is meaningless, because PGO requires not just compiler
flags, but package-specific tests covering all widely used profiles
for the package in question. So this requires intensive upstream work
and in no way can be done in Gentoo for any significant number of
packages.

At this moment only two packages in the tree support PGO: dev-libs/gmp
and www-client/firefox. For gmp it works great. For firefox it is a
menace:
1) with current in-tree firefox versions PGO can't be used on x86
at all, since the linker doesn't fit in the 3GB memory limit, even with
memory-constraint options, with both GNU ld and gold.
2) on amd64 4GB is surely not enough for linking the profile-enabled
version, so I can't use it here either.
3) Old firefox versions (somewhere around 18) were successfully
compiled on the same ~x86 and ~amd64 boxes. So something in the
firefox tree changed that much.

There is also sci-libs/atlas in the science overlay which uses a
similar technique during build. But strictly speaking this is not
PGO, as changes are made on the algorithm level rather than on the
compiler's one: it tests each block with different parameters and
chooses the fastest ones for the current box.

Best regards,
Andrew Savchenko

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]


* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-27 11:23                     ` Rich Freeman
  2014-04-27 11:41                       ` "C. Bergström"
  2014-04-27 22:56                       ` Joshua Kinard
@ 2014-04-28 21:46                       ` Andrew Savchenko
  2014-04-28 23:45                         ` Rich Freeman
  2 siblings, 1 reply; 32+ messages in thread
From: Andrew Savchenko @ 2014-04-28 21:46 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1400 bytes --]

Hello,

On Sun, 27 Apr 2014 07:23:11 -0400 Rich Freeman wrote:
> And yet, in the same paragraph you mention -O3, which is
> tantamount to just setting a flag and walking away.  That turns
> on 14 things you probably don't really need.

Why 14 things? According to the gcc-4.8.2 manual, -O3 enables the
following:
-finline-functions, -funswitch-loops, -fpredictive-commoning,
-fgcse-after-reload, -ftree-vectorize, -fvect-cost-model,
-ftree-partial-pre, -fipa-cp-clone.
Some of these options trigger other ones, but these 8 things are
sufficient to mimic -O3 completely.

From my experience only three of them are harmful:
-finline-functions and -fipa-cp-clone bloat code size significantly,
hurting performance due to more CPU cache misses.
-ftree-vectorize may be used on amd64 (the performance change is in the
range -3..+5%), but is a complete menace on x86: a lot of ICEs and
a lot of segfaults due to stack misalignment and even some working
but miscompiled code. While some (but not all) stack alignment
issues may be fixed with -mstackrealign, this drops the performance
gain to negative values.

All other -O3 options have either no effect or measurable
performance enhancements in the range of several percent.

Tests were made using multimedia packages (mplayer, ffmpeg, x264)
and scientific ones (root, pythia, geant, blas libs).
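
To make the flag arithmetic concrete, here is a small sketch that starts from
-O2 and adds the -O3 extras listed above minus the three called harmful. The
variable and function names are made up, and whether the resulting set is a
good idea on a given box is not something this snippet can tell.

```python
# The 8 extras that -O3 adds over -O2, as listed above for gcc-4.8.2.
O3_EXTRAS = [
    "-finline-functions",
    "-funswitch-loops",
    "-fpredictive-commoning",
    "-fgcse-after-reload",
    "-ftree-vectorize",
    "-fvect-cost-model",
    "-ftree-partial-pre",
    "-fipa-cp-clone",
]

# The three flags described above as harmful (names for illustration only).
HARMFUL = {"-finline-functions", "-fipa-cp-clone", "-ftree-vectorize"}


def o2_plus_remaining_o3():
    """Return -O2 followed by the -O3 extras that were not flagged as harmful."""
    return ["-O2"] + [flag for flag in O3_EXTRAS if flag not in HARMFUL]


if __name__ == "__main__":
    print(" ".join(o2_plus_remaining_o3()))
```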

Best regards,
Andrew Savchenko

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]


* Re: [gentoo-dev] Re: LTO use in the tree
  2014-04-28 21:46                       ` Andrew Savchenko
@ 2014-04-28 23:45                         ` Rich Freeman
  0 siblings, 0 replies; 32+ messages in thread
From: Rich Freeman @ 2014-04-28 23:45 UTC
  To: gentoo-dev

On Mon, Apr 28, 2014 at 5:46 PM, Andrew Savchenko <bircoph@gmail.com> wrote:
> Hello,
>
> On Sun, 27 Apr 2014 07:23:11 -0400 Rich Freeman wrote:
>> And yet, in the same paragraph you mention -O3, which is
>> tantamount to just setting a flag and walking away.  That turns
>> on 14 things you probably don't really need.
>
> Why 14 things? ...
>
> From my experience only three of them are harmful:
...
> All other -O3 options have either no effect or measurable
> performance enhancements in the range of several percent.

You missed my point.  I think using batch optimization levels like -O2/3
makes perfect sense.  The argument was that -flto doesn't always help,
and thus shouldn't always be used.  My point was that convenience
options like -O2/3 are used because, while the options don't always
help, they usually do, and nobody wants to bother with micromanaging
them.

Personally I use -O2 or -Os with a few additional options that are
less space-expensive than full -O3, on the premise that cache and
memory conservation probably buys you more than avoiding some jumps.
But, short of profiling every package, any selection is going to be a
suboptimal choice based on averages.

I wasn't trying to say that there was something wrong with -O3/2/etc.

Rich



* [gentoo-dev] Re: LTO use in the tree
  2014-04-22 18:10   ` Matt Turner
@ 2014-05-02 23:55     ` Ryan Hill
  0 siblings, 0 replies; 32+ messages in thread
From: Ryan Hill @ 2014-05-02 23:55 UTC
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 801 bytes --]

On Tue, 22 Apr 2014 11:10:19 -0700
Matt Turner <mattst88@gentoo.org> wrote:

> > One thing I forgot to mention - LTO can also have a detrimental effect on
> > certain architectures.  On some (eg. ppc), performance can actually be
> > degraded due to increased register pressure.  On others like alpha it's
> > questionable if it'll even work at all...
> 
> Worked for me on alpha, at least for what I tried. It cut eix's binary
> from 2 to 1.3 MB as well.

Cool, thanks for the info.  I was going by the request we had back in 4.6 to
turn off LTO for alpha because an upstream developer mentioned it wasn't
expected to work.


-- 
Ryan Hill                        psn: dirtyepic_sk
   gcc-porting/toolchain/wxwidgets @ gentoo.org

47C3 6D62 4864 0E49 8E9E  7F92 ED38 BD49 957A 8463

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 490 bytes --]


* [gentoo-dev] Re: LTO use in the tree
  2014-04-22  8:45   ` Martin Vaeth
  2014-04-26 10:23     ` Michał Górny
@ 2014-05-03  0:24     ` Ryan Hill
  1 sibling, 0 replies; 32+ messages in thread
From: Ryan Hill @ 2014-05-03  0:24 UTC
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1105 bytes --]

On Tue, 22 Apr 2014 08:45:31 +0000 (UTC)
Martin Vaeth <martin@mvath.de> wrote:

> Ryan Hill <rhill@gentoo.org> wrote:
> >
> > One thing I forgot to mention - LTO can also have a detrimental effect on
> > certain architectures.  On some (eg. ppc), performance can actually
> > be degraded due to increased register pressure.
> 
> If this really is the case it is not the problem of LTO but
> of the optimizer: If the optimizer really produces *worse*
> code when it *can* see the full program instead of only parts of it,
> something is severely broken in the optimizer. Only decreasing the
> possibilities of the optimizer by removing LTO would be the wrong way
> to "solve" this problem.

Yes, this is a problem caused by aggressive inlining, and is being worked on
upstream[1].  I meant that currently released versions exhibit this behaviour.


[1] see for example http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01098.html

-- 
Ryan Hill                        psn: dirtyepic_sk
   gcc-porting/toolchain/wxwidgets @ gentoo.org

47C3 6D62 4864 0E49 8E9E  7F92 ED38 BD49 957A 8463

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 490 bytes --]


end of thread, 2014-05-03  0:25 UTC

Thread overview: 32+ messages
2014-04-21  3:14 [gentoo-dev] LTO use in the tree Ryan Hill
2014-04-21  4:02 ` [gentoo-dev] " Ryan Hill
2014-04-22  8:45   ` Martin Vaeth
2014-04-26 10:23     ` Michał Górny
2014-04-26 11:15       ` Rich Freeman
2014-04-26 15:00         ` Martin Vaeth
2014-04-26 16:34           ` Rich Freeman
2014-04-26 19:58             ` Martin Vaeth
2014-04-27  0:34               ` "C. Bergström"
2014-04-27  2:14                 ` Alex Xu
2014-04-27  2:37                   ` "C. Bergström"
2014-04-27 11:23                     ` Rich Freeman
2014-04-27 11:41                       ` "C. Bergström"
2014-04-27 14:52                         ` Rich Freeman
2014-04-28  8:25                         ` Martin Vaeth
2014-04-28  8:53                           ` Tomáš Pružina
2014-04-27 22:56                       ` Joshua Kinard
2014-04-27 23:08                         ` Rich Freeman
2014-04-27 23:14                           ` Joshua Kinard
2014-04-28  0:40                             ` "C. Bergström"
2014-04-28  3:54                               ` Joshua Kinard
2014-04-28  4:49                               ` Richard Yao
2014-04-28 21:46                       ` Andrew Savchenko
2014-04-28 23:45                         ` Rich Freeman
2014-04-27 22:57                 ` Joshua Kinard
2014-04-28 21:08                 ` Andrew Savchenko
2014-04-27  0:49               ` Rich Freeman
2014-04-26 14:35       ` Martin Vaeth
2014-05-03  0:24     ` Ryan Hill
2014-04-22 18:10   ` Matt Turner
2014-05-02 23:55     ` Ryan Hill
2014-04-21  6:53 ` [gentoo-dev] " Michał Górny
