* [gentoo-dev] LTO use in the tree
From: Ryan Hill @ 2014-04-21 3:14 UTC
To: gentoo-dev

Hey all,

As more and more packages are starting to add LTO flags automatically through
their build systems, I thought I'd point out a couple things:

- LTO utterly destroys debug info. Flags like -g are incompatible with LTO.

- LTO causes .GCC.command.line sections to be discarded, which means your
  package will always be QA flagged as ignoring CFLAGS.

- LTO takes a _lot_ of memory. That memory is required on the host arch.
  Distcc doesn't help things here, because linking happens locally. Consider
  all the archs your package is built on, and if they all routinely have
  multiple GBs of memory installed.

- LTO in 4.7 is still fairly buggy. There are no plans to fix it. 4.8 is
  better, but 4.9 moves to a different model, so bugs in 4.8 probably won't be
  fixed, especially regarding memory usage.

- I'm happy to backport patches to fix LTO problems if they're available, but
  you'll generally have to do the legwork. And like I said, most aren't going
  to be backportable.

Please take these things into consideration when deciding whether or not this
feature is worth it.

Thanks.

--
Ryan Hill                        psn: dirtyepic_sk
   gcc-porting/toolchain/wxwidgets @ gentoo.org

47C3 6D62 4864 0E49  8E9E 7F92 ED38 BD49 957A 8463

^ permalink raw reply [flat|nested] 32+ messages in thread

* [gentoo-dev] Re: LTO use in the tree
From: Ryan Hill @ 2014-04-21 4:02 UTC
To: gentoo-dev

On Sun, 20 Apr 2014 21:14:51 -0600
Ryan Hill <rhill@gentoo.org> wrote:

> Hey all,
>
> As more and more packages are starting to add LTO flags automatically through
> their build systems, I thought I'd point out a couple things:
>
> - LTO utterly destroys debug info. Flags like -g are incompatible with LTO.
>
> - LTO causes .GCC.command.line sections to be discarded, which means your
>   package will always be QA flagged as ignoring CFLAGS.
>
> - LTO takes a _lot_ of memory. That memory is required on the host arch.
>   Distcc doesn't help things here, because linking happens locally. Consider
>   all the archs your package is built on, and if they all routinely have
>   multiple GBs of memory installed.
>
> - LTO in 4.7 is still fairly buggy. There are no plans to fix it. 4.8 is
>   better, but 4.9 moves to a different model, so bugs in 4.8 probably won't be
>   fixed, especially regarding memory usage.
>
> - I'm happy to backport patches to fix LTO problems if they're available, but
>   you'll generally have to do the legwork. And like I said, most aren't going
>   to be backportable.
>
> Please take these things into consideration when deciding whether or not this
> feature is worth it.
>
> Thanks.

One thing I forgot to mention - LTO can also have detrimental effect on certain
architectures. On some (eg. ppc), performance can actually be degraded due to
increased register pressure. On others like alpha it's questionable if it'll
even work at all...

--
Ryan Hill                        psn: dirtyepic_sk
   gcc-porting/toolchain/wxwidgets @ gentoo.org

47C3 6D62 4864 0E49  8E9E 7F92 ED38 BD49 957A 8463

* [gentoo-dev] Re: LTO use in the tree
From: Martin Vaeth @ 2014-04-22 8:45 UTC
To: gentoo-dev

Ryan Hill <rhill@gentoo.org> wrote:
>
> One thing I forgot to mention - LTO can also have detrimental effect on
> certain architectures. On some (eg. ppc), performance can actually
> be degraded due to increased register pressure.

If this really is the case, it is not a problem of LTO but of the optimizer:
If the optimizer really produces *worse* code when it *can* see the full
program instead of only parts of it, something is severely broken in the
optimizer. Merely reducing the optimizer's possibilities by removing LTO
would be the wrong way to "solve" this problem.

Of course, this does not touch the validity of your other arguments.

On the other hand, if upstream tests and supports LTO, it should
be communicated to the user somehow that this is the case.
The same dilemma applies to some other CFLAGS which should not be
used in general but only if the code is written for them:
Is it really a good idea in such cases to produce *by default* code
which is less optimal than what upstream supports, without even
informing the user about this change?

* Re: [gentoo-dev] Re: LTO use in the tree
From: Michał Górny @ 2014-04-26 10:23 UTC
To: gentoo-dev
Cc: martin

On 2014-04-22 at 08:45:31,
Martin Vaeth <martin@mvath.de> wrote:

> On the other hand, if upstream tests and supports LTO, it should
> be communicated to the user somehow that this is the case.
> The same dilemma applies to some other CFLAGS which should not be
> used in general but only if the code is written for them.

Why do you believe that LTO 'should not be used in general'?

As far as I understand, the LTO concept is suited well for most
programs, though the results can vary. I agree that in the early stage
many packages may be unhappy about it but as far as I understand, once
it is more widespread only a few corner cases would be unsuited for LTO
(+ the usual limitations like memory).

That being the case, I'd feel it would be more correct for LTO to be
disabled by default and enabled via CFLAGS+LDFLAGS, with packages not
supporting LTO using flag-o-matic to filter them out.

Although I should note that my understanding of LTO is pretty much
limited to clang's angle. I don't know whether gcc behaves differently.

--
Best regards,
Michał Górny

* Re: [gentoo-dev] Re: LTO use in the tree
From: Rich Freeman @ 2014-04-26 11:15 UTC
To: gentoo-dev
Cc: martin

On Sat, Apr 26, 2014 at 6:23 AM, Michał Górny <mgorny@gentoo.org> wrote:
>
> As far as I understand, the LTO concept is suited well for most
> programs, though the results can vary. I agree that in the early stage
> many packages may be unhappy about it but as far as I understand, once
> it is more widespread only a few corner cases would be unsuited for LTO
> (+ the usual limitations like memory).

I tend to agree. I've been using stable gcc with -flto in my CFLAGS
for a while now with only isolated problems. When I run into a
problem, I disable it for that package alone. So far I've only done
it for 26 packages. I wouldn't be surprised if some of them now work.

I wouldn't really put LTO in the same category as fast-math.

Anybody who is using it should be prepared to run into the odd
breakage. It does make sense to filter the flag when it is known to
not work.

Rich

* [gentoo-dev] Re: LTO use in the tree
From: Martin Vaeth @ 2014-04-26 15:00 UTC
To: gentoo-dev

Rich Freeman <rich0@gentoo.org> wrote:
>
> I tend to agree. I've been using stable gcc with -flto in my CFLAGS
> for a while now with only isolated problems.

I wouldn't call these problems isolated:
My current exception file has 340 lines, some of them containing
wildcards, and it has a tendency to grow.
For comparison: I have ~1400 packages installed.

Maybe our mileage varies because I have many multimedia packages
which use lots of libraries - these usually break.

> I wouldn't be surprised if some of them now work.

Not much change between gcc versions up to 4.8:
Some packages worked with newer versions, some others broke instead.
I have no experience yet with 4.9.

> Anybody who is using it should be prepared to run into the odd
> breakage.

That's why it is wise that gentoo does not recommend using LTO
on a global scale. However, for packages which are tested by
upstream with LTO...?

> It does make sense to filter the flag when it is known to
> not work.

This would be the best solution of course: Recommend LTO and
filter every occasion which breaks. But currently this is
not realistic, because too many ebuilds would need to be tested
and checked. Moreover, sometimes it depends on the gcc version
whether filtering is necessary (although, as mentioned, these
cases are relatively rare with <gcc-4.9).
So, one should not expect this in any near future.

* Re: [gentoo-dev] Re: LTO use in the tree
From: Rich Freeman @ 2014-04-26 16:34 UTC
To: gentoo-dev

On Sat, Apr 26, 2014 at 11:00 AM, Martin Vaeth <martin@mvath.de> wrote:
> Rich Freeman <rich0@gentoo.org> wrote:
>> It does make sense to filter the flag when it is known to
>> not work.
>
> This would be the best solution of course: Recommend LTO and
> filter every occasion which breaks. But currently this is
> not realistic, because too many ebuilds would need to be tested
> and checked. Moreover, sometimes it depends on the gcc version
> whether filtering is necessary (although, as mentioned, these
> cases are relatively rare with <gcc-4.9).
> So, one should not expect this in any near future.

Sounds like many are already using it, and thus much of this testing
has already been done. The results simply need to be collected.

FWIW the list of packages I have issues with include:

app-antivirus/clamav
app-backup/bacula
app-office/libreoffice
dev-libs/elfutils
dev-libs/libaio
dev-libs/libpcre
dev-python/pygtkglext
dev-python/wxpython
dev-qt/qt-creator
dev-tex/luatex
dev-util/kdevelop
dev-util/kdevplatform
kde-base/gwenview
kde-base/kdelibs
kde-base/korganizer
media-gfx/inkscape
media-sound/amarok
media-video/mplayer
net-libs/webkit-gtk
net-misc/nx
net-wireless/gnuradio
sys-apps/kmod
sys-devel/llvm
sys-power/upower
x11-libs/gtkglext
x11-libs/wxGTK

There are also other packages which I build with very simplified
CFLAGS (just -O2 basically) which may or may not have LTO issues.
Also, some packages above like libreoffice may very well build just
fine if you have sufficient RAM.

I do run a modest number of multimedia packages, and only the ones
above have caused issue (though if any haven't been updated in a few
years by some stroke of luck and being abandoned then they wouldn't be
tested with LTO).

Rich

* [gentoo-dev] Re: LTO use in the tree
From: Martin Vaeth @ 2014-04-26 19:58 UTC
To: gentoo-dev

Rich Freeman <rich0@gentoo.org> wrote:
>
> FWIW the list of packages I have issues with include:

Not sure whether this is the right place to post it.
Anyway, here is my filter file.
I remember now that almost everything using cmake and related tools
failed (e.g. practically all of kde-base, dev-qt).
Meanwhile I have uninstalled kde and thus not tested for many months...

"legend":

+flto*           is a shortcut for filtering all of
                 -flto -fuse-linker-plugin -femit-llvm -fwhole-program
+fwhole-program  only filters -fwhole-program

I have not always tested whether filtering -fwhole-program alone
would be sufficient, but in many cases I did, and usually
it was not sufficient.

Note that language plugins (e.g. perl-core/* dev-python/*)
usually failed if they contain C code, so for simplicity I filtered
the whole category (although in some cases perhaps filtering
-fwhole-program alone might be sufficient).

*-libs/* +fwhole-program
*-plugins/* +fwhole-program
app-admin/sudo +flto*
app-arch/bzip2 +flto*
app-arch/cpio +flto*
app-arch/p7zip +flto*
app-arch/par2cmdline +flto*
app-arch/sharutils +fwhole-program
app-arch/star +flto*
app-arch/tar +fwhole-program
app-arch/unrar +flto*
app-arch/unzip +fwhole-program
app-arch/zpaq +fwhole-program
app-backup/dar +flto*
app-cdr/cdrtools +flto*
app-cdr/k3b +flto*
app-crypt/gnupg +fwhole-program
app-crypt/pinentry +fwhole-program
app-crypt/qca +flto*
app-editors/jed +flto*
app-editors/kile +flto*
app-editors/vim +flto*
app-emulation/bochs +flto*
app-emulation/dosemu +flto*
app-emulation/vice +flto*
app-emulation/wine +flto*
app-misc/lirc +fwhole-program
app-misc/mc +fwhole-program
app-misc/strigi +flto*
app-misc/tmux +flto*
app-office/calligra +flto*
app-office/libreoffice +flto*
app-shells/bash +flto*
app-shells/zsh +flto*
app-text/a2ps +fwhole-program
app-text/aspell +flto*
app-text/convertlit +fwhole-program
app-text/djvu +flto*
app-text/dvipsk +flto*
app-text/ebook-tools +fwhole-program
app-text/fbreader +flto*
app-text/ghostscript-gpl +flto*
app-text/gocr +fwhole-program
app-text/hunspell +flto*
app-text/mupdf +flto*
app-text/poppler +flto*
app-text/ps2pkm +flto*
app-text/psutils +flto*
app-text/rarian +flto*
app-text/recode +fwhole-program
app-text/sablotron +flto*
app-text/stardict +flto*
app-text/teckit +flto*
app-text/texlive-core +flto*
app-text/unpaper +flto*
app-text/wdiff +fwhole-program
app-text/qpdfview +fwhole-program
app-text/xdvik +flto*
app-text/zathura* +fwhole-program
dev-cpp/atkmm +flto*
dev-cpp/cairomm +flto*
dev-cpp/clucene +flto*
dev-cpp/glibmm +flto*
dev-cpp/gtkmm +flto*
dev-cpp/libxmlpp +flto*
dev-cpp/pangomm +flto*
dev-db/sqlite +flto*
dev-games/flatzebra +flto*
dev-java/icedtea* +flto*
dev-lang/lua +flto*
dev-lang/orc +flto*
dev-lang/perl +flto*
dev-lang/python +flto*
dev-lang/ruby +flto*
dev-lang/spidermonkey +flto*
dev-lang/tcl +flto*
dev-lang/tk +flto*
dev-libs/boost +flto*
dev-libs/dbus-glib +flto*
dev-libs/elfutils +flto*
dev-libs/glib +flto*
dev-libs/gmp +flto*
dev-libs/libcdio +flto*
dev-libs/libpcre +flto*
dev-libs/libsigc++ +flto*
dev-libs/libusb +flto*
dev-libs/nspr +flto*
dev-libs/openssl +flto*
dev-libs/ppl +flto*
dev-libs/rlog +flto*
dev-libs/skalibs +flto*
dev-libs/xerces-c +flto*
dev-libs/zziplib +flto*
dev-lisp/clisp +flto*
dev-perl/* +flto*
dev-python/* +flto*
dev-qt/qt* +flto*
dev-scheme/guile +flto*
dev-tcltk/expect +flto*
dev-tex/luatex +flto*
dev-util/android-tools +flto*
dev-util/bdelta +flto*
dev-util/cmake +flto*
dev-util/dialog +flto*
dev-util/ltrace +flto*
dev-util/schroot +flto*
dev-util/valgrind +fwhole-program
dev-vcs/cvs +fwhole-program
dev-vcs/git +flto*
dev-vcs/mercurial +flto*
dev-vcs/monotone +flto*
dev-vcs/subversion +flto*
games-action/gltron +flto*
games-arcade/kobodeluxe +fwhole-program
games-arcade/lbreakout +flto*
games-arcade/rocksndiamonds +fwhole-program
games-arcade/xgalaga +fwhole-program
games-board/xboard +flto*
games-emulation/advancemame +fwhole-program
games-emulation/dosbox +flto*
games-emulation/sdlmame +flto*
games-emulation/xmame +flto*
games-emulation/xmess +flto*
games-engines/scummvm +flto*
games-engines/scummvm-tools +flto*
games-fps/doomsday +flto*
games-fps/prboom +flto*
games-puzzle/enigma +flto*
games-rpg/freedroidrpg +flto*
gnome-base/libglade +flto*
kde-base/* +flto*
mail-client/claws-mail +flto*
mail-filter/maildrop +fwhole-program
media-gfx/exiv2 +flto*
media-gfx/gimp +flto*
media-gfx/graphicsmagick +flto*
media-gfx/graphite2 +flto*
media-gfx/graphviz +flto*
media-gfx/imagemagick +flto*
media-gfx/pstoedit +flto*
media-gfx/sam2p +flto*
media-gfx/sane-backends +flto*
media-gfx/transfig +flto*
media-gfx/xv +flto*
media-libs/alsa-lib +flto*
media-libs/avidemux* +flto*
media-libs/flac +flto*
media-libs/freetype +flto*
media-libs/giflib +flto*
media-libs/gstreamer +flto*
media-libs/jbigkit +flto*
media-libs/jpeg +flto*
media-libs/libcaca +flto*
media-libs/libdvbpsi +flto*
media-libs/libdvdnav +flto*
media-libs/libdvdread +flto*
media-libs/liblastfm +flto*
media-libs/libmimic +flto*
media-libs/libmodplug +flto*
media-libs/libmp4v2 +flto*
media-libs/libpng +flto*
media-libs/libpostproc +flto*
media-libs/libsidplay +flto*
media-libs/libsndfile +flto*
media-libs/libv4l +flto*
media-libs/libvpx +flto*
media-libs/mediastreamer +flto*
media-libs/mesa +flto*
media-libs/musicbrainz +flto*
media-libs/netpbm +flto*
media-libs/opencore-amr +flto*
media-libs/openjpeg +flto*
media-libs/phonon +flto*
media-libs/plotutils +flto*
media-libs/raptor +flto*
media-libs/schroedinger +flto*
media-libs/silgraphite +flto*
media-libs/smpeg +flto*
media-libs/t1lib +flto*
media-libs/tiff +flto*
media-libs/x264 +flto*
media-libs/zvbi +flto*
media-plugins/live +flto*
media-sound/audacity +flto*
media-sound/audex +flto*
media-sound/cdparanoia +flto*
media-sound/gsm +flto*
media-sound/kradio +flto*
media-sound/kstreamripper +flto*
media-sound/lilypond +flto*
media-sound/musepack-tools +fwhole-program
media-sound/normalize +flto*
media-sound/qmmp +fwhole-program
media-sound/timidity++ +flto*
media-sound/vorbis-tools +flto*
media-sound/wavpack +flto*
media-sound/xmms2 +flto*
media-tv/kdetv +flto*
media-tv/linuxtv-dvb-apps +flto*
media-tv/v4l-utils +flto*
media-tv/xawtv +flto*
media-video/avidemux +flto*
media-video/cclive +flto*
media-video/dirac +flto*
media-video/ffmpeg +flto*
media-video/ffmpegthumbnailer +flto*
media-video/gnome-mplayer +fwhole-program
media-video/kaffeine +flto*
media-video/libav +flto*
media-video/mjpegtools +flto*
media-video/mplayer* +flto*
media-video/nvidia-settings +flto*
media-video/rtmpdump +fwhole-program
media-video/transcode +flto*
media-video/vlc +flto*
media-video/xine-ui +flto*
net-analyzer/wireshark +flto*
net-dialup/ppp +flto*
net-dns/libidn +fwhole-program
net-dns/pdnsd +flto*
net-firewall/ipsec-tools +fwhole-program
net-firewall/iptables +flto*
net-fs/autofs +flto*
net-ftp/lftp +flto*
net-libs/gnutls +flto*
net-libs/libetpan +flto*
net-libs/libpcap +flto*
net-libs/libsrtp +flto*
net-libs/opal +flto*
net-libs/ptlib +flto*
net-libs/wvstreams +flto*
net-mail/uw-mailutils +fwhole-program
net-misc/curl +flto*
net-misc/iputils +flto*
net-misc/nx +flto*
net-misc/nxcl +flto*
net-misc/openssh +flto*
net-misc/tor +flto*
net-p2p/ktorrent +flto*
net-print/cups +flto*
net-print/foo2zjs +flto*
net-voip/ekiga +flto*
net-voip/yate +flto*
net-wireless/wireless-tools +flto*
perl-core/* +flto*
sci-libs/cln +flto*
sci-libs/gdal +flto*
sci-libs/libgeotiff +flto*
sci-libs/qrupdate +flto*
sci-mathematics/axiom +flto*
sci-mathematics/ginac +flto*
sci-mathematics/glpk +flto*
sci-mathematics/octave +flto*
sci-mathematics/pari +flto*
sci-mathematics/scilab +flto*
sci-visualization/gnuplot +flto*
sys-apps/busybox +flto*
sys-apps/coreutils +flto*
sys-apps/dbus +flto*
sys-apps/fakeroot-ng +fwhole-program
sys-apps/findutils +flto*
sys-apps/gawk +flto*
sys-apps/grep +fwhole-program
sys-apps/groff +fwhole-program
sys-apps/hdparm +flto*
sys-apps/iproute2 +fwhole-program
sys-apps/kmod +flto*
sys-apps/less* +flto*
sys-apps/openrc +flto*
sys-apps/pciutils +flto*
sys-apps/sandbox +flto*
sys-apps/shadow +flto*
sys-apps/sysvinit +flto*
sys-apps/tcp-wrappers +fwhole-program
sys-apps/util-linux +flto*
sys-apps/which +flto*
sys-auth/polkit +flto*
sys-auth/polkit-qt +flto*
sys-devel/bc +fwhole-program
sys-devel/clang +flto* # -flto needs >3GB here
sys-devel/flex +flto*
sys-devel/gettext +flto*
sys-devel/libtool +flto*
sys-devel/llvm +flto*
sys-fs/ddrescue +flto*
sys-fs/dosfstools +flto*
sys-fs/e2fsprogs +flto*
sys-fs/encfs +flto*
sys-fs/ext4magic +flto*
sys-fs/lvm2 +flto*
sys-fs/mtools +flto*
sys-fs/squashfs-tools +flto*
sys-fs/udev +flto*
sys-fs/udftools +flto*
sys-fs/udisks +flto*
sys-fs/unionfs-fuse +flto*
sys-kernel/kccmp +flto*
sys-libs/e2fsprogs-libs +flto*
sys-libs/glibc +flto*
sys-libs/gpm +flto*
sys-libs/libcap +flto*
sys-libs/ncurses +flto*
sys-libs/slang +flto*
sys-libs/zlib +flto*
sys-power/iasl +flto*
sys-power/hibernate-script +fwhole-program
sys-power/suspend +flto*
sys-power/upower +flto*
sys-process/lsof +fwhole-program
sys-process/numactl +flto*
sys-process/procps +flto*
www-client/dillo +flto*
www-client/firefox +flto* # -flto needs too much memory/time
www-client/lynx +fwhole-program
www-plugins/gnash +flto*
www-plugins/mozplugger +flto*
x11-apps/xrandr +flto*
x11-base/xorg-server +flto*
x11-drivers/xf86-video-intel +flto*
x11-libs/fltk +flto*
x11-libs/gdk-pixbuf +flto*
x11-libs/gtkglext +flto*
x11-libs/libX11 +flto*
x11-libs/libXaw3d +flto*
x11-libs/libvdpau +flto*
x11-libs/libwnck +flto*
x11-libs/motif +flto*
x11-libs/pango +flto*
x11-libs/wxGTK +flto*
x11-misc/slim +flto*
x11-misc/xfractint +flto*
x11-misc/xscreensaver +flto*
x11-wm/fvwm +flto*
xfce-base/garcon +flto*
xfce-base/libxfce4ui +flto*
xfce-base/libxfce4util +flto*
xfce-base/libxfcegui4 +flto*
xfce-base/thunar +flto*
xfce-base/xfconf +flto*

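[Editorial sketch] The legend in the message above reads like a small rule table: a package pattern, then a token naming the set of flags to filter. A minimal Python sketch of how such a file could be applied follows. It is not the tool Martin actually uses; the function names `parse_rules` and `filter_flags`, the package `dev-libs/example`, and the assumptions that patterns are shell-style globs and that `#` starts a trailing comment are all hypothetical.

```python
from fnmatch import fnmatchcase

# Token -> flags removed, taken from the "legend" in the message.
SHORTCUTS = {
    "+flto*": {"-flto", "-fuse-linker-plugin", "-femit-llvm", "-fwhole-program"},
    "+fwhole-program": {"-fwhole-program"},
}


def parse_rules(text):
    """Parse 'pattern token' lines; anything after '#' is treated as a comment."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        pattern, token = line.split()
        rules.append((pattern, SHORTCUTS[token]))
    return rules


def filter_flags(package, flags, rules):
    """Return flags minus everything filtered by any rule matching package."""
    removed = set()
    for pattern, to_remove in rules:
        # Assumption: patterns are shell-style globs over "category/name".
        if fnmatchcase(package, pattern):
            removed |= to_remove
    return [flag for flag in flags if flag not in removed]


# Three entries copied from the filter file above.
EXAMPLE = """\
*-libs/* +fwhole-program
app-arch/bzip2 +flto*
sys-devel/clang +flto* # -flto needs >3GB here
"""

if __name__ == "__main__":
    rules = parse_rules(EXAMPLE)
    flags = ["-O2", "-flto", "-fwhole-program"]
    for package in ("app-arch/bzip2", "dev-libs/example", "app-misc/example"):
        print(package, filter_flags(package, flags, rules))
```

Rules are accumulated rather than first-match-wins, so a package hit by both a category wildcard and a specific entry loses the union of both flag sets. That choice is a guess; the message does not say how overlapping entries are resolved.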
* Re: [gentoo-dev] Re: LTO use in the tree
From: "C. Bergström" @ 2014-04-27 0:34 UTC
To: gentoo-dev
Cc: Martin Vaeth

On 04/27/14 02:58 AM, Martin Vaeth wrote:
> Rich Freeman <rich0@gentoo.org> wrote:
>> FWIW the list of packages I have issues with include:
> Not sure whether this is the right place to post it.

It's interesting to see that rather lengthy list. From a compiler
engineer perspective I'd like to toss in my opinion
---------------------
Compiler flags are typically meant to do one or two things: improve
performance or reduce binary (code) size.

Pragmatically nobody gives a f* if grep has been optimized to the max
since it's usually not the bottleneck. Having LTO and whole program
optimizations turned on for every package will probably not give you a
noticeably faster system, but will certainly slow your build down. (Due
to rather large link times)

The packages which it really *should* be turned on for - anything which
is computationally complex, Fortran and stuff where performance matters.
I don't know gcc's LTO or what it's capable of, but in our compiler it
would also potentially improve large c++ applications a lot. (It should
help inline more aggressively and remove c++ layer overhead). In practice
though - the most important c++ applications tend to be too huge and end
up hitting bugs. They will also typically have very very long link
times. (I've seen 30+ minutes - system specs and all things being
relative of course..)

Go ping the gentoo-science guys and get their feedback - they may have
the most experience with this...
----------------------
Not to be a smart-ass, but will someone start a thread on global PGO
(profile guided optimizations) next? imho it would be interesting and
great to have some general training data already contributed next to the
ebuilds. For the science stuff I wouldn't recommend it, but who knows..

* Re: [gentoo-dev] Re: LTO use in the tree
From: Alex Xu @ 2014-04-27 2:14 UTC
To: gentoo-dev

On 26/04/14 08:34 PM, "C. Bergström" wrote:
> Pragmatically nobody gives a f* if grep has been optimized to the max
> since it's usually not the bottleneck.

http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html

* Re: [gentoo-dev] Re: LTO use in the tree
From: "C. Bergström" @ 2014-04-27 2:37 UTC
To: gentoo-dev
Cc: Alex Xu

On 04/27/14 09:14 AM, Alex Xu wrote:
> On 26/04/14 08:34 PM, "C. Bergström" wrote:
>> Pragmatically nobody gives a f* if grep has been optimized to the max
>> since it's usually not the bottleneck.
> http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html

My point about grep + LTO still stands

#1 This isn't gentoo FreeBSD so it's probably irrelevant from the start
- (the comparison is gnu vs bsd grep. Further - LTO won't save your butt
from poor programming practices or magically turn things into efficient
syscalls)

#2 The only reference to anything which the compiler could impact is
"Use Boyer-Moore (and unroll its inner loop a few times)." Finding out
which flag controls that for ${CC} would have some importance. It's
almost certainly combined with -O3 and/or some standalone loop related
optimization. (Nothing depending on LTO). If they were really clever or
determined - there are probably a few GCC or other pragmas which could
give a hint about unrolling.
-----------
The color of my bikeshed is __________________

* Re: [gentoo-dev] Re: LTO use in the tree
From: Rich Freeman @ 2014-04-27 11:23 UTC
To: gentoo-dev
Cc: Alex Xu

On Sat, Apr 26, 2014 at 10:37 PM, "C. Bergström"
<cbergstrom@pathscale.com> wrote:
> #2 The only reference to anything which the compiler could impact is
> "Use Boyer-Moore (and unroll its inner loop a few times)." Finding out which
> flag controls that for ${CC} would have some importance. It's almost
> certainly combined with -O3 and/or some standalone loop related
> optimization. (Nothing depending on LTO). If they were really clever or
> determined - there are probably a few GCC or other pragmas which could give a
> hint about unrolling.

So, I'll certainly agree that package-specific CFLAG tuning will
always be superior to just setting some flag at the system level and
walking away.

And yet, in the same paragraph you mention -O3, which is tantamount to
just setting a flag and walking away. That turns on 14 things you
probably don't really need.

I run -flto at the system level since in my experience it only causes
problems with a handful of packages, and when it does provide a
benefit I get it. For the most part it just means my compiles at 2AM
take longer, and a bit more RAM, neither of which are a concern. If I
do run into a bug, that is just an opportunity to log it and
contribute (though to date I haven't been submitting -flto issues as
bugs as it is still a bit new).

I think LTO is becoming mainstream-enough that we should consider it
supported in the sense that packages should filter it if it is known
not to work. We certainly do that with things like -O2/3/s if they
don't work. However, it still should be considered a somewhat
experimental flag and enabling it will involve bumps. Also, it will
always involve a RAM tradeoff, so there may be cases where it isn't
filtered because it does work just fine, but it won't work for your
system with 4GB of RAM (or 8, or 16 even). If maintainers want to add
logic to test before building (as is sometimes done for /var/tmp with
very large packages) they are welcome to do so, but I think that is
going above-and-beyond.

Rich

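[Editorial sketch] The "logic to test before building" mentioned in the message above could amount to comparing installed memory against a per-package threshold. A minimal Python sketch follows, assuming a Linux-style `/proc/meminfo` with a `MemTotal:` line in kB. The function names, the sample text, and the 3 GiB and 8 GiB thresholds are hypothetical; a real ebuild would express this in its own build-system terms rather than in Python.

```python
def mem_total_kib(meminfo_text):
    """Extract the MemTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line shape assumed: "MemTotal:   4046848 kB"
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def enough_ram_for_lto(meminfo_text, required_gib):
    """True if installed memory meets a (hypothetical) per-package minimum."""
    return mem_total_kib(meminfo_text) >= required_gib * 1024 * 1024


# Made-up sample standing in for the contents of /proc/meminfo.
SAMPLE = "MemTotal:        4046848 kB\nMemFree:          123456 kB\n"

if __name__ == "__main__":
    for required in (3, 8):
        verdict = "keep -flto" if enough_ram_for_lto(SAMPLE, required) else "filter -flto"
        print(f"needs {required} GiB: {verdict}")
```

The check looks only at total installed memory, not at what is free at build time or at swap, so it is a coarse guard at best.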
* Re: [gentoo-dev] Re: LTO use in the tree
From: "C. Bergström" @ 2014-04-27 11:41 UTC
To: gentoo-dev
Cc: Rich Freeman, Alex Xu

On 04/27/14 06:23 PM, Rich Freeman wrote:
> On Sat, Apr 26, 2014 at 10:37 PM, "C. Bergström"
> <cbergstrom@pathscale.com> wrote:
>> #2 The only reference to anything which the compiler could impact is
>> "Use Boyer-Moore (and unroll its inner loop a few times)." Finding out which
>> flag controls that for ${CC} would have some importance. It's almost
>> certainly combined with -O3 and/or some standalone loop related
>> optimization. (Nothing depending on LTO). If they were really clever or
>> determined - there are probably a few GCC or other pragmas which could give a
>> hint about unrolling.
> So, I'll certainly agree that package-specific CFLAG tuning will
> always be superior to just setting some flag at the system level and
> walking away.
>
> And yet, in the same paragraph you mention -O3, which is tantamount to
> just setting a flag and walking away. That turns on 14 things you
> probably don't really need.

I was trying to give a simplified example... no need to nitpick my reply
(Every compiler defines -O3 differently and even the flag to unroll
loops and that threshold may be different.. ...)

>
> I run -flto at the system level since in my experience it only causes
> problems with a handful of packages, and when it does provide a
> benefit I get it.

Can you name a single package that you use which receives a measurable
benefit from LTO? (Just asking)

I don't disagree about enabling it, filing bug reports or many other
things. I'm just curious if you have any hard numbers... (You seem
passionate and sorry if this seems like I'm putting you on the spot)

/* Side note
IPA (aka whole program and LTO) are by far the hardest optimizations
I've ever personally had to debug/engineer/tune in a compiler. Making it
robust needs passionate users who file good reduced test cases. While
for a single source you have creduce or delta - what options are there
for automated reduction of whole program problems..
*/

* Re: [gentoo-dev] Re: LTO use in the tree
From: Rich Freeman @ 2014-04-27 14:52 UTC
To: C. Bergström
Cc: gentoo-dev, Alex Xu

On Sun, Apr 27, 2014 at 7:41 AM, "C. Bergström"
<cbergstrom@pathscale.com> wrote:
> On 04/27/14 06:23 PM, Rich Freeman wrote:
>> And yet, in the same paragraph you mention -O3, which is tantamount to
>> just setting a flag and walking away. That turns on 14 things you
>> probably don't really need.
>
> I was trying to give a simplified example... no need to nitpick my reply
> (Every compiler defines -O3 differently and even the flag to unroll loops
> and that threshold may be different.. ...)

Sorry if it came across aggressively. I was just pointing out that
the reason one sets CFLAGS generically is to avoid the trouble of
"optimizing the optimizer." This always comes at a cost - I tend to
use -Os, but no doubt some packages would benefit from a different
global optimization, let alone specific optimizations.

That was just the point I wanted to make about LTO - I think it is of
general usefulness since it has the potential to help, and rarely
hurts. The only problem with it is that the implementation is
immature.

>
> Can you name a single package that you use which receives a measurable
> benefit from LTO? (Just asking)

Alas, I cannot. There are some general benchmarks out there, and they
seem to vary from little to no effect to significant. More
CPU-intensive software seems the most likely to benefit. No doubt the
benefits of LTO will improve as it matures.

Rich

* [gentoo-dev] Re: LTO use in the tree 2014-04-27 11:41 ` "C. Bergström" 2014-04-27 14:52 ` Rich Freeman @ 2014-04-28 8:25 ` Martin Vaeth 2014-04-28 8:53 ` Tomáš Pružina 1 sibling, 1 reply; 32+ messages in thread
From: Martin Vaeth @ 2014-04-28 8:25 UTC
To: gentoo-dev

C. Bergström <cbergstrom@pathscale.com> wrote:
> Can you name a single package that you use which receives a measurable
> benefit from LTO? (Just asking)

As with every optimization flag, it is easy to construct particular examples: it can help a lot if e.g. a user's string-helper library is inlined. Concerning memory, it can help a lot if duplicate data (e.g. macros containing paths) from different compilation units can be merged.

I guess (though I did no benchmarks) this is why eix profits so much from LTO: it was already mentioned that eix's size is *considerably* smaller with LTO. Surprisingly, eix hardly profits from clang's LTO. I guess it is not the different implementation of LTO but that of the remaining optimizers which makes the difference here. Again: these are just guesses; I never tried to analyze this.

I use it globally, because LTO *can* help a lot, should never hurt performance if the remaining optimizers are good, and should not cause any issues (provided compilation goes through). The price is clear: more than doubled compilation time (which is spent in the linking phase and thus cannot be ccache'd) and, for some packages, insane memory requirements which forbid its usage on some systems.

> IPA (aka whole program and LTO) is by far the hardest optimizations
> I've ever personally had to debug/engineer/tune in a compiler.
> Making it robust [...

I guess it is not a problem by itself: it just triggers cases which "in practice" do not occur otherwise, since most developers typically write relatively small compilation units. So only now do you see the bugs hidden in the algorithms which were never found before...

OTOH, there are already projects like sqlite which have essentially only one compilation unit anyway. (I am guessing this only from the output shown during compilation, so I might be wrong.)
* Re: [gentoo-dev] Re: LTO use in the tree 2014-04-28 8:25 ` Martin Vaeth @ 2014-04-28 8:53 ` Tomáš Pružina 0 siblings, 0 replies; 32+ messages in thread
From: Tomáš Pružina @ 2014-04-28 8:53 UTC
To: gentoo-dev

> should not cause any issues (provided compilation goes through).

There are a few packages which compile fine but break something (I remember some x11 library from bugzilla that broke xorg-server), but generally I agree with you.

One annoying package is 64-bit firefox, which can easily eat up to 15GB of memory (!!!), at least with gcc 4.8. Newer 4.9 branches are said to have fixed this, but since that required a complete rework of LTO, new bugs were inevitably introduced.

> OTOH, there are already projects like sqlite which have essentially only one compilation unit, anyway.

That's absolutely correct: there is one sqlite.c file which is split into logical parts for easier code hacking, but it's one file. Interestingly, even sqlite seems to benefit from LTO; the binary is 5% smaller on my system.
* Re: [gentoo-dev] Re: LTO use in the tree 2014-04-27 11:23 ` Rich Freeman 2014-04-27 11:41 ` "C. Bergström" @ 2014-04-27 22:56 ` Joshua Kinard 2014-04-27 23:08 ` Rich Freeman 2014-04-28 21:46 ` Andrew Savchenko 2 siblings, 1 reply; 32+ messages in thread From: Joshua Kinard @ 2014-04-27 22:56 UTC (permalink / raw To: gentoo-dev On 04/27/2014 07:23, Rich Freeman wrote: > On Sat, Apr 26, 2014 at 10:37 PM, "C. Bergström" > <cbergstrom@pathscale.com> wrote: >> #2 The only reference to anything which the compiler could impact is >> "Use Boyer-Moore (and unroll its inner loop a few times)." Finding out which >> flag controls that for ${CC} would have some importance. It's almost >> certainly combined with -O3 and or some standalone loop related >> optimization. (Nothing depending on LTO). If they were really clever or >> determined - there's probably a few GCC or other pragma which could give a >> hint about unrolling. > > So, I'll certainly agree that package-specific CFLAG tuning will > always be superior to just setting some flag at the system level and > walking away. > > And yet, in the same paragraph you mention -O3, which is tantamount to > just setting a flag and walking away. That turns on 14 things you > probably don't really need. > > I run -flto at the system level since in my experience it only causes > problems with a handful of packages, and when it does provide a > benefit I get it. For the most part it just means my compiles at 2AM > take longer, and a bit more RAM, neither of which are a concern. If I > do run into a bug, that is just an opportunity to log it and > contribute (though to date I haven't been submitting -flto issues as > bugs as it is still a bit new). My curiosity, as I have not attempted LTO yet on any machine, is what are the RAM requirements? Is it a hard limit, wherein the compiler simply fails if there isn't enough RAM, or does it just start hitting swap real hard? 
Those of us using older archs where RAM is limited might have to be more cautious w/ LTO. E.g., my SGI O2 maxes out right now at 512MB. It can go to 1GB if the odd memory/PROM issue is ever worked out. But 512MB is it for now, so what are my odds of successfully using LTO on that?

Especially if LTO helps to reduce the final binary size, that's less data being shuffled around main memory and the CPU caches, which, although it means slower compile times, might make such a machine a bit snappier. Though, I dread how long GCC will take to build itself w/ LTO. The O2 already needs ~18hrs for 4.8. I haven't tried 4.9 on it yet.

--
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
4096R/D25D95E3 2011-03-28

"The past tempts us, the present confuses us, the future frightens us. And our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic
* Re: [gentoo-dev] Re: LTO use in the tree 2014-04-27 22:56 ` Joshua Kinard @ 2014-04-27 23:08 ` Rich Freeman 2014-04-27 23:14 ` Joshua Kinard 0 siblings, 1 reply; 32+ messages in thread From: Rich Freeman @ 2014-04-27 23:08 UTC (permalink / raw To: gentoo-dev On Sun, Apr 27, 2014 at 6:56 PM, Joshua Kinard <kumba@gentoo.org> wrote: > > My curiosity, as I have not attempted LTO yet on any machine, is what are > the RAM requirements? Is it a hard limit, wherein the compiler simply fails > if there isn't enough RAM, or does it just start hitting swap real hard? It just allocates RAM, and the OS does the rest. I've seen it invoke the OOM killer. That was back when I only had 8GB of RAM. Now I have 16GB and I only need to disable LTO on the really big packages. Of course, if you set an appropriate ulimit then the process will just terminate more gracefully. I'd highly recommend doing just that if you have a lot of swap available. > Those of us using older archs where the RAM is limited might have to be more > cautious w/ LTO. I.e., my SGI O2 maxes right now at 512MB. It can go to > 1GB if the odd memory/PROM issue is ever worked out. But 512MB is it for > now, so what are my odds of successfully using LTO on that? About zero. Well, I'm sure it will work fine for hello.c, especially if you eliminate any function calls inside of it. > > Especially if LTO helps to reduce the final binary size, that's less data > being shuffled around main memory and the CPU caches, which, although means > slower compile times, might hake such a machine a bit snippier. Though, I > dread how long GCC will take to build itself w/ LTO. The O2 already needs > ~18hrs for 4.8. I haven't tried 4.9 on it yet. Yeah, good luck with that... :) I'd be curious as to what you find. You can always try it out by picking a small package and doing a CFLAGS=foo emerge bar. Be sure to only use -j1 -flto=1 as well. Rich ^ permalink raw reply [flat|nested] 32+ messages in thread
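The ulimit suggestion above could look roughly like the following. This is only a sketch, assuming a shell whose `ulimit` builtin supports `-v` (bash and dash do; plain POSIX does not promise it). `run_capped` is a hypothetical helper name, and the 4 GiB cap and the package atom in the commented example are placeholders rather than values anyone in the thread recommended.

```shell
#!/bin/sh
# Run a command under an address-space cap so that a runaway LTO link step
# fails with an allocation error instead of driving the box into swap or
# waking the OOM killer. The limit is given in KiB.
# run_capped is a hypothetical helper, not part of portage or gcc.
run_capped() {
    limit_kib=$1
    shift
    (
        # The limit is set inside a subshell, so the calling shell keeps
        # its own (usually unlimited) setting.
        ulimit -v "$limit_kib" || exit 125
        "$@"
    )
}

# Example invocation (placeholder values): cap at 4 GiB, serial make,
# serial LTO, one small package.
#   run_capped 4194304 env CFLAGS="-O2 -flto=1" MAKEOPTS="-j1" \
#       emerge --oneshot app-arch/bzip2
```

Because the cap lives in a subshell, it does not leak into later commands; the exit status of the wrapped command is passed through unchanged.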
* Re: [gentoo-dev] Re: LTO use in the tree 2014-04-27 23:08 ` Rich Freeman @ 2014-04-27 23:14 ` Joshua Kinard 2014-04-28 0:40 ` "C. Bergström" 0 siblings, 1 reply; 32+ messages in thread From: Joshua Kinard @ 2014-04-27 23:14 UTC (permalink / raw To: gentoo-dev On 04/27/2014 19:08, Rich Freeman wrote: > On Sun, Apr 27, 2014 at 6:56 PM, Joshua Kinard <kumba@gentoo.org> wrote: >> >> My curiosity, as I have not attempted LTO yet on any machine, is what are >> the RAM requirements? Is it a hard limit, wherein the compiler simply fails >> if there isn't enough RAM, or does it just start hitting swap real hard? > > It just allocates RAM, and the OS does the rest. I've seen it invoke > the OOM killer. That was back when I only had 8GB of RAM. Now I have > 16GB and I only need to disable LTO on the really big packages. > > Of course, if you set an appropriate ulimit then the process will just > terminate more gracefully. I'd highly recommend doing just that if > you have a lot of swap available. My favourite, starting long compiles on slow boxen, only to wake up to discover they failed in the final five minutes of the build over something as trite as low memory :) >> Those of us using older archs where the RAM is limited might have to be more >> cautious w/ LTO. I.e., my SGI O2 maxes right now at 512MB. It can go to >> 1GB if the odd memory/PROM issue is ever worked out. But 512MB is it for >> now, so what are my odds of successfully using LTO on that? > > About zero. Well, I'm sure it will work fine for hello.c, especially > if you eliminate any function calls inside of it. About zero? So, some floating point value infinitely between 0 and 1? Hmm, maybe I'll try it once I get my SGI Octane to boot Linux again. >> >> Especially if LTO helps to reduce the final binary size, that's less data >> being shuffled around main memory and the CPU caches, which, although means >> slower compile times, might hake such a machine a bit snippier. 
Though, I >> dread how long GCC will take to build itself w/ LTO. The O2 already needs >> ~18hrs for 4.8. I haven't tried 4.9 on it yet. > > Yeah, good luck with that... :) > > I'd be curious as to what you find. You can always try it out by > picking a small package and doing a CFLAGS=foo emerge bar. Be sure to > only use -j1 -flto=1 as well. O2 only has one CPU, so it's always -j1. SMP on my other MIPS machines doesn't work yet (either Linux isn't supported, or I haven't debugged SMP code yet). -- Joshua Kinard Gentoo/MIPS kumba@gentoo.org 4096R/D25D95E3 2011-03-28 "The past tempts us, the present confuses us, the future frightens us. And our lives slip away, moment by moment, lost in that vast, terrible in-between." --Emperor Turhan, Centauri Republic ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [gentoo-dev] Re: LTO use in the tree 2014-04-27 23:14 ` Joshua Kinard @ 2014-04-28 0:40 ` "C. Bergström" 2014-04-28 3:54 ` Joshua Kinard 2014-04-28 4:49 ` Richard Yao 0 siblings, 2 replies; 32+ messages in thread From: "C. Bergström" @ 2014-04-28 0:40 UTC (permalink / raw To: gentoo-dev; +Cc: Joshua Kinard On 04/28/14 06:14 AM, Joshua Kinard wrote: > On 04/27/2014 19:08, Rich Freeman wrote: >> On Sun, Apr 27, 2014 at 6:56 PM, Joshua Kinard <kumba@gentoo.org> wrote: >>> My curiosity, as I have not attempted LTO yet on any machine, is what are >>> the RAM requirements? Is it a hard limit, wherein the compiler simply fails >>> if there isn't enough RAM, or does it just start hitting swap real hard? >> It just allocates RAM, and the OS does the rest. I've seen it invoke >> the OOM killer. That was back when I only had 8GB of RAM. Now I have >> 16GB and I only need to disable LTO on the really big packages. >> >> Of course, if you set an appropriate ulimit then the process will just >> terminate more gracefully. I'd highly recommend doing just that if >> you have a lot of swap available. > My favourite, starting long compiles on slow boxen, only to wake up to > discover they failed in the final five minutes of the build over something > as trite as low memory :) > > >>> Those of us using older archs where the RAM is limited might have to be more >>> cautious w/ LTO. I.e., my SGI O2 maxes right now at 512MB. It can go to >>> 1GB if the odd memory/PROM issue is ever worked out. But 512MB is it for >>> now, so what are my odds of successfully using LTO on that? >> About zero. Well, I'm sure it will work fine for hello.c, especially >> if you eliminate any function calls inside of it. > About zero? So, some floating point value infinitely between 0 and 1? Hmm, > maybe I'll try it once I get my SGI Octane to boot Linux again. 
>
>
>>> Especially if LTO helps to reduce the final binary size, that's less data
>>> being shuffled around main memory and the CPU caches, which, although means
>>> slower compile times, might hake such a machine a bit snippier. Though, I
>>> dread how long GCC will take to build itself w/ LTO. The O2 already needs
>>> ~18hrs for 4.8. I haven't tried 4.9 on it yet.
>> Yeah, good luck with that... :)
>>
>> I'd be curious as to what you find. You can always try it out by
>> picking a small package and doing a CFLAGS=foo emerge bar. Be sure to
>> only use -j1 -flto=1 as well.
> O2 only has one CPU, so it's always -j1. SMP on my other MIPS machines
> doesn't work yet (either Linux isn't supported, or I haven't debugged SMP
> code yet).

On those old SGI MIPS machines, use MIPSPro. It had better (LTO/whole program) optimizations than GCC more than 10 years ago (imho, and gcc may have caught up now in 4.9). Just add the -ipa flag and test. In fairness, there are primarily 3 limitations with MIPSPro IPA:

1) It set a rather low (by modern standards) cutoff beyond which IPA wouldn't really be turned on (like 1MLOC or something)
2) GCC and others will do IPA on a single file (module), whereas MIPSPro required the whole program to do some similar optimizations
3) MIPSPro is/was also a very, very slow compiler (I do not recommend it for large C++ codes)

Most "sane" compilers that deal with IPA will have some cutoff in order to avoid insane levels of memory usage. The compiler will bail internally on the problem, but the compilation won't fail. It's also in theory possible to segment the problem and work on chunks of it. This would, again in theory, trade longer compile times for lower memory requirements. "We" aren't doing this yet and I don't know anyone who is, but I'm possibly just uninformed.

In terms of general performance gains using LTO, the #1 candidate would actually be the Linux kernel. See if anyone can get that to work ;)

While this thread is fun, I should exit() here since it doesn't seem productive to discuss further..

Thanks
* Re: [gentoo-dev] Re: LTO use in the tree 2014-04-28 0:40 ` "C. Bergström" @ 2014-04-28 3:54 ` Joshua Kinard 2014-04-28 4:49 ` Richard Yao 1 sibling, 0 replies; 32+ messages in thread From: Joshua Kinard @ 2014-04-28 3:54 UTC (permalink / raw To: gentoo-dev On 04/27/2014 20:40, "C. Bergström" wrote: > On those old SGI MIPS machines use MIPSPro. It had better (LTO/whole > program) optimizations than GCC more than 10 years ago (imho and gcc may > have caught up now in 4.9). Just add the -ipa flag and test. In fairness > there is primarily 3 limitations with MIPSPro IPA [snip] That's if they ran IRIX. They run Linux :) -- Joshua Kinard Gentoo/MIPS kumba@gentoo.org 4096R/D25D95E3 2011-03-28 "The past tempts us, the present confuses us, the future frightens us. And our lives slip away, moment by moment, lost in that vast, terrible in-between." --Emperor Turhan, Centauri Republic ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [gentoo-dev] Re: LTO use in the tree 2014-04-28 0:40 ` "C. Bergström" 2014-04-28 3:54 ` Joshua Kinard @ 2014-04-28 4:49 ` Richard Yao 1 sibling, 0 replies; 32+ messages in thread From: Richard Yao @ 2014-04-28 4:49 UTC (permalink / raw To: gentoo-dev [-- Attachment #1: Type: text/plain, Size: 312 bytes --] On Sun 27 Apr 2014 08:40:08 PM EDT, "C. Bergström" wrote: > In terms of general performance gains using LTO - The #1 candidate > would be the linux kernel actually. See if anyone can get that to work ;) Intel's Andi Kleen is working on it: http://lkml.iu.edu/hypermail/linux/kernel/1404.0/03450.html [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 901 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [gentoo-dev] Re: LTO use in the tree 2014-04-27 11:23 ` Rich Freeman 2014-04-27 11:41 ` "C. Bergström" 2014-04-27 22:56 ` Joshua Kinard @ 2014-04-28 21:46 ` Andrew Savchenko 2014-04-28 23:45 ` Rich Freeman 2 siblings, 1 reply; 32+ messages in thread
From: Andrew Savchenko @ 2014-04-28 21:46 UTC
To: gentoo-dev

Hello,

On Sun, 27 Apr 2014 07:23:11 -0400 Rich Freeman wrote:
> And yet, in the same paragraph you mention -O3, which is
> tantamount to just setting a flag and walking away. That turns
> on 14 things you probably don't really need.

Why 14 things? According to the gcc-4.8.2 manual, -O3 enables the following: -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-vectorize, -fvect-cost-model, -ftree-partial-pre, -fipa-cp-clone. Some of these options trigger others, but these 8 are sufficient to mimic -O3 completely.

From my experience only three of them are harmful: -finline-functions and -fipa-cp-clone bloat code size significantly, hurting performance due to more CPU cache misses. -ftree-vectorize may be used on amd64 (the performance change is in the range of -3% to +5%), but is a complete menace on x86: a lot of ICEs, a lot of segfaults due to stack misalignment, and even some working but miscompiled code. While some (but not all) stack alignment issues may be fixed with -mstackrealign, this turns the performance gain negative.

All other -O3 options have either no effect or a measurable performance gain in the range of several percent. Tests were made using multimedia packages (mplayer, ffmpeg, x264) and scientific ones (root, pythia, geant, blas libs).

Best regards,
Andrew Savchenko
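One way to act on the flag list above, sketched as a make.conf-style fragment: start from -O2 and add back only the -O3 options that the message reports as harmless or beneficial. `O3_EXTRAS` is a hypothetical variable name introduced for illustration. The flag names are the ones quoted from the gcc-4.8.2 manual above; -fvect-cost-model is left out on the assumption that it only matters when the vectorizer is enabled. Whether this helps any particular package is something only measurement can tell.

```shell
# make.conf-style fragment (sketch, not a recommendation).
# -O3 minus the three options reported as harmful in the message above:
#   -finline-functions, -fipa-cp-clone (code bloat) and -ftree-vectorize.
# O3_EXTRAS is a hypothetical helper variable.
O3_EXTRAS="-funswitch-loops -fpredictive-commoning -fgcse-after-reload -ftree-partial-pre"

CFLAGS="-O2 -pipe ${O3_EXTRAS}"
CXXFLAGS="${CFLAGS}"
```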
* Re: [gentoo-dev] Re: LTO use in the tree 2014-04-28 21:46 ` Andrew Savchenko @ 2014-04-28 23:45 ` Rich Freeman 0 siblings, 0 replies; 32+ messages in thread From: Rich Freeman @ 2014-04-28 23:45 UTC (permalink / raw To: gentoo-dev On Mon, Apr 28, 2014 at 5:46 PM, Andrew Savchenko <bircoph@gmail.com> wrote: > Hello, > > On Sun, 27 Apr 2014 07:23:11 -0400 Rich Freeman wrote: >> And yet, in the same paragraph you mention -O3, which is >> tantamount to just setting a flag and walking away. That turns >> on 14 things you probably don't really need. > > Why 14 things? ... > > From my experience only three of them are harmful: ... > All other -O3 option have either no effect or measurable > performance enhancements in the range of several percent. You missed my point. I think running batch optimizations like -O2/3 only makes sense. The argument was that -flto doesn't always help, and thus shouldn't always be used. My point was that convenience options like -O2/3 were used because while the options don't always help, they usually do, and nobody wants to bother with micromanaging them. Personally I use -O2 or -Os with a few additional options that are less space-expensive than full -O3, on the premise that cache and memory conservation probably buys you more than avoiding some jumps. But, short of profiling every package any selection is going to be a suboptimal choice based on averages. I wasn't trying to say that there was something wrong with -O3/2/etc. Rich ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [gentoo-dev] Re: LTO use in the tree 2014-04-27 0:34 ` "C. Bergström" 2014-04-27 2:14 ` Alex Xu @ 2014-04-27 22:57 ` Joshua Kinard 2014-04-28 21:08 ` Andrew Savchenko 2 siblings, 0 replies; 32+ messages in thread From: Joshua Kinard @ 2014-04-27 22:57 UTC (permalink / raw To: gentoo-dev On 04/26/2014 20:34, "C. Bergström" wrote: > On 04/27/14 02:58 AM, Martin Vaeth wrote: >> Rich Freeman <rich0@gentoo.org> wrote: >>> FWIW the list of packages I have issues with include: >> Not sure whether this is the right place to post it. > It's interesting to see that rather lengthy list. From a compiler engineer > perspective I'd like to toss in my opinion [snip] What compiler, out of curiosity? -- Joshua Kinard Gentoo/MIPS kumba@gentoo.org 4096R/D25D95E3 2011-03-28 "The past tempts us, the present confuses us, the future frightens us. And our lives slip away, moment by moment, lost in that vast, terrible in-between." --Emperor Turhan, Centauri Republic ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [gentoo-dev] Re: LTO use in the tree 2014-04-27 0:34 ` "C. Bergström" 2014-04-27 2:14 ` Alex Xu 2014-04-27 22:57 ` Joshua Kinard @ 2014-04-28 21:08 ` Andrew Savchenko 2 siblings, 0 replies; 32+ messages in thread
From: Andrew Savchenko @ 2014-04-28 21:08 UTC
To: gentoo-dev

Hello,

On Sun, 27 Apr 2014 07:34:05 +0700 C. Bergström wrote:
[...]
> Not to be a smart-ass, but will someone start a thread on global
> PGO (profile guided optimizations) next? imho it would be
> interesting and great to have some general training data already
> contributed next to the ebuilds. For the science stuff I wouldn't
> recommend it, but who knows..

Global PGO is meaningless, because PGO requires not just compiler flags, but package-specific tests covering all widely used profiles for the package in question. So this requires intensive upstream work and in no way can be done in Gentoo for any significant number of packages.

At this moment only two packages in the tree support PGO: dev-libs/gmp and www-client/firefox. For gmp it works great. For firefox it is a menace:

1) with current in-tree firefox versions PGO can't be used on x86 at all, since the linker doesn't fit in the 3GB memory limit, even with memory-constraint options, with both GNU ld and gold.
2) on amd64 4GB is surely not enough for linking the profile-enabled version, so I can't use it here either.
3) Old firefox versions (somewhere around 18) were successfully compiled on the same ~x86 and ~amd64 boxes. So something in the firefox tree changed that much.

There is also sci-libs/atlas in the science overlay which uses a similar technique during build. But strictly speaking this is not PGO, as changes are made at the algorithm level rather than at the compiler level: it tests each block with different parameters and chooses the fastest ones for the current box.

Best regards,
Andrew Savchenko
* Re: [gentoo-dev] Re: LTO use in the tree 2014-04-26 19:58 ` Martin Vaeth 2014-04-27 0:34 ` "C. Bergström" @ 2014-04-27 0:49 ` Rich Freeman 1 sibling, 0 replies; 32+ messages in thread From: Rich Freeman @ 2014-04-27 0:49 UTC (permalink / raw To: gentoo-dev On Sat, Apr 26, 2014 at 3:58 PM, Martin Vaeth <martin@mvath.de> wrote: > I have not always tested whether filtering -fwhole-program > alone would be sufficient, but in many cases I did, and > usually it was not sufficient. Well, there is certainly something going on here, because... > app-arch/bzip2 +flto* This at least builds just fine for me with: CFLAGS="-march=amdfam10 -Os -pipe -frename-registers -fweb -freorder-blocks -freorder-blocks-and-partition -flto=5 -funit-at-a-time -ftree-pre -fgcse-sm -fgcse-las -fgcse-after-reload -fmerge-all-constants -ftree-vectorize -ftree-parallelize-loops=4 -mabm -msse4a -fstack-protector" It is possible that you might get different behavior on a different version of gcc, or with some other combination of CFLAGS including the ones you're using. Rich ^ permalink raw reply [flat|nested] 32+ messages in thread
* [gentoo-dev] Re: LTO use in the tree 2014-04-26 10:23 ` Michał Górny 2014-04-26 11:15 ` Rich Freeman @ 2014-04-26 14:35 ` Martin Vaeth 1 sibling, 0 replies; 32+ messages in thread
From: Martin Vaeth @ 2014-04-26 14:35 UTC
To: gentoo-dev

Michał Górny <mgorny@gentoo.org> wrote:
>
> On 2014-04-22 at 08:45:31,
> Martin Vaeth <martin@mvath.de> wrote:
>
>> On the other hand, if upstream tests and supports LTO, it should
>> be communicated to the user somehow that this is the case.
>> The same dilemma applies to some other CFLAGS which should not be
>> used in general but only if the code is written for them.
>
> Why do you believe that LTO 'should not be used in general'?

In the last sentence, I did not have LTO in mind but things like -fmerge-all-constants, -fnothrow-opt, -fno-enforce-eh-specs or -fno-common. The latter is perhaps again a bit similar to LTO:

> As far as I understand, the LTO concept is suited well for most
> programs

For programs yes, but if libraries are involved often not: experience shows that most packages which provide or (internally) use libraries break with LTO if upstream does not take care of it. In particular, unless the gold linker is expected (which unfortunately cannot be so easily changed locally for a package and which causes other problems if used system wide), it can make sense to compile only certain parts of the package with LTO (e.g. the main binary). That's why it can make sense to let the package take care of the -flto CFLAG.

> (+ the usual limitations like memory).

Not to forget compilation time, which increases by a remarkable factor of at least 2 (unless optimized for LTO with e.g. -fno-fat-lto-objects).
* [gentoo-dev] Re: LTO use in the tree 2014-04-22 8:45 ` Martin Vaeth 2014-04-26 10:23 ` Michał Górny @ 2014-05-03 0:24 ` Ryan Hill 1 sibling, 0 replies; 32+ messages in thread From: Ryan Hill @ 2014-05-03 0:24 UTC (permalink / raw To: gentoo-dev [-- Attachment #1: Type: text/plain, Size: 1105 bytes --] On Tue, 22 Apr 2014 08:45:31 +0000 (UTC) Martin Vaeth <martin@mvath.de> wrote: > Ryan Hill <rhill@gentoo.org> wrote: > > > > One thing I forgot to mention - LTO can also have detrimental effect on > > certain architectures. On some (eg. ppc), performance can actually > > be degraded due to increased register pressure. > > If this really is the case it is not the problem of LTO but > of the optimizer: If the optimizer really produces *worse* > code when he *can* see the full program instead of only parts of it, > something is severely broken in the optimizer. Only decreasing the > possibilities of the optimizer by removing LTO would be the wrong way > to "solve" this problem. Yes, this is a problem caused by aggressive inlining, and is being worked on upstream[1]. I meant that currently released versions exhibit this behaviour. [1] see for example http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01098.html -- Ryan Hill psn: dirtyepic_sk gcc-porting/toolchain/wxwidgets @ gentoo.org 47C3 6D62 4864 0E49 8E9E 7F92 ED38 BD49 957A 8463 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 490 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [gentoo-dev] Re: LTO use in the tree 2014-04-21 4:02 ` [gentoo-dev] " Ryan Hill 2014-04-22 8:45 ` Martin Vaeth @ 2014-04-22 18:10 ` Matt Turner 2014-05-02 23:55 ` Ryan Hill 1 sibling, 1 reply; 32+ messages in thread From: Matt Turner @ 2014-04-22 18:10 UTC (permalink / raw To: gentoo-dev > One thing I forgot to mention - LTO can also have detrimental effect on certain > architectures. On some (eg. ppc), performance can actually be degraded due to > increased register pressure. On others like alpha it's questionable if it'll > even work at all... Worked for me on alpha, at least for what I tried. It cut eix's binary from 2 to 1.3 MB as well. ^ permalink raw reply [flat|nested] 32+ messages in thread
* [gentoo-dev] Re: LTO use in the tree 2014-04-22 18:10 ` Matt Turner @ 2014-05-02 23:55 ` Ryan Hill 0 siblings, 0 replies; 32+ messages in thread From: Ryan Hill @ 2014-05-02 23:55 UTC (permalink / raw To: gentoo-dev [-- Attachment #1: Type: text/plain, Size: 801 bytes --] On Tue, 22 Apr 2014 11:10:19 -0700 Matt Turner <mattst88@gentoo.org> wrote: > > One thing I forgot to mention - LTO can also have detrimental effect on > > certain architectures. On some (eg. ppc), performance can actually be > > degraded due to increased register pressure. On others like alpha it's > > questionable if it'll even work at all... > > Worked for me on alpha, at least for what I tried. It cut eix's binary > from 2 to 1.3 MB as well. Cool, thanks for the info. I was going by the request we had back in 4.6 to turn off LTO for alpha because an upstream developer mentioned it wasn't expected to work. -- Ryan Hill psn: dirtyepic_sk gcc-porting/toolchain/wxwidgets @ gentoo.org 47C3 6D62 4864 0E49 8E9E 7F92 ED38 BD49 957A 8463 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 490 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [gentoo-dev] LTO use in the tree 2014-04-21 3:14 [gentoo-dev] LTO use in the tree Ryan Hill 2014-04-21 4:02 ` [gentoo-dev] " Ryan Hill @ 2014-04-21 6:53 ` Michał Górny 1 sibling, 0 replies; 32+ messages in thread
From: Michał Górny @ 2014-04-21 6:53 UTC
To: gentoo-dev; +Cc: rhill

On 2014-04-20 at 21:14:51, Ryan Hill <rhill@gentoo.org> wrote:

> As more and more packages are starting to add LTO flags automatically through
> their build systems, I thought I'd point out a couple things:
>
> - LTO utterly destroys debug info. Flags like -g are incompatible with LTO.
>
> - LTO causes .GCC.command.line sections to be discarded, which means your
> package will always be QA flagged as ignoring CFLAGS.
>
> - LTO takes a _lot_ of memory. That memory is required on the host arch.
> Distcc doesn't help things here, because linking happens locally. Consider
> all the archs your package is built on, and if they all routinely have
> multiple GBs of memory installed.

Those are the reasons why I have disabled the default -flto in systemd. For other packages based on the systemd configure macros, it can be disabled via the following configure hack:

cc_cv_CFLAGS__flto=no

--
Best regards,
Michał Górny
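A sketch of where such a cache-variable override might go in an ebuild, assuming the package's configure script uses the systemd-style flag-check macros that cache their result in cc_cv_CFLAGS__flto. The variable name is taken from the message above; the surrounding src_configure body is illustrative only, and a real ebuild would pass its other configure options alongside it.

```shell
# Ebuild fragment (sketch). Pre-seeding the autoconf cache variable makes the
# configure-time check for -flto report "no", so the build system does not
# add the flag on its own.
src_configure() {
    econf cc_cv_CFLAGS__flto=no
}

# Outside an ebuild, the same assignment can be passed straight to configure:
#   ./configure cc_cv_CFLAGS__flto=no
```

Users who still want LTO for such a package can then add -flto through their own CFLAGS, which keeps the decision with the user rather than with the package's build system.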