* [gentoo-doc-cvs] cvs commit: co-guide.xml metadoc.xml
@ 2007-06-27 6:04 Josh Saddler
0 siblings, 0 replies; only message in thread
From: Josh Saddler @ 2007-06-27 6:04 UTC (permalink / raw
To: gentoo-doc-cvs
nightmorph 07/06/27 06:04:17
Modified: metadoc.xml
Added: co-guide.xml
Log:
added compilation optimization guide, bug 68282 (at last).
Revision Changes Path
1.184 xml/htdocs/doc/en/metadoc.xml
file : http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/metadoc.xml?rev=1.184&view=markup
plain: http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/metadoc.xml?rev=1.184&content-type=text/plain
diff : http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/metadoc.xml?r1=1.183&r2=1.184
Index: metadoc.xml
===================================================================
RCS file: /var/cvsroot/gentoo/xml/htdocs/doc/en/metadoc.xml,v
retrieving revision 1.183
retrieving revision 1.184
diff -u -r1.183 -r1.184
--- metadoc.xml 3 Jun 2007 16:35:40 -0000 1.183
+++ metadoc.xml 27 Jun 2007 06:04:17 -0000 1.184
@@ -1,9 +1,9 @@
<?xml version="1.0" encoding="UTF-8"?>
-<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/metadoc.xml,v 1.183 2007/06/03 16:35:40 neysx Exp $ -->
+<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/metadoc.xml,v 1.184 2007/06/27 06:04:17 nightmorph Exp $ -->
<!DOCTYPE metadoc SYSTEM "/dtd/metadoc.dtd">
<metadoc lang="en">
-<version>1.107</version>
+<version>1.108</version>
<members>
<lead>neysx</lead>
<member>cam</member>
@@ -397,6 +397,7 @@
<file id="zsh">/doc/en/zsh.xml</file>
<file id="change-chost">/doc/en/change-chost.xml</file>
<file id="xfce-config">/doc/en/xfce-config.xml</file>
+ <file id="co-guide">/doc/en/co-guide.xml</file>
<file id="qa-autofailure">/proj/en/qa/autofailure.xml</file>
<file id="qa-automagic">/proj/en/qa/automagic.xml</file>
<file id="qa-backtraces">/proj/en/qa/backtraces.xml</file>
@@ -802,6 +803,10 @@
<memberof>sysadmin_specific</memberof>
<fileid>home-router-howto</fileid>
</doc>
+ <doc id="co-guide">
+ <memberof>sysadmin_specific</memberof>
+ <fileid>co-guide</fileid>
+ </doc>
<doc id="gentoo-dev-handbook">
<memberof>gentoodev</memberof>
<memberof>project_devrel</memberof>
1.1 xml/htdocs/doc/en/co-guide.xml
file : http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/co-guide.xml?rev=1.1&view=markup
plain: http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/co-guide.xml?rev=1.1&content-type=text/plain
Index: co-guide.xml
===================================================================
<?xml version='1.0' encoding='UTF-8'?>
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/co-guide.xml,v 1.1 2007/06/27 06:04:17 nightmorph Exp $ -->
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
<guide link="/doc/en/co-guide.xml">
<title>Compilation Optimization Guide</title>
<author title="Author">
<mail link="nightmorph@gentoo.org">Joshua Saddler</mail>
</author>
<abstract>
This guide provides an introduction to optimizing compiled code using safe, sane
CFLAGS and CXXFLAGS. It also as describes the theory behind optimizing in
general.
</abstract>
<!-- The content of this document is licensed under the CC-BY-SA license -->
<!-- See http://creativecommons.org/licenses/by-sa/2.5 -->
<license/>
<version>1.0</version>
<date>2007-06-26</date>
<chapter>
<title>Introduction</title>
<section>
<title>What are CFLAGS and CXXFLAGS?</title>
<body>
<p>
CFLAGS and CXXFLAGS are environment variables that are used to tell the GNU
Compiler Collection, <c>gcc</c>, what kinds of switches to use when compiling
source code. CFLAGS are for code written in C, while CXXFLAGS are for code
written in C++.
</p>
<p>
They can be used to decrease the amount of debug messages for a program,
increase error warning levels, and, of course, to optimize the code produced.
The <uri
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Invoking-GCC.html#Invoking-GCC">GNU
gcc handbook</uri> maintains a complete list of available options and their
purposes.
</p>
</body>
</section>
<section>
<title>How are they used?</title>
<body>
<p>
CFLAGS and CXXFLAGS can be used in two ways. First, they can be used
per-program, by directly invoking <c>gcc</c> and then some bit of code you wish
to compile.
</p>
<pre caption="Compiling a program directly">
$ <i>CFLAGS="-march=i686" gcc file.c</i>
</pre>
<p>
However, this should not be done when installing packages found in the Portage
tree. Instead, set your CFLAGS and CXXFLAGS in <path>/etc/make.conf</path>. This
way all packages will be compiled using the options you specify.
</p>
<pre caption="CFLAGS in /etc/make.conf">
CFLAGS="-march=athlon64 -O2 -pipe"
CXXFLAGS="${CFLAGS}"
</pre>
<p>
As you can see, CXXFLAGS is set to use all the options present in CFLAGS. This
is what you'll want almost without fail. You shouldn't ever need to specify
additional options in CXXFLAGS.
</p>
<impo>
Portage cannot use CFLAGS on a per-package basis, nor is there any supported
method of forcing it to do so. The flags you set in <path>/etc/make.conf</path>
will be used for <e>all</e> packages you install.
</impo>
</body>
</section>
<section>
<title>Misconceptions</title>
<body>
<p>
While CFLAGS and CXXFLAGS can be very effective means of getting source code to
produce smaller and/or faster binaries, they can also impair the function of
your code, bloat its size, slow down its execution time, or even cause
compilation failures!
</p>
<p>
CFLAGS are not a magic bullet; they will not automatically make your system run
any faster or your binaries to take up less space on disk. Adding more and more
flags in an attempt to optimize (or "rice") your system is a sure recipe for
failure. There is a point at which you will reach diminishing returns.
</p>
<p>
Despite the bragging you'll find on the internet, aggressive CFLAGS and CXXFLAGS
are far more likely to harm your programs than do them any good. Keep in mind
that the reason the flags exist in the first place is because they are designed
to be used at specific places for specific purposes. Just because one particular
CFLAG is good for one bit of code doesn't mean that it is suited to compiling
everything you will ever install on your machine!
</p>
</body>
</section>
<section>
<title>Ready?</title>
<body>
<p>
Now that you're aware of some of the risks involved, let's take a look at some
sane, safe optimizations for your computer. These will hold you in good stead
and will endear you to developers the next time you report a problem on <uri
link="http://bugs.gentoo.org">Bugzilla</uri>. (Developers will usually request
that you recompile a package with minimal CFLAGS to see if the problem persists.
Remember, aggressive flags can ruin code.)
</p>
</body>
</section>
</chapter>
<chapter>
<title>Optimizing</title>
<section>
<title>The basics</title>
<body>
<p>
The goal behind using CFLAGS and CXXFLAGS is to create code tailor-made to your
system; it should function perfectly while being lean and fast, if possible.
Sometimes these conditions are mutually exclusive, so we'll stick with
combinations known to work well. Ideally, they are the best available for any
CPU architecture. We'll mention the aggressive flags later so you know what to
look out for. We won't discuss every option listed on the <c>gcc</c> manual
(there are hundreds), but we'll cover the basic, most common flags.
</p>
<note>
Whenever you're not sure what a flag actually does, refer to the relevant
chapter of the <uri
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Optimize-Options.html#Optimize-Options">gcc
manual</uri>. If you're still stumped, try Google, or check out the <c>gcc</c>
<uri link="http://gcc.gnu.org/lists.html">mailing lists</uri>.
</note>
</body>
</section>
<section>
<title>-march</title>
<body>
<p>
The first and most important option is <c>-march</c>. This tells the compiler
what code it should produce for your processor <uri
link="http://en.wikipedia.org/wiki/Microarchitecture">architecture</uri> (or
<e>arch</e>); it says that it should produce code for a certain kind of CPU.
Different CPUs have different capabilities, support different instruction sets,
and have different ways of executing code. The <c>-march</c> flag will instruct
the compiler to produce code specifically for your CPU, with all its
capabilities, features, instruction sets, quirks, and so on.
</p>
<p>
Even though the CHOST variable in <path>/etc/make.conf</path> specifies the
general architecture used, <c>-march</c> should still be used so that programs
can be optimized for your specific processor.
</p>
<p>
What kind of CPU do you have? To find out, run the following command:
</p>
<pre caption="Examining CPU information">
$ <i>cat /proc/cpuinfo</i>
</pre>
<p>
Now let's see <c>-march</c> in action. This example is for an older Pentium III
chip:
</p>
<pre caption="/etc/make.conf: Pentium III">
CFLAGS="-march=pentium3"
CXXFLAGS="${CFLAGS}"
</pre>
<p>
Here's another one for a 64-bit Sparc CPU:
</p>
<pre caption="/etc/make.conf: Sparc">
CFLAGS="-march=ultrasparc"
CXXFLAGS="${CFLAGS}"
</pre>
<p>
Also available are the <c>-mcpu</c> and <c>-mtune</c> flags. Either of these
should <e>only</e> be used when there is no available <c>-march</c> option.
What's the difference between them? <c>-march</c> is much more specific about
which processor features will be used when compiling code; it is a better
choice. <c>-mcpu</c> will produce much more generic code less optimized for your
machine. <c>-mtune</c> is even more generic than <c>-mcpu</c>. Whenever
possible, use <c>-march</c>. For some less common architectures such as PowerPC
and Alpha, <c>-mcpu</c> must be used.
</p>
<note>
For more suggested <c>-march</c> settings, please read chapter 5 of the
appropriate <uri link="/doc/en/handbook/index.xml">Gentoo Installation
Handbook</uri> for your arch. Also, read the <c>gcc</c> manual's list of <uri
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Submodel-Options.html#Submodel-Options">architecture-specific
options</uri>, as well as more detailed explanations about the differences
between <c>-march</c>, <c>-mcpu</c>, and <c>-mtune</c>. This is quite helpful
for determining which <c>-march</c> setting you should use, especially since on
some architectures, such as x86, <c>-mcpu</c> is deprecated and <c>-mtune</c>
should be used instead.
</note>
</body>
</section>
<section>
<title>-O</title>
<body>
<p>
Next up is the <c>-O</c> variable. This controls the overall level of
optimization. This makes the code compilation take somewhat more time, and can
take up much more memory, especially as you increase the level of optimization.
</p>
<p>
There are five <c>-O</c> settings: <c>-O0</c>, <c>-O1</c>, <c>-O2</c>,
<c>-O3</c>, and <c>-Os</c>. You should use only one of them in
<path>/etc/make.conf</path>.
</p>
<p>
The with the exception of <c>-O0</c>, the <c>-O</c> settings each activate
several additional flags, so be sure to read the gcc manual's chapter on <uri
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Optimize-Options.html#Optimize-Options">optimization
options</uri> to learn which flags are activated at each <c>-O</c> level, as
well as some explanations as to what they do.
</p>
<p>
Let's examine each optimization level:
</p>
<ul>
<li>
<c>-O0</c>: This level (that's the letter "O" followed by a zero) turns off
optimization entirely and is the default if no <c>-O</c> level is specified
in CFLAGS or CXXFLAGS. Your code will not be optimized; it's not normally
desired.
</li>
<li>
<c>-O1</c>: This is the most basic optimization level. The compiler will try
to produce faster, smaller code without taking much compilation time.
It's pretty basic, but it should get the job done all the time.
</li>
<li>
<c>-O2</c>: A step up from <c>-O1</c>. This is the <e>recommended</e> level
of optimization unless you have special needs (such as <c>-Os</c>, as will
be explained shortly). <c>-O2</c> will activate a few more flags in addition
to the ones activated by <c>-O1</c>. With <c>-O2</c>, the compiler will
attempt to increase code performance without compromising on size, and
without taking too much compilation time.
</li>
<li>
<c>-O3</c>: This is the highest level of optimization possible, and also the
riskiest. It will take a longer time to compile your code with this option,
and in fact it <e>should not be used system-wide with <c>gcc</c> 4.x</e>.
The behavior of <c>gcc</c> has changed significantly since version 3.x. In
3.x, <c>-O3</c> has been shown to lead to marginally faster execution times
over <c>-O2</c>, but this is no longer the case with <c>gcc</c> 4.x.
Compiling all your packages with <c>-O3</c> <e>will</e> result in larger
binaries that require more memory, and will significantly increase the odds
of compilation failure or unexpected program behavior (including errors).
The downsides outweigh the benefits; remember the principle of diminishing
returns. <b>Using <c>-O3</c> is not recommended for <c>gcc</c> 4.x.</b>
</li>
<li>
<c>-Os</c>: This level will optimize your code for size. It activates all
<c>-O2</c> options that don't increase the size of the generated code. It's
useful for machines that have extremely limited disk storage space and/or
have CPUs with small cache sizes.
</li>
</ul>
<p>
As previously mentioned, <c>-O2</c> is the recommended optimization level. If
package compilations error out, check to make sure that you aren't using
<c>-O3</c>. As a fallback option, try setting your CFLAGS and CXXFLAGS to a
lower optimization level, such as <c>-O1</c> or <c>-Os</c> and recompile the
package.
</p>
</body>
</section>
<section>
<title>-pipe</title>
<body>
<p>
A fun, safe flag to use is <c>-pipe</c>. This flag actually has no effect on the
generated code, but it makes the compilation process faster. It tells the
compiler to use pipes instead of temporary files during the different stages of
compilation.
</p>
</body>
</section>
<section>
<title>-fomit-frame-pointer</title>
<body>
<p>
This is a very common flag designed to reduce generated code size. It is turned
on at all levels of <c>-O</c> (except <c>-O0</c>) on architectures where doing
so does not interfere with debugging (such as x86-64), but you may need to
activate it yourself by adding it to your flags. Though the GNU <c>gcc</c>
manual does not specify all architectures it is turned on by using <c>-O</c>,
you will need to explicity activate it on x86. However, using this flag will
make debugging hard to impossible.
</p>
<p>
In particular, it makes troubleshooting applications written in Java much
harder, though Java is not the only code affected by using this flag. So while
the flag can help, it can also make debugging harder. If you don't plan to do
much debugging and haven't added any other debugging-related CFLAGS such as
<c>-ggdb</c> (and you aren't installing packages with the <c>debug</c> USE
flag), then try using <c>-fomit-frame-pointer</c>.
</p>
<impo>
Do <e>not</e> combine <c>-fomit-frame-pointer</c> with the similar flag
<c>-momit-leaf-frame-pointer</c>. Using the latter flag is discouraged, as
<c>-fomit-frame-pointer</c> already does the job properly. Furthermore,
<c>-momit-leaf-frame-pointer</c> has been shown to negatively impact code
performance.
<!--
source for this info:
http://www.coyotegulch.com/products/acovea/aco5p4gcc40.html
-->
</impo>
</body>
</section>
<section>
<title>-msse, -msse2, -msse3, -mmmx, -m3dnow</title>
<body>
<p>
These flags enable the <uri
link="http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</uri>, <uri
link="http://en.wikipedia.org/wiki/SSE2">SSE2</uri>, <uri
link="http://en.wikipedia.org/wiki/SSSE3">SSE3</uri>, <uri
link="http://en.wikipedia.org/wiki/MMX">MMX</uri>, and <uri
link="http://en.wikipedia.org/wiki/3dnow">3DNow!</uri> instruction sets for x86
and x86-64 architectures. These are useful primarily in multimedia, gaming, and
other floating point-intensive computing tasks, though they also contain several
other mathematical enhancements. These instruction sets are found in more modern
CPUs.
</p>
<impo>
Be sure to check if your CPU supports these by running <c>cat /proc/cpuinfo</c>.
The output will include any supported additional instruction sets. Note that
<b>pni</b> is just a different name for SSE3.
</impo>
<p>
You normally don't need to add any of these flags to <path>/etc/make.conf</path>
as long as you are using the correct <c>-march</c> (for example,
<c>-march=nocona</c> implies <c>-msse3</c>). Some notable exceptions are newer
VIA and AMD64 CPUs that support instructions not implied by <c>-march</c> (such
as SSE3). For CPUs like these you'll need to enable additional flags where
appropriate after checking the output of <c>cat /proc/cpuinfo</c>.
</p>
<note>
You should check the <uri
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/i386-and-x86_002d64-Options.html#i386-and-x86_002d64-Options">list</uri>
of x86 and x86-64-specific flags to see which of these instruction sets are
activated by the proper CPU type flag. If an instruction is listed, then you
don't need to specify it; it will be turned on by using the proper <c>-march</c>
setting.
</note>
</body>
</section>
</chapter>
<chapter>
<title>Optimization FAQs</title>
<section>
<title>But I get better performance with -funroll-loops -fomg-optimize!</title>
<body>
<p>
No, you only <e>think</e> you do because someone has convinced you that more
flags are better. Aggressive flags will only hurt your applications when used
system-wide. Even the <c>gcc</c> <uri
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Optimize-Options.html#Optimize-Options">manual</uri>
says that using <c>-funroll-loops</c> and <c>-funroll-all-loops</c> makes code
larger and run more slowly. Yet for some reason, these two flags, along with
<c>-ffast-math</c>, <c>-fforce-mem</c>, <c>-fforce-addr</c>, and similar flags,
continue to be very popular among ricers who want the biggest bragging rights.
</p>
<p>
The truth of the matter is that they are dangerously aggressive flags. Take a
good look around the <uri link="http://forums.gentoo.org">Gentoo Forums</uri>
and <uri link="http://bugs.gentoo.org">Bugzilla</uri> to see what those flags
do: nothing good!
</p>
<p>
You don't need to use those flags globally in CFLAGS or CXXFLAGS. They will only
hurt performance. They may make you sound like you have a high-performance
system running on the bleeding edge, but they don't do anything but bloat your
code and get your bugs marked INVALID or WONTFIX.
</p>
<p>
You don't need dangerous flags like these. <b>Don't use them</b>. Stick to the
basics: <c>-march</c>, <c>-O</c>, and <c>-pipe</c>.
</p>
</body>
</section>
<section>
<title>What about -O levels higher than 3?</title>
<body>
<p>
Some users boast about even better performance obtained by using <c>-O4</c>,
<c>-O9</c>, and so on, but the reality is that <c>-O</c> levels higher than 3
have no effect. The compiler may accept CFLAGS like <c>-O4</c>, but it actually
doesn't do anything with them. It only performs the optimizations for
<c>-O3</c>, nothing more.
</p>
<p>
Need more proof? Examine the <c>gcc</c> <uri
link="http://gcc.gnu.org/viewcvs/trunk/gcc/opts.c?revision=124622&view=markup">source
code</uri>:
</p>
<pre caption="-O source code">
if (optimize >= 3)
{
flag_inline_functions = 1;
flag_unswitch_loops = 1;
flag_gcse_after_reload = 1;
/* Allow even more virtual operators. */
set_param_value ("max-aliased-vops", 1000);
set_param_value ("avg-aliased-vops", 3);
}
</pre>
<p>
As you can see, any value higher than 3 is treated as just <c>-O3</c>.
</p>
</body>
</section>
<section>
<title>What about redundant flags?</title>
<body>
<p>
Oftentimes CFLAGS and CXXFLAGS that are turned on at various <c>-O</c> levels
are specified redundantly in <path>/etc/make.conf</path>. Sometimes this is done
out of ignorance, but it is also done to avoid flag filtering or flag replacing.
</p>
<p>
Flag filtering/replacing is done in many of the ebuilds in the Portage tree. It
is usually done because packages fail to compile at certain <c>-O</c> levels, or
when the source code is too sensitive for any additional flags to be used. The
ebuild will either filter out some or all CFLAGS and CXXFLAGS, or it may replace
<c>-O</c> with a different level.
</p>
<p>
The <uri
link="http://devmanual.gentoo.org/ebuild-writing/functions/src_compile/build-environment/index.html">Gentoo
Developer Manual</uri> outlines where and how flag filtering/replacing works.
</p>
<p>
It's possible to circumvent <c>-O</c> filtering by redundantly listing the flags
for a certain level, such as <c>-O3</c>, by doing things like:
</p>
<pre caption="Specifying redundant CFLAGS">
CFLAGS="-O3 -finline-functions -funswitch-loops"
</pre>
<p>
However, <brite>this is not a smart thing to do</brite>. CFLAGS are filtered for
a reason! When flags are filtered, it means that it is unsafe to build a package
with those flags. Clearly, it is <e>not</e> safe to compile your whole system
with <c>-O3</c> if some of the flags turned on by that level will cause problems
with certain packages. Therefore, you shouldn't try to "outsmart" the developers
who maintain those packages. <e>Trust the developers</e>. Flag filtering and
replacing is done for your benefit! If an ebuild specifies alternative flags,
then don't try to get around it.
</p>
<p>
You will most likely continue to run into problems when you build a package with
unacceptable flags. When you report your troubles on Bugzilla, the flags you use
in <path>/etc/make.conf</path> will be readily visible and you will be told to
recompile without those flags. Save yourself the trouble of recompiling by not
using redundant flags in the first place! Don't just automatically assume that
you know better than the developers.
</p>
</body>
</section>
<section>
<title>What about LDFLAGS?</title>
<body>
<p>
Don't use them. You may have heard that they can speed up application load times
or reduce binary size, but in reality, LDFLAGS are more likely to make your
applications stop working. They are not supported, and you can expect to have
your bugs closed and marked INVALID if you report errors with packages while
using LDFLAGS. At the very least you will have to recompile all affected
packages without setting LDFLAGS.
</p>
</body>
</section>
</chapter>
<chapter>
<title>Resources</title>
<section>
<body>
<p>
The following resources are of some help in further understanding optimization:
</p>
<ul>
<li>
The <uri link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/">GNU gcc
manual</uri>
</li>
<li>
Chapter 5 of the <uri link="/doc/en/handbook/">Gentoo Installation
Handbooks</uri>
</li>
<li><c>man make.conf</c></li>
<li><uri link="http://en.wikipedia.org">Wikipedia</uri></li>
<li>
<uri link="http://www.coyotegulch.com/products/acovea/">Acovea</uri>, a
benchmarking and test suite that can be useful for determining how different
compiler flags interact and affect generated code, though its code
suggestions are not appropriate for system-wide use. It is available in
Portage: <c>emerge acovea</c>.
</li>
<li>The <uri link="http://forums.gentoo.org">Gentoo Forums</uri></li>
</ul>
</body>
</section>
</chapter>
</guide>
--
gentoo-doc-cvs@gentoo.org mailing list
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2007-06-27 6:04 UTC | newest]
Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-27 6:04 [gentoo-doc-cvs] cvs commit: co-guide.xml metadoc.xml Josh Saddler
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox