public inbox for gentoo-amd64@lists.gentoo.org
* [gentoo-amd64] CFLAGS question from a AMD64 newbie
@ 2008-12-09 12:23 Sami Näätänen
  2008-12-09 13:13 ` Martin Herrman
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Sami Näätänen @ 2008-12-09 12:23 UTC (permalink / raw
  To: gentoo-amd64

So hi from an amd64 newbie. Not so much a newbie with Gentoo, though. :)

My system is an Intel Core 2 quad-core at 2.4 GHz with 4 GB of memory.
No overclocking etc.; I want this to be stable. :)

I'm just curious what people use as their stable CFLAGS in amd64 Gentoo?
(Sorry if this has been up lately, but I just switched to 64bit env so...)


Here are mine, with some explanation of why (I use a ~arch system with gcc 4.3).

The flags are in the order they appear in my CFLAGS and CXXFLAGS.

Gives stable base
-O2

Want to optimize for my system, but don't want "native"
-march=core2

If some ebuild filters -march, this will still tune for my system's cache etc.
-mtune=core2

Faster floating point math and better chance of vectorization
-mfpmath=sse

These because -march might get filtered
-mmmx -msse -msse2 -msse3 -mssse3

For loop vectorization
-ftree-vectorize

Just to get some idea of how many vectorized loops there will be.
By the way, I was surprised by the number of "LOOP VECTORIZED" notes in the
compile output, and I have seen only a couple of two-version vectorizations.
-ftree-vectorizer-verbose=1

Of course I don't want temp files :)
-pipe
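
Put together, the flags listed above would look roughly like the fragment below. This is only a sketch assembled from the list in this post; which file it belongs in depends on the package manager (make.conf for Portage), and that placement is an assumption here.

```bash
# Flags in the same order as they are listed above.
CFLAGS="-O2 -march=core2 -mtune=core2 -mfpmath=sse -mmmx -msse -msse2 -msse3 -mssse3 -ftree-vectorize -ftree-vectorizer-verbose=1 -pipe"
# Reuse the same flags for C++.
CXXFLAGS="${CFLAGS}"
echo "${CXXFLAGS}"
```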


I don't use any loop unrolling etc., because it would only increase code size.
I'm not brave enough to use -Os.

So what are your experiences, and the reasoning behind what you do?
Any benchmarks or so?


PS. If you see the same post without this postscript, just ignore it; it's
the same post, but I forgot to change my default identity for this ML.





* Re: [gentoo-amd64] CFLAGS question from a AMD64 newbie
  2008-12-09 12:23 [gentoo-amd64] CFLAGS question from a AMD64 newbie Sami Näätänen
@ 2008-12-09 13:13 ` Martin Herrman
  2008-12-09 14:15   ` Branko Badrljica
  2008-12-09 13:28 ` [gentoo-amd64] " Volker Armin Hemmann
  2008-12-09 16:07 ` [gentoo-amd64] " Duncan
  2 siblings, 1 reply; 16+ messages in thread
From: Martin Herrman @ 2008-12-09 13:13 UTC (permalink / raw
  To: gentoo-amd64

On Tue, Dec 9, 2008 at 1:23 PM, Sami Näätänen <sn.ml@keijukammari.fi> wrote:
> So hi from a amd64 newbie. Not so newbie with Gentoo though. :)
>
> My system is an Intel quad core core2 with a 2.4 GHz clock speed coupled with
> a 4GB of memory. No overclocking etc. Want this to be stable. :)
>
> I'm just curious what people use as their stable CFLAGS in amd64 Gentoo?
> (Sorry if this has been up lately, but I just switched to 64bit env so...)
>
>
> Here is mine and some explanation of why (And I use ~arch system with gcc 4.3)
>
> The flags are in order they are used in my CFLAGS and CXXFLAGS.
>
> Gives stable base
> -O2
>
> Want to optimize for my system, but don't want "native"
> -march=core2
>
> If some ebuilds filter march this will still cache optimize etc for my system
> -mtune=core2
>
> Faster floating point math and better chance of vectorization
> -mfpmath=sse
>
> These because of the march might get filtered
> -mmmx -msse -msse2 -msse3 -mssse3
>
> For loop vectorization
> -ftree-vectorize
>
> Just to get some Idea how much vectorized loops there will be.
> By the way I surprised the amount of "LOOP VECTORIZED" notes in the compile
> output. And only have seen couple of two versions
> -ftree-vectorizer-verbose=1
>
> Of course I don't want temp files :)
> -pipe
>
>
> I don't use any loop unrolling etc, because it would only add the code size.
> I'm not so brave that I would dare to use -Os.
>
> So what's your experiences and reasoning behind what you do?
> Any benchmarks or so?
>
>
> PS. If you see same post without this added postscript. Just ignore it, it's
> the same post, but I forgot to change my default identity for this ML.
>

Dear Sami,

I have a Q9300 and used this:

CFLAGS="-march=nocona -O2 -pipe"
CXXFLAGS="${CFLAGS}"
USE="mmx sse sse2 <snip>"

Stable gcc version is 4.2.x. I switched to 4.3.2 by adding

>=sys-devel/gcc-4.3.2
>=sys-libs/glibc-2.7-r2

to /etc/portage/package.keywords. With 4.3.2 I use:

CFLAGS="-march=native -O2 -pipe"

With only a small effort, you get most of the benefits. So fine-tuning
to the edge will give you issues to solve with only a very small
percentage of performance increase in return.

My 2 cents..

Martin




* Re: [gentoo-amd64] CFLAGS question from a AMD64 newbie
  2008-12-09 12:23 [gentoo-amd64] CFLAGS question from a AMD64 newbie Sami Näätänen
  2008-12-09 13:13 ` Martin Herrman
@ 2008-12-09 13:28 ` Volker Armin Hemmann
  2008-12-09 19:59   ` Sami Näätänen
  2008-12-09 16:07 ` [gentoo-amd64] " Duncan
  2 siblings, 1 reply; 16+ messages in thread
From: Volker Armin Hemmann @ 2008-12-09 13:28 UTC (permalink / raw
  To: gentoo-amd64

On Dienstag 09 Dezember 2008, Sami Näätänen wrote:
> So hi from a amd64 newbie. Not so newbie with Gentoo though. :)
>
> My system is an Intel quad core core2 with a 2.4 GHz clock speed coupled
> with a 4GB of memory. No overclocking etc. Want this to be stable. :)
>
> I'm just curious what people use as their stable CFLAGS in amd64 Gentoo?
> (Sorry if this has been up lately, but I just switched to 64bit env so...)
>
>
> Here is mine and some explanation of why (And I use ~arch system with gcc
> 4.3)
>
> The flags are in order they are used in my CFLAGS and CXXFLAGS.
>
> Gives stable base
> -O2

yes

>
> Want to optimize for my system, but don't want "native"
> -march=core2

ok

>
> If some ebuilds filter march this will still cache optimize etc for my
> system -mtune=core2

I would scrap that.

>
> Faster floating point math and better chance of vectorization
> -mfpmath=sse

Superfluous. With an amd64 -march, SSE is used for floating-point math by default.

>
> These because of the march might get filtered
> -mmmx -msse -msse2 -msse3 -mssse3

If -march gets filtered, these might be one of the reasons. I would remove them.
>
> For loop vectorization
> -ftree-vectorize

scrap that.


> -pipe
>
once upon a time I used these flags:

#CFLAGS="-march=k8 -O2 -pipe -fweb -ftracer -fpeel-loops -msse3"
and even
#CFLAGS="-march=k8 -O2 -fweb -ftracer -fpeel-loops -ftree-vectorize -frename-registers -floop-optimize2 -msse3 -pipe"

to hunt down a java bug, I recompiled the whole system with:

CFLAGS="-march=k8 -O2 -msse3 -pipe"

and surprise - it was as fast as before - and compiling was faster too!





* Re: [gentoo-amd64] CFLAGS question from a AMD64 newbie
  2008-12-09 13:13 ` Martin Herrman
@ 2008-12-09 14:15   ` Branko Badrljica
  2008-12-09 15:33     ` [gentoo-amd64] " Duncan
  0 siblings, 1 reply; 16+ messages in thread
From: Branko Badrljica @ 2008-12-09 14:15 UTC (permalink / raw
  To: gentoo-amd64

Martin Herrman wrote:
> On Tue, Dec 9, 2008 at 1:23 PM, Sami Näätänen <sn.ml@keijukammari.fi> wrote:
>   
> to /etc/portage/package.keywords. With 4.3.2 I use:
> CFLAGS="-march=native -O2 -pipe"
>
> With only a small effort, you get most of the benefits. So fine-tuning
> to the edge will give you issues to solve with only a very small
> percentage of performance increase in return.
>
> My 2 cents..
>
> Martin

My vote goes for this one, too. In general, anything higher than -O2
will bring you next to nothing in the best case; on average you'll end up
with longer code that executes about the same, and in some cases the result
will be slower. By far the most important thing, at least in general, seems
to be to nail your -march right, in order for the compiler to be able to
utilize all that the CPU has to offer.
For my Phenom "-march=barcelona" is optimal, but then I have problems
with "<=gcc-4.2*", which has no separate K10 backend, so
"-march=native" is optimal - which means that gcc checks the CPU it is
running on and makes the best available choice.
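
One way to see what that "best available choice" is on a given machine is to ask gcc directly. The sketch below assumes a gcc recent enough to accept -Q --help=target (4.3 or later); the function name show_native_target is made up for illustration.

```bash
# Print the -march/-mtune values gcc selects for -march=native on this machine.
# Prints nothing (and still succeeds) if gcc is missing or rejects the option.
show_native_target() {
    command -v gcc >/dev/null 2>&1 || return 0
    gcc -march=native -Q --help=target 2>/dev/null | grep -E -- '-march=|-mtune=' || true
}

show_native_target
```

Comparing that output between two installed gcc versions shows whether the older one falls back to a more generic target.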

If all your gcc compilers are of version 4* upwards, then also 
"-combine" might be useful.

In short, CFLAGS="-march=native -O2 -pipe -combine" is optimal for me...


Regards,

Branko





* [gentoo-amd64]  Re: CFLAGS question from a AMD64 newbie
  2008-12-09 14:15   ` Branko Badrljica
@ 2008-12-09 15:33     ` Duncan
  0 siblings, 0 replies; 16+ messages in thread
From: Duncan @ 2008-12-09 15:33 UTC (permalink / raw
  To: gentoo-amd64

Branko Badrljica <brankob@avtomatika.com> posted
493E7D73.8050800@avtomatika.com, excerpted below, on  Tue, 09 Dec 2008
15:15:15 +0100:

> If all your gcc compilers are of version 4* upwards, then also
> "-combine" might be useful.

I use -combine here, too, but be aware that there are a number of 
packages that still break with it.  I use 
/etc/portage/env/cat-egory/pkg files to remove it for such packages, 
using this line:

CFLAGS="${CFLAGS/ -combine/}"

I'll post my CFLAGS again (last time was probably over a year ago now) in 
another post.
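
A minimal sketch of such a per-package file, built in a scratch directory so nothing under the real /etc is touched. The category/package name app-misc/example is hypothetical, and sourcing the file by hand only imitates what the bash side of portage would do.

```bash
# Scratch directory standing in for the filesystem root.
root="$(mktemp -d)"
# Hypothetical category/package, following the cat-egory/pkg layout.
envfile="${root}/etc/portage/env/app-misc/example"
mkdir -p "$(dirname "${envfile}")"

# The per-package file holds the single line from the post.
cat > "${envfile}" <<'EOF'
CFLAGS="${CFLAGS/ -combine/}"
EOF

# Simulate the global flags, then source the file as a bash build environment would.
CFLAGS="-march=native -O2 -pipe -combine"
. "${envfile}"
echo "${CFLAGS}"
```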

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman





* [gentoo-amd64]  Re: CFLAGS question from a AMD64 newbie
  2008-12-09 12:23 [gentoo-amd64] CFLAGS question from a AMD64 newbie Sami Näätänen
  2008-12-09 13:13 ` Martin Herrman
  2008-12-09 13:28 ` [gentoo-amd64] " Volker Armin Hemmann
@ 2008-12-09 16:07 ` Duncan
  2008-12-09 17:40   ` Branko Badrljica
                     ` (2 more replies)
  2 siblings, 3 replies; 16+ messages in thread
From: Duncan @ 2008-12-09 16:07 UTC (permalink / raw
  To: gentoo-amd64

Sami Näätänen <sn.ml@keijukammari.fi> posted
200812091423.30562.sn.ml@keijukammari.fi, excerpted below, on  Tue, 09 Dec
2008 14:23:30 +0200:

> My system is an Intel quad core core2 with a 2.4 GHz clock speed coupled
> with a 4GB of memory. No overclocking etc. Want this to be stable. :)
> 
> I'm just curious what people use as their stable CFLAGS in amd64 Gentoo?
> (Sorry if this has been up lately, but I just switched to 64bit env
> so...)
> 
> 
> Here is mine and some explanation of why (And I use ~arch system with
> gcc 4.3)

Well, you say you want stable, but then say you use ~arch, so I see
you're not too much of a stick-in-the-mud. =:^)

Here's mine, for a dual Opteron 290:

CFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb
-fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload
-ftree-vectorize -fdirectives-only -freorder-blocks-and-partition -combine"

CXXFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb
-fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload
-ftree-vectorize -fdirectives-only"

You can look them up in the gcc manpage, or look back a year or so when I 
explained most of them, altho that was a couple gcc versions ago and they 
weren't quite the same.

But my basic strategy is this:  Because memory is so much slower than 
cache on a modern processor, in general it should pay to optimize for 
size even if it costs a few CPU cycles once in awhile.  Thus, until 
fairly recently I used -Os, but with gcc-4.3, decided to switch to -O2 
since gcc is getting smarter about such optimizations with -O2 now, and 
the few additional size optimizations with -Os now tend to be at the 
expense of cache (think -freorder-blocks-and-partition).  In any case, I 
certainly don't want -O3 or too much loop unrolling and inlining, at the 
expense of cache.

-frename-registers and -fweb are useful for taking advantage of the 
additional registers x86_64 has.  -fdirectives-only is there because it 
works better with ccache, which I use.  You know about -ftree-vectorize 
and -combine is discussed elsewhere on-thread.  -fmerge-all-constants 
isn't strictly C standard, but I've had absolutely zero issues with it, 
and it's going to help with cache.  -freorder-blocks-and-partition won't 
work on most C++ code, thus (along with -combine) the reason I split 
CFLAGS and CXXFLAGS, but it tells gcc to keep hot code together so it 
stays in cache better.  The various -fgcse-* options make gcc more aggressive
about global common subexpression elimination (gcse) under various 
conditions.  This shouldn't add to size and may in fact reduce size by 
reducing instruction count (or moving it out of loops, size neutral), but 
it can increase compile time, the reason a few of them are enabled at -O3 
only, by default.

-combine is the one that causes the most problems, handled per trouble
package as mentioned in the other thread using /etc/portage/env/* files.
-freorder-blocks-and-partition can cause problems in some cases as well,
but if you don't have either of those in CXXFLAGS, you'll avoid a lot of
the problem right there.  Those are the only C(XX)FLAGS I have had issues
with lately.  The others have worked just fine.

With quad-core you will likely be interested in upping your MAKEOPTS job 
count as well.  Just be aware that it too can cause issues at times.  
Again, however, it's easily worked around per-package as you come across 
them using the env/* files to set MAKEOPTS=-j1 or whatever.
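
A small sketch for picking the job count. One job per core is only a starting point (an assumption, not a rule), and make.conf is not guaranteed to evaluate command substitution, so the printed value is meant to be pasted there as a literal.

```bash
# Number of online CPUs, as reported by getconf on Linux.
cores="$(getconf _NPROCESSORS_ONLN)"
# One job per core; some people prefer cores + 1.
MAKEOPTS="-j${cores}"
# Print the line in the form it would take in the config file.
echo "MAKEOPTS=\"${MAKEOPTS}\""
```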

Since you mentioned running ~arch, and assuming your PM is still portage,
you may also want to take a look at emerge's --jobs and --load-average
options, for parallel emerges, if you haven't already.  If you use them
you'll probably find --keep-going useful as well, so it doesn't stop just
because one of the parallel merges failed.
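
As a sketch, the three options can be made the default through EMERGE_DEFAULT_OPTS. The numbers below are assumptions for a quad-core machine, not recommendations from this thread.

```bash
# Assumed values for a quad-core machine; tune to taste.
jobs=4      # packages built in parallel
load=4.0    # do not start new jobs above this load average
EMERGE_DEFAULT_OPTS="--jobs=${jobs} --load-average=${load} --keep-going"
# Show the command line these defaults amount to for a world update.
echo "emerge ${EMERGE_DEFAULT_OPTS} --update --deep world"
```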

Finally, if you haven't already, consider pointing PORTAGE_TMPDIR at a 
tmpfs.  With 4 gig memory it should speed things up dramatically, and the 
worst-case is that it uses swap, sending to disk what would be 100% 
guaranteed to go to disk if you had PORTAGE_TMPDIR on disk.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman





* Re: [gentoo-amd64]  Re: CFLAGS question from a AMD64 newbie
  2008-12-09 16:07 ` [gentoo-amd64] " Duncan
@ 2008-12-09 17:40   ` Branko Badrljica
  2008-12-09 20:34   ` Sami Näätänen
  2008-12-16 23:00   ` Branko Badrljica
  2 siblings, 0 replies; 16+ messages in thread
From: Branko Badrljica @ 2008-12-09 17:40 UTC (permalink / raw
  To: gentoo-amd64

Duncan wrote:
>
>
> Well, you say you want stable, but then say you use ~arch, so I see 
> you're not too stick in the mud. =:^)
>
> Here's mine, for a dual Opteron 290:
>
> CFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb -fmerge-
> all-constants -fgcse-sm -fgcse-las -fgcse-after-reload -ftree-vectorize -
> fdirectives-only -freorder-blocks-and-partition -combine"
>
> CXXFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb -fmerge-
> all-constants -fgcse-sm -fgcse-las -fgcse-after-reload -ftree-vectorize -
> fdirectives-only"
>
> You can look them up in the gcc manpage, or look back a year or so when I 
> explained most of them, altho that was a couple gcc versions ago and they 
> weren't quite the same.
>
>   
<SNIP>


Been there, done practically that, but it didn't make one quark of
difference overall, except throwing gcc into a coma now and then,
lengthening compile times and causing odd (but rare) bugs.

I tried to time several C programs of mine and found that plain -O1
worked substantially better than plain -O2.

After that, I said sod it all and used plain vanilla CFLAGS on a new gcc
with the right -march. Works fine, with the same speed, faster compiles and
far fewer headaches on average.

In my experience, exotic CFLAGS can make a difference, but this varies
wildly from one part of a program to another, so unless one knows exactly
what he is doing, he might be better off trusting the compiler to take a
sane path with -O2. Besides that, portage doesn't have an option to compile
just some part of the code with other, non-default CFLAGS...



> But my basic strategy is this:  Because memory is so much slower than 
> cache on a modern processor, in general it should pay to optimize for 
> size even if it costs a few CPU cycles once in awhile.
True, but he is asking about a P4, which was notorious for having a long
pipeline and a headache after a cache miss, so for him -O2 or even -O3
might be better in _some_ cases.
But even so, IMVHO it is simply not worth the time and effort to fiddle
with this; I'd use the golden default with the right -march here also and
be done with it.







* Re: [gentoo-amd64] CFLAGS question from a AMD64 newbie
  2008-12-09 13:28 ` [gentoo-amd64] " Volker Armin Hemmann
@ 2008-12-09 19:59   ` Sami Näätänen
  2008-12-10  7:00     ` Branko Badrljica
  2008-12-10  7:27     ` Volker Armin Hemmann
  0 siblings, 2 replies; 16+ messages in thread
From: Sami Näätänen @ 2008-12-09 19:59 UTC (permalink / raw
  To: gentoo-amd64

On Tuesday 09 December 2008 15:28:21 Volker Armin Hemmann wrote:
> On Dienstag 09 Dezember 2008, Sami Näätänen wrote:
> > So hi from a amd64 newbie. Not so newbie with Gentoo though. :)

Well, sorry for not giving more details. I'm not a newbie with Gentoo, just
on the amd64 side of things, i.e. I have no experience of the bugs and of
how things break in the tree when using one flag or another.

So I'll take this again in a little more detail.

I have been hanging around Gentoo since before the 1.4 days, i.e. long
before the yearly tagged releases/profiles. I have used paludis since
somewhere around 0.2x; I can't remember exactly which version it was. A
breakage now and then in the build stage is nothing new to me. Stability in
my eyes means stability of the binaries on my system, not so much of the
builds themselves.

> > My system is an Intel quad core core2 with a 2.4 GHz clock speed coupled
> > with a 4GB of memory. No overclocking etc. Want this to be stable. :)
> >
> > I'm just curious what people use as their stable CFLAGS in amd64 Gentoo?
> > (Sorry if this has been up lately, but I just switched to 64bit env
> > so...)
> >
> >
> > Here is mine and some explanation of why (And I use ~arch system with gcc
> > 4.3)
> >
> > The flags are in order they are used in my CFLAGS and CXXFLAGS.
> >
> > Gives stable base
> > -O2
>
> yes
>
> > Want to optimize for my system, but don't want "native"
> > -march=core2
>
> ok
>
> > If some ebuilds filter march this will still cache optimize etc for my
> > system -mtune=core2
>
> I would scrap that.
>
> > Faster floating point math and better chance of vectorization
> > -mfpmath=sse
>
> superfluos. March with amd64 sse is used by default.

So it's set even if arch filtering drops -march to the lowest amd64 arch. I
wasn't sure, so I stuck it in, as I want to be sure there is no FPU code
around making life harder.

> > These because of the march might get filtered
> > -mmmx -msse -msse2 -msse3 -mssse3
>
> if march get filtered, these might one of the reasons, I would remove them.

From my experience, all the bugs that needed arch filtering came from
something wrong in the generic optimizations enabled only with certain -Ox
and -march combinations, not from the use of the instruction sets. (A couple
of beta gccs excluded, but I'm not touching those anymore.)

So I could scrap the older ones, as -march will already cover those, except
for -msse3, which allows the compiler to use more SIMD instructions in loop
vectorization.

> > For loop vectorization
> > -ftree-vectorize
>
> scrap that.

Why?
I read that there have been problems with it earlier, but in my experience
they were on the 32-bit arch, and on this system there have been none
whatsoever.
And for isolated packages I can always easily disable it, being a paludis
user. By the way, most of those tree-vectorizer problems come from the other
optimizations applied before the tree vectorizer, like loop peeling, loop
unrolling etc.

> > -pipe
>
> once upon a time I used this flags:
>
> #CFLAGS="-march=k8 -O2 -pipe -fweb -ftracer -fpeel-loops -msse3"
> and even
> #CFLAGS="-march=k8 -O2 -fweb -ftracer -fpeel-loops -ftree-vectorize
> -frename- registers -floop-optimize2 -msse3 -pipe"
>
> to hunt down a java bug, I recompiled the whole system with:
>
> CFLAGS="-march=k8 -O2 -msse3 -pipe"
>
> and surprise - it was as fast as before - and compiling was faster too!

Was this a 64bit system?
I wouldn't use the tree vectorizer on a 32-bit system, as the alignment
issues are a serious problem until gcc gets proper stack alignment handling.

I wouldn't touch the other flags you used, but I also know what code size
reductions regular code can get from the loop vectorizer. Although to get the
best out of vectorization one really has to write compact, loopy and maybe
odd-looking code. Also, the vectorizer needs a lot of improvement, as can be
seen from the code generated for the joo2 function in my example.

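For example:
/* Global arrays rather than parameters or locals. */
float a[4];
float b[4];

/* Hand-unrolled version. */
float
joo(void) {
	a[0] = b[0]*b[0];
	a[1] = b[1]*b[1];
	a[2] = b[2]*b[2];
	a[3] = b[3]*b[3];
	return a[0]+a[1]+a[2]+a[3];
}

/* Same computation written as a loop with a constant trip count. */
float
joo2(void) {
	int i;
	for( i=0; i<4; i++)
		a[i] = b[i]*b[i];
	return a[0]+a[1]+a[2]+a[3];
}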

joo() will be slower using CFLAGS="-O2 -march=core2 -ftree-vectorize" than
joo2(), because the tree vectorizer can vectorize the constant loop away.
Copy the code to a C source file such as joo.c and execute:
gcc -O2 -march=core2 -ftree-vectorize -S joo.c && less joo.s
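
Instead of reading the assembly by eye, one can count the packed multiply instructions. The sketch below is an assumption-laden helper, not part of the original test: it writes only joo2() to a scratch directory, compiles it if a gcc that accepts -march=core2 is present, and counts lines containing mulps. The function name count_packed_mul is made up, and the result will vary with the gcc version.

```bash
workdir="$(mktemp -d)"

# Only the loop version from the example above.
cat > "${workdir}/joo.c" <<'EOF'
float a[4];
float b[4];

float joo2(void) {
    int i;
    for (i = 0; i < 4; i++)
        a[i] = b[i] * b[i];
    return a[0] + a[1] + a[2] + a[3];
}
EOF

# Count lines containing mulps (packed single-precision multiply) in an assembly file.
count_packed_mul() {
    grep -c 'mulps' "$1" || true
}

if command -v gcc >/dev/null 2>&1 &&
   gcc -O2 -march=core2 -ftree-vectorize -S -o "${workdir}/joo.s" "${workdir}/joo.c" 2>/dev/null; then
    echo "lines with mulps: $(count_packed_mul "${workdir}/joo.s")"
else
    echo "gcc with -march=core2 support not available; skipping"
fi
```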

PS. For those who are interested: there are many vectorizable loops that
can't be vectorized because gcc lacks proper parameter stack alignment.
Which is the reason I wrote the example the way I did. :)

On 32-bit systems it can't provide nearly as many optimizations as on 64-bit
systems, because of the alignment issue. The tree vectorizer produces a lot
of those two-version vectorizations when it needs to determine the memory
alignment at runtime. That's why I take a closer look at the vectorizations.
There were really few of those two-version vectorizations when I compiled my
"system".





* Re: [gentoo-amd64]  Re: CFLAGS question from a AMD64 newbie
  2008-12-09 16:07 ` [gentoo-amd64] " Duncan
  2008-12-09 17:40   ` Branko Badrljica
@ 2008-12-09 20:34   ` Sami Näätänen
  2008-12-10  7:35     ` Duncan
  2008-12-16 23:00   ` Branko Badrljica
  2 siblings, 1 reply; 16+ messages in thread
From: Sami Näätänen @ 2008-12-09 20:34 UTC (permalink / raw
  To: gentoo-amd64

On Tuesday 09 December 2008 18:07:38 Duncan wrote:
> Sami Näätänen <sn.ml@keijukammari.fi> posted
> 200812091423.30562.sn.ml@keijukammari.fi, excerpted below, on  Tue, 09 Dec
>
> 2008 14:23:30 +0200:
> > My system is an Intel quad core core2 with a 2.4 GHz clock speed coupled
> > with a 4GB of memory. No overclocking etc. Want this to be stable. :)
> >
> > I'm just curious what people use as their stable CFLAGS in amd64 Gentoo?
> > (Sorry if this has been up lately, but I just switched to 64bit env
> > so...)
> >
> >
> > Here is mine and some explanation of why (And I use ~arch system with
> > gcc 4.3)
>
> Well, you say you want stable, but then say you use ~arch, so I see
> you're not too stick in the mud. =:^)

Well, stable binaries, as I said in my clarifying (at least a little) second
post. :)

> Here's mine, for a dual Opteron 290:
>
> CFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb -fmerge-
> all-constants -fgcse-sm -fgcse-las -fgcse-after-reload -ftree-vectorize -
> fdirectives-only -freorder-blocks-and-partition -combine"
>
> CXXFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb -fmerge-
> all-constants -fgcse-sm -fgcse-las -fgcse-after-reload -ftree-vectorize -
> fdirectives-only"
>
> You can look them up in the gcc manpage, or look back a year or so when I
> explained most of them, altho that was a couple gcc versions ago and they
> weren't quite the same.
>
> But my basic strategy is this:  Because memory is so much slower than
> cache on a modern processor, in general it should pay to optimize for
> size even if it costs a few CPU cycles once in awhile.  Thus, until
> fairly recently I used -Os, but with gcc-4.3, decided to switch to -O2
> since gcc is getting smarter about such optimizations with -O2 now, and
> the few additional size optimizations with -Os now tend to be at the
> expense of cache (think -freorder-blocks-and-partition).  In any case, I
> certainly don't want -O3 or too much loop unrolling and inlining, at the
> expense of cache.
>
> -frename-registers and -fweb are useful for taking advantage of the
> additional registers x86_64 has.  -fdirectives-only is there because it
> works better with ccache, which I use.  You know about -ftree-vectorize
> and -combine is discussed elsewhere on-thread.  -fmerge-all-constants
> isn't strictly C standard, but I've had absolutely zero issues with it,
> and it's going to help with cache.  -freorder-blocks-and-partition won't
> work on most C++ code, thus (along with -combine) the reason I split
> CFLAGS and CXXFLAGS, but it tells gcc to keep hot code together so it
> stays in cache better.  The various -fgcse-* options make gcc stricter
> about global common subexpression elimination (gcse) under various
> conditions.  This shouldn't add to size and may in fact reduce size by
> reducing instruction count (or moving it out of loops, size neutral), but
> it can increase compile time, the reason a few of them are enabled at -O3
> only, by default.
>
> -combine is the one that causes the most problems, handled per trouble-
> package as mentioned in the other thread using /etc/portage/env/* files.
> The -fredorder-blocks-and-partition can in some cases as well, but if you
> don't have either of those in CXXFLAGS, you'll avoid a lot of the problem
> right there.  Those are the only C(XX)FLAGS I have had issues with
> lately.  The others have worked just fine.
>
> With quad-core you will likely be interested in upping your MAKEOPTS job
> count as well.  Just be aware that it too can cause issues at times.
> Again, however, it's easily worked around per-package as you come across
> them using the env/* files to set MAKEOPTS=-j1 or whatever.

Yeah, forgot to mention that too. I in fact like to use -j <num cores>, as
then there is no need for renicing in most cases and the system stays smooth.

> Since you mentioned running ~arch, and assuming your PM is still portage,
> you may also want to take a look at the emerge's --jobs and --load-
> average options, for parallel emerges, if you haven't already.  If you
> use them you'll probably find --keep-going useful as well, so it doesn't
> stop just because one of the parallel merges failed.

Well, I've been a paludis man for quite a while; much better dependency handling.

> Finally, if you haven't already, consider pointing PORTAGE_TMPDIR at a
> tmpfs.  With 4 gig memory it should speed things up dramatically, and the
> worst-case is that it uses swap, sending to disk what would be 100%
> guaranteed to go to disk if you had PORTAGE_TMPDIR on disk.

Yeah, I have
3GB tmpfs for /var/tmp/paludis and
1GB tmpfs for /tmp to speed things up in normal operation. And as memory seems 
to be quite cheap I might change to 8GB. After all there is no such thing as 
too much memory... (Actually there can be, but then one has the wrong HW to 
use that memory ;) )





* Re: [gentoo-amd64] CFLAGS question from a AMD64 newbie
  2008-12-09 19:59   ` Sami Näätänen
@ 2008-12-10  7:00     ` Branko Badrljica
  2008-12-10  7:27     ` Volker Armin Hemmann
  1 sibling, 0 replies; 16+ messages in thread
From: Branko Badrljica @ 2008-12-10  7:00 UTC (permalink / raw
  To: gentoo-amd64

Sami Näätänen wrote:

<SNIP about -ftree-vectorize>
> For example:
> float a[4];
> float b[4];
>   
<SNIPped the rest of example >


Nice one. And probably with a stellar speedup, since a bunch of code gets
replaced with one or two SSE instructions.
But how relevant is it in real-life examples?






* Re: [gentoo-amd64] CFLAGS question from a AMD64 newbie
  2008-12-09 19:59   ` Sami Näätänen
  2008-12-10  7:00     ` Branko Badrljica
@ 2008-12-10  7:27     ` Volker Armin Hemmann
  1 sibling, 0 replies; 16+ messages in thread
From: Volker Armin Hemmann @ 2008-12-10  7:27 UTC (permalink / raw
  To: gentoo-amd64

On Dienstag 09 Dezember 2008, Sami Näätänen wrote:

> Was this a 64bit system?

yes

also, most ebuilds don't filter march.




* [gentoo-amd64]  Re: CFLAGS question from a AMD64 newbie
  2008-12-09 20:34   ` Sami Näätänen
@ 2008-12-10  7:35     ` Duncan
  0 siblings, 0 replies; 16+ messages in thread
From: Duncan @ 2008-12-10  7:35 UTC (permalink / raw
  To: gentoo-amd64

Sami Näätänen <sn.ml@keijukammari.fi> posted
200812092234.39964.sn.ml@keijukammari.fi, excerpted below, on  Tue, 09 Dec
2008 22:34:39 +0200:

> Eah I have
> 3GB tmpfs for /var/tmp/paludis and
> 1GB tmpfs for /tmp to speed things up in normal operation. And as memory
> seems to be quite cheap I might change to 8GB. After all there is no
> such thing as too much memory... (Actually there can be, but then one
> has the wrong HW to use that memory ;) )

I have 8 gig here, and for four cores and compiling to tmpfs it's 
reasonable.  I'd be reasonably happy with 4 gig as well, but as you, I 
figured why not go the 8 gig, so I did.  I /had/ thought about 16 gig, 
maxing out my system, but that would be overkill.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman





* [gentoo-amd64]  Re: CFLAGS question from a AMD64 newbie
  2008-12-16 23:00   ` Branko Badrljica
@ 2008-12-16 22:46     ` Duncan
  2008-12-17  2:00       ` Branko Badrljica
  0 siblings, 1 reply; 16+ messages in thread
From: Duncan @ 2008-12-16 22:46 UTC (permalink / raw
  To: gentoo-amd64

Branko Badrljica <brankob@avtomatika.com> posted
4948332A.3080109@avtomatika.com, excerpted below, on  Wed, 17 Dec 2008
00:00:58 +0100:

> Duncan wrote:
>> -combine is the one that causes the most problems, handled per trouble-
>> package as mentioned in the other thread using /etc/portage/env/*
>> files. The -fredorder-blocks-and-partition can in some cases as well,
>> but if you don't have either of those in CXXFLAGS, you'll avoid a lot
>> of the problem right there.  Those are the only C(XX)FLAGS I have had
>> issues with lately.  The others have worked just fine.
>>   
> Do you have link to info on that per-package environment mechanism ? I
> couldn't find anything...

I'm headed to work so don't have time to look ATM, but AFAIK I either saw 
it discussed on the portage-dev list or found it in the form of a 
changelog entry, likely referring to the bug number where it was 
requested, then later added.

This is one of the features they don't like to talk about too much, 
because emerge --info says nothing about it.  Basically, they don't trust 
their users to mention it when reporting bugs (and, honestly, it's hard 
to remember what you might have put there a year and two release versions 
ago...).

The feature is somewhat limited, in that these files are sourced in the 
bash portion of portage (basically, ebuild.sh) -- the python portion 
knows nothing about them.  Therefore, the env files can only affect stuff 
like cflags that are normally processed in bash.  It doesn't affect for 
instance some of the FEATURES that the python portion (basically the 
dependency checking and the part that glues all the phases together, see 
the ebuild (1) manpage for the normal phases, and ebuild (5) manpage for 
how they are expressed in the ebuilds as functions) uses and controls.

Starting from /etc/portage/env, you create the category as a subdir, then 
the package (which can include the version, but doesn't take the full 
atom syntax) as the filename.

Inside the file you can do things like:

# comments
MAKEOPTS=-j1
CFLAGS="${CFLAGS/ -combine/}"

(That CFLAGS line is standard bash pattern substitution, 
${var/pattern/replacement}, so it works much like a sed substitution: here 
it deletes the first " -combine" from CFLAGS.)
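
To make the substitution concrete, here is a minimal sketch that can be run
in any bash shell, outside portage.  The starting CFLAGS value is made up
for illustration; only the substitution line itself comes from the env file
example.

```bash
#!/bin/bash
# Illustrative starting value; in portage this would come from make.conf.
CFLAGS="-O2 -march=core2 -combine -pipe"

# ${var/pattern/replacement}: replace the first match of " -combine"
# with an empty string, leaving the other flags alone.
CFLAGS="${CFLAGS/ -combine/}"

echo "$CFLAGS"
```

Note that the pattern includes the leading space, so it would not match if
-combine were the very first flag in the string.  The double-slash form,
${CFLAGS// -combine/}, removes every match instead of just the first.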

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman





* Re: [gentoo-amd64]  Re: CFLAGS question from a AMD64 newbie
  2008-12-09 16:07 ` [gentoo-amd64] " Duncan
  2008-12-09 17:40   ` Branko Badrljica
  2008-12-09 20:34   ` Sami Näätänen
@ 2008-12-16 23:00   ` Branko Badrljica
  2008-12-16 22:46     ` Duncan
  2 siblings, 1 reply; 16+ messages in thread
From: Branko Badrljica @ 2008-12-16 23:00 UTC
  To: gentoo-amd64

Duncan wrote:
> -combine is the one that causes the most problems, handled per trouble-
> package as mentioned in the other thread using /etc/portage/env/* files.  
> The -freorder-blocks-and-partition can in some cases as well, but if you 
> don't have either of those in CXXFLAGS, you'll avoid a lot of the problem 
> right there.  Those are the only C(XX)FLAGS I have had issues with 
> lately.  The others have worked just fine.
>   
Do you have a link to info on that per-package environment mechanism?
I couldn't find anything...







* Re: [gentoo-amd64]  Re: CFLAGS question from a AMD64 newbie
  2008-12-16 22:46     ` Duncan
@ 2008-12-17  2:00       ` Branko Badrljica
  2008-12-17  8:41         ` Duncan
  0 siblings, 1 reply; 16+ messages in thread
From: Branko Badrljica @ 2008-12-17  2:00 UTC
  To: gentoo-amd64

Duncan wrote:

> The feature is somewhat limited, in that these files are sourced in the 
> bash portion of portage (basically, ebuild.sh) -- the python portion 
> knows nothing about them.  
GREAT !
Can I use them for switching gcc compiler etc, not just for simple
variable settings ?

And btw, is there a way to switch gcc back after the emerge without changing
the ebuild file ?

Or maybe there is some other mechanism for compiling a package
with a particular gcc ?

One more question:

What if two or more files match the package ?

Like:

env/sys-kernel/vanilla-sources
env/sys-kernel/vanilla-sources-2
env/sys-kernel/vanilla-sources2.6
env/sys-kernel/vanilla-sources-2.6.27

Do they all get included in some /un/specified order or just
first/last/unspecified one ?


branko




* [gentoo-amd64]  Re: CFLAGS question from a AMD64 newbie
  2008-12-17  2:00       ` Branko Badrljica
@ 2008-12-17  8:41         ` Duncan
  0 siblings, 0 replies; 16+ messages in thread
From: Duncan @ 2008-12-17  8:41 UTC
  To: gentoo-amd64

Branko Badrljica <brankob@avtomatika.com> posted
49485D48.9000001@avtomatika.com, excerpted below, on  Wed, 17 Dec 2008
03:00:40 +0100:

> Duncan wrote:
> 
>> The feature is somewhat limited, in that these files are sourced in the
>> bash portion of portage (basically, ebuild.sh) -- the python portion
>> knows nothing about them.
> GREAT !
> Can I use them for switching gcc compiler etc, not just for simple
> variable settings ?

I've never tried anything but variable settings, but it's sourced as 
bash, so you should be able to do pretty much anything you could do in a 
bash script, so yes, gcc-config ..., etc, should work... again, for the 
bash side only.

I'm not sure how much the python might depend on gcc, etc, however.  I do 
know that any deps (say if the package is in fortran or java and it tests 
gcc to see if it was merged with the appropriate USE flags) are resolved 
in the python bit, in emerge itself after it has sourced the ebuilds for 
deps, but before it actually starts on any of the ebuild phases.  Thus, 
if you go messing with that on a package where it's a dependency, you're 
likely to have "undesired results".  However, I'm not sure if portage/
emerge depends on gcc and/or anything else staying stable beneath it or 
not.

There's also the parallel merge ability in newer portage to think about.  
If you use it and invoke one of these env scripts, who knows /what/ stage 
some other package might be at in parallel merge!  That'd DEFINITELY be 
an invitation to trouble, since gcc-config switches things system-wide, 
and switching versions of gcc in the middle of compiling say glibc could 
literally bring the entire system crashing down around your ears when it 
tries to qmerge that half-n-half glibc to the running system!

Thus I'd certainly recommend sticking to stuff that's going to affect 
only that single package, setting environment variables, etc.  Don't do 
anything that will affect the system globally unless you are seriously 
looking for trouble.

What I'd suggest, if you want to push the boundaries, is to either just 
try it and see what you can get away with, or go looking in the forums, 
etc.  There's gotta be a thread or two on the subject.  You could also see 
if there's anything about it in the gentoo-user list archives (I use 
gmane, Gentoo has its own archive, and I think there's at least one more, 
Mark's or some such), and/or ask there.  There are some pretty 
knowledgeable people out there, who know way more about it than I do.

> And btw, is there a way to switch gcc back after the emerge without
> changing the ebuild file ?

Well, yes, but you're getting yet another layer deeper into the advanced 
stuff.  I haven't played around with them myself directly, but there are 
pre-phase and post-phase hooks for each phase.  Presumably you'd stick 
the switch back in the post-compile-hook (and I'm not sure I got the name 
exactly right) routine.  More below...
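
A minimal sketch of such a hook, assuming the hooks are plain bash
functions defined in /etc/portage/bashrc and named pre_<phase> or
post_<phase>, which would make this one post_src_compile.  The log message
is invented, and CATEGORY and PF are set by hand here only so the snippet
runs outside portage.  It deliberately does nothing system-wide.

```bash
#!/bin/bash
# Sketch of an /etc/portage/bashrc fragment (hook name is an assumption).
post_src_compile() {
    # Keep this per-package and harmless: just report what was built.
    echo "compiled ${CATEGORY}/${PF}"
}

# Outside portage, simulate the call with hand-set, hypothetical values.
CATEGORY=sys-apps PF=example-1.0
hook_output=$(post_src_compile)
echo "$hook_output"
```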

> Or maybe there is some other mechanism for compiling a package
> with a particular gcc ?

This doesn't directly answer that question, but it should give you a 
better idea the sort of flexibility available...

In addition to the per-package env files, there's the general 
/etc/portage/bashrc (see the file list and descriptions in the 
portage(5) manpage).  This is actually cascaded in a specific order (which 
I don't recall at the moment) with the profile.bashrc(s) at the various 
levels of your (cascading) profile.  Among other things, the Gentoo/amd64 
project was, I believe, the first to use this method to filter invalid 
CFLAGS.  Say you normally use gcc-4.3.x, with certain CFLAGS only valid 
from that version on, and then gcc-config back to a gcc-4.2.x version that 
doesn't recognize them: one of the bashrc layers filters out the invalid 
ones.  Take a look at the various *bashrc files in the profiles to get an 
idea of what Gentoo is already using them for!  (Actually, I believe the 
env files were first implemented this way, but at a global rather than 
profile level, tho in newer portage they may be called directly.)
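
The flag-filtering idea can be sketched in a few lines of bash.  This is an
illustration only, not the code the Gentoo profiles actually ship:
strip_flags is a hypothetical helper, and the example flags are chosen
because -march=core2 was only added in gcc-4.3.

```bash
#!/bin/bash
# Remove every flag named in $2.. from the flag string in $1.
# Hypothetical helper; not what the profile.bashrc files really contain.
strip_flags() {
    local flags=" $1 " bad
    shift
    for bad in "$@"; do
        # Match whole flags only, by requiring a space on both sides.
        flags="${flags// "$bad" / }"
    done
    # Trim the padding spaces added above.
    flags="${flags# }"
    echo "${flags% }"
}

# Example: a gcc-4.2.x compiler would reject -march=core2.
filtered=$(strip_flags "-O2 -march=core2 -pipe" "-march=core2")
echo "$filtered"
```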

There are actually some pretty advanced bashrc and phase-hook 
implementations around.  Ed Catmur, the guy who wrote udept, has a nice 
one that allows all sorts of interesting stuff.  Unfortunately, he seems 
to have disappeared, and udept is in the process of being masked since 
there's no upstream any more and it doesn't support the latest EAPIs.  
I've no idea if he simply disappeared without warning or if he got in an 
argument with a Gentoo dev or what, but the Gentoo stuff on his site now 
says it needs a password, which could as easily mean it doesn't exist 
there anymore at all.  I do have a copy of his tarballs from a year or so 
ago.

I use only one of the several features his bashrc and phase hooks scripts 
make possible, FEATURES=patchtree.  The idea and implementation are 
similar to but not identical to the env tree: a category/package based 
tree under /etc/portage/patches/, except that here the package is another 
subdir rather than a file as it is with the env tree.

Here's an example of how I use it.  As I regularly install still-hard-masked 
gcc versions and set them as my normal system gcc, I'm used to checking 
bugs for patches that let various packages compile with the newer gcc, 
well before the patches actually find their way into the tree.  If I can 
find an appropriate patch, I can dump the patch file into an 
appropriately versioned patches/category/package-ver subdir, and with his 
scripts in place and that feature turned on, portage will automatically 
apply the patch after the unpack phase.  His scripts are even smart enough 
to determine in many cases whether autoconf, etc, needs to be rerun, and 
handle it automatically, altho they don't handle every possibility.  While 
I used to nearly always have at least a half dozen ebuilds in my overlay, 
with various patches applied so they'd build with the latest gcc or glibc 
or whatever, now I just dump the patches in an appropriately created 
subdir and let portage and Ed's FEATURES=patchtree handle it!  Between 
that and the env tree, I seldom have to carry patched ebuilds in my 
overlay any more; it's often empty and seldom has more than a couple of 
more complex ebuilds in it.
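
The patches tree layout can be sketched as follows.  This is a minimal
illustration built in a scratch directory, not Ed's actual scripts: the
package app-misc/example-1.2.3 and the patch file name are hypothetical,
and the loop only prints what a hook might apply rather than running patch.

```bash
#!/bin/bash
# Build the layout in a scratch directory instead of the real /etc/portage.
root=$(mktemp -d)
patchdir="$root/patches/app-misc/example-1.2.3"   # hypothetical package
mkdir -p "$patchdir"
touch "$patchdir/gcc-4.3-build-fix.patch"         # hypothetical patch name

# A post-unpack hook would then walk that subdir, roughly like this.
found=0
for p in "$patchdir"/*.patch; do
    [ -e "$p" ] || continue        # skip the literal glob if nothing matched
    echo "would apply: ${p##*/}"
    found=$((found + 1))
done

rm -rf "$root"
```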

Another feature I do NOT use from his scripts, is 
FEATURES=installsources, which keeps the sources around for reference.  I 
think a number of Gentoo devs are actually using this feature, so they 
can keep sources around for what they're working on.  I think it can be 
toggled per package but as I said I don't use it so I haven't looked too 
deeply into how it works.

There are some more features as well.  I don't recall the details, but I 
looked at them and decided I didn't need them here.

If you're interested in the tarballs, maybe even only to look through 
for ideas on what's actually possible, ask, and I'll mail them to you.  Of 
course, with him apparently gone and no longer involved with Gentoo, 
if/when they eventually start to break, you'll need to deal with that 
yourself.  They're working fine so far with the portage-2.2-rcs I've been 
running, though, and I think the portage hooks they use are intended to 
stay around for a while.  If they do break, it'll probably be inadvertent 
(as I said, I believe a number of devs use those scripts...), and a 
portage bug report and/or a bit of bash hacking should hopefully get them 
working again.

> One more question:
> 
> What if two or more files match the package ?
> 
> Like:
> 
> env/sys-kernel/vanilla-sources
> env/sys-kernel/vanilla-sources-2
> env/sys-kernel/vanilla-sources2.6
> env/sys-kernel/vanilla-sources-2.6.27
> 
> Do they all get included in some /un/specified order or just
> first/last/unspecified one ?

I don't remember the specific details, in part because, as I described, I 
have the patch tree to think about as well, and I never remember the 
versioning formats for either.  I know the first and the last examples 
above should work, and 2.6.27 would match the -rX revisions as well (I 
found that out the other day: the ~ and = prefixes don't work, but the 
bare version applies to all revisions too).  For the odd cases, though, I 
generally get confused and often end up simply trying things until 
something works.

Matter of fact, I spent a bit of time looking for documentation on 
exactly what version format to use for env files myself, the other day, 
but didn't find it.  That's when I discovered the versioning info 
mentioned above, simply by testing until I got it working.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman





end of thread, other threads:[~2008-12-17  8:41 UTC | newest]

Thread overview: 16+ messages
2008-12-09 12:23 [gentoo-amd64] CFLAGS question from a AMD64 newbie Sami Näätänen
2008-12-09 13:13 ` Martin Herrman
2008-12-09 14:15   ` Branko Badrljica
2008-12-09 15:33     ` [gentoo-amd64] " Duncan
2008-12-09 13:28 ` [gentoo-amd64] " Volker Armin Hemmann
2008-12-09 19:59   ` Sami Näätänen
2008-12-10  7:00     ` Branko Badrljica
2008-12-10  7:27     ` Volker Armin Hemmann
2008-12-09 16:07 ` [gentoo-amd64] " Duncan
2008-12-09 17:40   ` Branko Badrljica
2008-12-09 20:34   ` Sami Näätänen
2008-12-10  7:35     ` Duncan
2008-12-16 23:00   ` Branko Badrljica
2008-12-16 22:46     ` Duncan
2008-12-17  2:00       ` Branko Badrljica
2008-12-17  8:41         ` Duncan
