Re: [gentoo-amd64] CFLAGS question from a AMD64 newbie

From: "Sami Näätänen" <sn.ml@keijukammari.fi>
To: gentoo-amd64@lists.gentoo.org
Subject: Re: [gentoo-amd64] CFLAGS question from a AMD64 newbie
Date: Tue, 9 Dec 2008 21:59:12 +0200
Message-ID: <200812092159.13048.sn.ml@keijukammari.fi>
In-Reply-To: <200812091428.21434.volker.armin.hemmann@tu-clausthal.de>

On Tuesday 09 December 2008 15:28:21 Volker Armin Hemmann wrote:
> On Tuesday 09 December 2008, Sami Näätänen wrote:
> > So hi from a amd64 newbie. Not so newbie with Gentoo though. :)

Well, sorry for not giving more details. I'm not a newbie with Gentoo, just
on the amd64 side of things, i.e. I have no experience of the bugs or of how
things break in the tree when using one option or the other, etc.

So I'll go through this again in a little more detail.

I have been hanging around Gentoo since before the 1.4 days, i.e. long before
the yearly tagged releases/profiles. I have used paludis since somewhere
around 0.2x; I can't remember exactly which version it was. A breakage now
and then in the build stage is nothing new for me. Stability, in my eyes,
means stability of the binaries on my system, not so much of the builds
themselves.

> > My system is an Intel quad core core2 with a 2.4 GHz clock speed coupled
> > with a 4GB of memory. No overclocking etc. Want this to be stable. :)
> >
> > I'm just curious what people use as their stable CFLAGS in amd64 Gentoo?
> > (Sorry if this has been up lately, but I just switched to 64bit env
> > so...)
> >
> >
> > Here is mine and some explanation of why (And I use ~arch system with gcc
> > 4.3)
> >
> > The flags are in order they are used in my CFLAGS and CXXFLAGS.
> >
> > Gives stable base
> > -O2
>
> yes
>
> > Want to optimize for my system, but don't want "native"
> > -march=core2
>
> ok
>
> > If some ebuilds filter march this will still cache optimize etc for my
> > system
> > -mtune=core2
>
> I would scrap that.
>
> > Faster floating point math and better chance of vectorization
> > -mfpmath=sse
>
> superfluous. With amd64, sse is used by default.

So it's set even if an arch filter drops -march to the lowest amd64 arch. I
wasn't sure, so I stuck it in, as I want to be sure there is no FPU code
around making life harder.
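
For anyone who wants to check this on their own toolchain, here is a minimal
sketch. It relies on the __SSE_MATH__ and __SSE2_MATH__ macros that GCC
predefines when SSE is used for floating point math, plus FLT_EVAL_METHOD from
<float.h>. The function names fp_math_mode and print_fp_math_mode are made up
for this illustration.

```c
#include <float.h>
#include <stdio.h>

/* Describes which unit the compiler was told to use for floating point
 * math, judged only from macros the compiler predefines. */
const char *fp_math_mode(void)
{
#if defined(__SSE2_MATH__)
	return "SSE2 (float and double in SSE registers)";
#elif defined(__SSE_MATH__)
	return "SSE (float only in SSE registers)";
#elif defined(__i386__) || defined(__x86_64__)
	return "x87";
#else
	return "not an x86 target";
#endif
}

/* Prints the mode together with FLT_EVAL_METHOD:
 * 0 means float expressions are evaluated as float,
 * 2 means they are evaluated in long double (typical for x87). */
void print_fp_math_mode(void)
{
	printf("fp math: %s, FLT_EVAL_METHOD=%d\n",
	       fp_math_mode(), (int)FLT_EVAL_METHOD);
}
```

Calling print_fp_math_mode() from a small main() and building it once with
the full CFLAGS and once with only -O2 should show whether dropping -march
changes the answer on a given system.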

> > These because of the march might get filtered
> > -mmmx -msse -msse2 -msse3 -mssse3
>
> if march gets filtered, these might be one of the reasons, I would remove
> them.

In my experience, all the bugs that needed arch filtering had something wrong
in the generic optimizations that are enabled only when a certain -Ox and
-march combination is used, not in the use of the instruction sets
themselves. (A couple of beta gccs excluded, but I'm not touching those
anymore.)

So I could scrap the older ones, as march will already cover those, except
for -msse3, which allows the compiler to use more SIMD instructions in loop
vectorization.
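
Which instruction sets actually end up enabled for a given set of flags can
be read back from the compiler's predefined macros (__MMX__, __SSE__,
__SSE2__, __SSE3__, __SSSE3__). The following is only a sketch; the enum
names and the functions enabled_simd_sets and print_simd_sets are invented
for the illustration.

```c
#include <stdio.h>

/* One bit per instruction set mentioned in the CFLAGS above. */
enum {
	HAS_MMX   = 1,
	HAS_SSE   = 2,
	HAS_SSE2  = 4,
	HAS_SSE3  = 8,
	HAS_SSSE3 = 16
};

/* Builds a bit mask from the macros the compiler predefines for each
 * instruction set it is allowed to use in this translation unit. */
unsigned enabled_simd_sets(void)
{
	unsigned mask = 0;
#ifdef __MMX__
	mask |= HAS_MMX;
#endif
#ifdef __SSE__
	mask |= HAS_SSE;
#endif
#ifdef __SSE2__
	mask |= HAS_SSE2;
#endif
#ifdef __SSE3__
	mask |= HAS_SSE3;
#endif
#ifdef __SSSE3__
	mask |= HAS_SSSE3;
#endif
	return mask;
}

/* Prints yes/no for each instruction set. */
void print_simd_sets(void)
{
	unsigned m = enabled_simd_sets();
	printf("mmx:%s sse:%s sse2:%s sse3:%s ssse3:%s\n",
	       (m & HAS_MMX)   ? "yes" : "no",
	       (m & HAS_SSE)   ? "yes" : "no",
	       (m & HAS_SSE2)  ? "yes" : "no",
	       (m & HAS_SSE3)  ? "yes" : "no",
	       (m & HAS_SSSE3) ? "yes" : "no");
}
```

Building this with and without -march=core2, and with only -msse3, shows
which of the explicit -m flags are still doing anything once -march has been
filtered.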

> > For loop vectorization
> > -ftree-vectorize
>
> scrap that.

Why?
I have read that there have been problems with it earlier, but in my
experience those were on the 32-bit arch; on this system I have seen none
whatsoever. And for isolated packages I can always easily disable it, being a
paludis user. By the way, most of those tree-vectorizer problems come from
other optimizations that run before the tree vectorizer, like loop peeling,
loop unrolling, etc.

> > -pipe
>
> once upon a time I used these flags:
>
> #CFLAGS="-march=k8 -O2 -pipe -fweb -ftracer -fpeel-loops -msse3"
> and even
> #CFLAGS="-march=k8 -O2 -fweb -ftracer -fpeel-loops -ftree-vectorize
> -frename-registers -floop-optimize2 -msse3 -pipe"
>
> to hunt down a java bug, I recompiled the whole system with:
>
> CFLAGS="-march=k8 -O2 -msse3 -pipe"
>
> and surprise - it was as fast as before - and compiling was faster too!

Was this a 64-bit system?
I wouldn't use the tree vectorizer on a 32-bit system, as the alignment
issues are a serious problem until gcc gets proper stack alignment handling.

I wouldn't touch the other flags you used, but I also know what code
reductions regular code can get from the loop vectorizer. Although to get the
best out of vectorization one really has to write compact, loopy and maybe
odd-looking code. The vectorizer also still needs a lot of improvement, as
can be seen from the code generated for the joo2 function in my example.

For example:
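float a[4];
float b[4];

/* Squares written out by hand, one statement per element. */
float
joo(void) {
	a[0] = b[0]*b[0];
	a[1] = b[1]*b[1];
	a[2] = b[2]*b[2];
	a[3] = b[3]*b[3];
	return a[0]+a[1]+a[2]+a[3];
}

/* Same computation as a loop with a constant trip count of 4. */
float
joo2(void) {
	int i;
	for( i=0; i<4; i++)
		a[i] = b[i]*b[i];
	return a[0]+a[1]+a[2]+a[3];
}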

With CFLAGS="-O2 -march=core2 -ftree-vectorize", joo() will be slower than
joo2(), because the tree vectorizer can vectorize the constant-trip-count
loop away. Copy the code to a C source file like joo.c and execute:
gcc -O2 -march=core2 -ftree-vectorize -S joo.c && less joo.s
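
Both functions compute the same value, so any difference in joo.s comes from
the optimizer alone. A small sketch that checks this at run time follows; the
helper same_result is hypothetical and was not part of the example above.

```c
#include <stdio.h>

float a[4];
float b[4];

/* Hand-unrolled version from the example. */
float joo(void)
{
	a[0] = b[0]*b[0];
	a[1] = b[1]*b[1];
	a[2] = b[2]*b[2];
	a[3] = b[3]*b[3];
	return a[0]+a[1]+a[2]+a[3];
}

/* Loop version from the example. */
float joo2(void)
{
	int i;
	for (i = 0; i < 4; i++)
		a[i] = b[i]*b[i];
	return a[0]+a[1]+a[2]+a[3];
}

/* Hypothetical helper: loads b, runs both versions and returns nonzero
 * when they produced the same sum of squares. */
int same_result(float b0, float b1, float b2, float b3)
{
	float r1, r2;

	b[0] = b0; b[1] = b1; b[2] = b2; b[3] = b3;
	r1 = joo();
	r2 = joo2();
	printf("joo = %g, joo2 = %g\n", r1, r2);
	return r1 == r2;
}
```

Calling same_result(1, 2, 3, 4) from a main() is enough to confirm the two
versions agree before comparing the generated assembly.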

PS. For those who are interested: there are many vectorizable loops that
can't be vectorized because gcc lacks proper parameter stack alignment. That
is the reason I wrote the example the way I did. :)

On 32-bit systems it can't provide nearly as many optimizations as on 64-bit
systems, because of the alignment issue. The tree vectorizer produces a lot
of those two-version vectorizations when it has to determine the memory
alignment at runtime. That's why I take a closer look at the vectorizations.
There were really few of those two-version vectorizations when I compiled my
"system".
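
To make "two-version vectorization" concrete, here is a hand-written sketch
of the idea: test the alignment at runtime, then run either a loop that may
assume 16-byte alignment or a fallback loop that may not. This is not gcc's
actual output, and the names is_aligned16 and square_versioned are invented
for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero when p sits on a 16-byte boundary, which is what aligned SSE
 * loads and stores require. */
static int is_aligned16(const void *p)
{
	return ((uintptr_t)p & 15u) == 0;
}

/* Squares n floats from src into dst.
 * Returns 1 if the aligned version of the loop ran, 0 if the fallback
 * version ran. Both loops compute the same result; they are kept apart
 * only to show where a compiler would emit two different bodies. */
int square_versioned(float *dst, const float *src, size_t n)
{
	size_t i;

	assert(dst != NULL && src != NULL);

	if (is_aligned16(dst) && is_aligned16(src)) {
		/* Version 1: alignment is known here, so a vectorizer
		 * could use aligned 4-float loads and stores. */
		for (i = 0; i < n; i++)
			dst[i] = src[i] * src[i];
		return 1;
	}

	/* Version 2: nothing is known about alignment, so this is the
	 * scalar (or unaligned) fallback. */
	for (i = 0; i < n; i++)
		dst[i] = src[i] * src[i];
	return 0;
}
```

The runtime test and the duplicated loop body are the cost of not knowing
the alignment at compile time, which is why fewer of these versions is the
better outcome.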
