* Re: [gentoo-user] cpu flags / USE flags / compiler flags
From: Mark Knecht @ 2005-09-04 5:18 UTC
To: gentoo-user
Thanks Walter. That description verifies my guess and gives me a
reason to continue looking at the issue.
I appreciate your help.
Cheers,
Mark
On 9/3/05, waltdnes@waltdnes.org <waltdnes@waltdnes.org> wrote:
> On Wed, Aug 31, 2005 at 09:04:21AM -0700, Mark Knecht wrote
>
> > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> > mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
> > xtpr
> [...deletia...]
> > I then looked for CPU flags that had an equivalent USE flag and that
> > might be of use for faster graphics. On this machine I chose mmx, sse
> > & sse2. Armed with that I changed my make.conf file to look like this:
>
> There are CPU flags and there are USE flags. Some of them have the
> same names, and that may confuse you. It works like this...
> 1) Get a listing of your CPU's flags in /proc/cpuinfo
> 2) Check against the list of supported flags in gcc for your CPU, and
> add them to CFLAGS
> 3) Check http://www.gentoo.org/dyn/use-index.xml for a list of valid
> USE flags, and include any that show up in /proc/cpuinfo
> 4) Repeat step 3) with /usr/portage/profiles/use.local.desc for any
> programs you're emerging. There doesn't seem to be anything
> special on your pentium4, but my AMD64 not only has mmx and 3dnow,
> it also has mmxext and 3dnowext. mplayer can take advantage of
> them. I include them in the /etc/portage/package.use entry for
> media-video/mplayer.
>
> I'll assume that you're using gcc 3.3.5. In that case, the place to
> look for CPU flag options is...
>
> http://gcc.gnu.org/onlinedocs/gcc-3.3.5/gcc/i386-and-x86_002d64-Options.html#i386-and-x86_002d64-Options
>
> That list shows pentium4, mmx, sse, and sse2. Also, if you have *ANY*
> version of sse available, you can improve performance by running floating
> point math via sse, rather than 387 instructions. I recommend...
>
> CFLAGS="-O2 -pipe -fomit-frame-pointer -march=pentium4 -mmmx -msse -msse2 -mfpmath=sse"
>
> http://www.gentoo.org/dyn/use-index.xml shows mmx and sse as valid USE
> flags, so you can include them in USE.
>
> --
> Walter Dnes <waltdnes@waltdnes.org>
> My musings on technology and security at http://tech_sec.blog.ca
> --
> gentoo-user@gentoo.org mailing list
>
>
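The four steps quoted above could be sketched roughly as follows. This is only an illustration: the helper names are made up, CANDIDATE_USE_FLAGS is an assumed hand-picked subset (the real list comes from use-index.xml and use.local.desc), and GCC_OPTIONS covers only the options named in this thread.

```python
# Sketch only: the helper names and both tables are illustrative assumptions,
# not part of Portage or gcc.

def cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

# Assumed subset of USE flags that share a name with a CPU flag.
CANDIDATE_USE_FLAGS = {"mmx", "sse", "3dnow", "3dnowext", "mmxext"}

# CPU flag -> gcc option, limited to the options mentioned in the thread.
GCC_OPTIONS = {"mmx": "-mmmx", "sse": "-msse", "sse2": "-msse2"}

def suggest(cpuinfo_text):
    """Return (USE flags, gcc options) supported by the CPU described in the text."""
    flags = cpu_flags(cpuinfo_text)
    use = sorted(flags & CANDIDATE_USE_FLAGS)
    gcc = [option for name, option in GCC_OPTIONS.items() if name in flags]
    return use, gcc

# Shortened copy of the flags line quoted earlier in the thread.
SAMPLE = "flags : fpu vme de pse tsc msr mmx fxsr sse sse2 ss ht tm pbe"
print(suggest(SAMPLE))
```

On a real system the text would come from reading /proc/cpuinfo instead of the SAMPLE string. The result is a starting point to check against the gcc documentation for the installed compiler version, not something to paste into make.conf unreviewed.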
--
gentoo-user@gentoo.org mailing list
* Re: [gentoo-user] cpu flags / USE flags / compiler flags
From: ellotheth rimmwen @ 2005-09-04 6:11 UTC
To: gentoo-user
Hm. Clear, brief, instructive. Smells a lot like a mini-HOWTO.
On 9/3/05, waltdnes@waltdnes.org <waltdnes@waltdnes.org> wrote:
> There are CPU flags and there are USE flags. Some of them have the
> same names, and that may confuse you. It works like this...
> 1) Get a listing of your CPU's flags in /proc/cpuinfo
> 2) Check against the list of supported flags in gcc for your CPU, and
> add them to CFLAGS
> 3) Check http://www.gentoo.org/dyn/use-index.xml for a list of valid
> USE flags, and include any that show up in /proc/cpuinfo
> 4) Repeat step 3) with /usr/portage/profiles/use.local.desc for any
> programs you're emerging. There doesn't seem to be anything
> special on your pentium4, but my AMD64 not only has mmx and 3dnow,
> it also has mmxext and 3dnowext. mplayer can take advantage of
> them. I include them in the /etc/portage/package.use entry for
> media-video/mplayer.
>
> I'll assume that you're using gcc 3.3.5. In that case, the place to
> look for CPU flag options is...
>
> http://gcc.gnu.org/onlinedocs/gcc-3.3.5/gcc/i386-and-x86_002d64-Options.html#i386-and-x86_002d64-Options
>
> That list shows pentium4, mmx, sse, and sse2. Also, if you have *ANY*
> version of sse available, you can improve performance by running floating
> point math via sse, rather than 387 instructions. I recommend...
>
> CFLAGS="-O2 -pipe -fomit-frame-pointer -march=pentium4 -mmmx -msse -msse2 -mfpmath=sse"
>
> http://www.gentoo.org/dyn/use-index.xml shows mmx and sse as valid USE
> flags, so you can include them in USE.
>
> --
> Walter Dnes <waltdnes@waltdnes.org>
> My musings on technology and security at http://tech_sec.blog.ca
> --
> gentoo-user@gentoo.org mailing list
>
>
* Re: [gentoo-user] cpu flags / USE flags / compiler flags
From: Volker Armin Hemmann @ 2005-09-04 6:21 UTC
To: gentoo-user
On Sunday 04 September 2005 05:27, waltdnes@waltdnes.org wrote:
>
> That list shows pentium4, mmx, sse, and sse2. Also, if you have *ANY*
> version of sse available, you can improve performance by running floating
> point math via sse, rather than 387 instructions. I recommend...
>
> CFLAGS="-O2 -pipe -fomit-frame-pointer -march=pentium4 -mmmx -msse -msse2
> -mfpmath=sse"
>
>
emm. I would not do this.
-mfpmath=sse seems to be slower than -mfpmath=387
http://www.anandtech.com/mac/showdoc.aspx?i=2436&p=5
has the numbers; they had that experience.
It seems that gcc is not the best optimizer in the world ;)
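One way to settle this for a given machine is to build the same code once with each setting and time both. A minimal sketch of generating the two CFLAGS strings; BASE_CFLAGS is Walter's recommendation with the -mfpmath option left off, and the helper name is made up for illustration.

```python
# Sketch only: fpmath_variants is a hypothetical helper, not a Portage tool.
BASE_CFLAGS = "-O2 -pipe -fomit-frame-pointer -march=pentium4 -mmmx -msse -msse2"

def fpmath_variants(base=BASE_CFLAGS, units=("387", "sse")):
    """Return {unit: CFLAGS string} with -mfpmath set to each unit in turn."""
    return {unit: f"{base} -mfpmath={unit}" for unit in units}

for unit, cflags in fpmath_variants().items():
    print(f'{unit}: CFLAGS="{cflags}"')
```

Each string can then be used to rebuild the application being measured, so that the only difference between the two builds is the floating-point unit gcc targets.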
* Re: [gentoo-user] cpu flags / USE flags / compiler flags
From: Mark Knecht @ 2005-09-04 12:45 UTC
To: gentoo-user
I agree with Ellotheth that there seems to be an opportunity to
come up with a good optimization doc, but the article is interesting. The
answers might not be the same for P4 vs. AMD vs. SPARC vs. Apple.
Maybe a suite of files that get compiled, generate the numbers and
tell you what might work best?
Interesting info. Thanks.
- Mark
On 9/3/05, Volker Armin Hemmann <volker.armin.hemmann@tu-clausthal.de> wrote:
> On Sunday 04 September 2005 05:27, waltdnes@waltdnes.org wrote:
>
> >
> > That list shows pentium4, mmx, sse, and sse2. Also, if you have *ANY*
> > version of sse available, you can improve performance by running floating
> > point math via sse, rather than 387 instructions. I recommend...
> >
> > CFLAGS="-O2 -pipe -fomit-frame-pointer -march=pentium4 -mmmx -msse -msse2
> > -mfpmath=sse"
> >
> >
>
> emm. I would not do this.
>
>
> -mfpmath=sse seems to be slower than -mfpmath=387
>
> http://www.anandtech.com/mac/showdoc.aspx?i=2436&p=5
>
> has the numbers; they had that experience.
>
> It seems that gcc is not the best optimizer in the world ;)
> --
> gentoo-user@gentoo.org mailing list
>
>
* Re: [gentoo-user] cpu flags / USE flags / compiler flags
From: waltdnes @ 2005-09-05 1:59 UTC
To: gentoo-user
On Sun, Sep 04, 2005 at 08:21:47AM +0200, Volker Armin Hemmann wrote
> On Sunday 04 September 2005 05:27, waltdnes@waltdnes.org wrote:
> > CFLAGS="-O2 -pipe -fomit-frame-pointer -march=pentium4 -mmmx -msse -msse2
> > -mfpmath=sse"
>
> emm. I would not do this.
>
>
> -mfpmath=sse seems to be slower than -mfpmath=387
>
> http://www.anandtech.com/mac/showdoc.aspx?i=2436&p=5
>
> has the numbers; they had that experience.
>
> It seems that gcc is not the best optimizer in the world ;)
I've read through the article, and there are a couple of interesting
items in it...
1) The bit about sse being slower than 387 only applies to the brand
new Xeon Irwindale.
2) The brand new 3.6 GHz Xeon Irwindale ran slower than the older 3.06
GHz Xeon Gallatin.
That leads to one of two possible conclusions...
Really Bad) The Irwindale is at least lame if not totally b0rk3n.
Not so Bad) The Irwindale is so new that the gcc developers haven't
had an opportunity to implement optimizations for it.
In either case, I wouldn't want to extrapolate Xeon Irwindale results
to all Intel X86 chips, let alone AMD. /usr/portage/app-benchmarks has
several items in it. Does anybody know which ones have floating-point
tests?
Tinfoil-hat-theory... have you noticed that Microsoft just loves to
use Xeons, especially dual-Xeons, in their "get the facts" propaganda?
I wonder if they've found a problem with gcc's optimizations for Xeon,
and are exploiting that problem to bias all their comparisons.
--
Walter Dnes <waltdnes@waltdnes.org>
My musings on technology and security at http://tech_sec.blog.ca
* Re: [gentoo-user] cpu flags / USE flags / compiler flags
From: Bob Sanders @ 2005-09-06 3:25 UTC
To: gentoo-user
On Sun, 4 Sep 2005 21:59:10 -0400
waltdnes@waltdnes.org wrote:
> In either case, I wouldn't want to extrapolate Xeon Irwindale results
> to all Intel X86 chips, let alone AMD. /usr/portage/app-benchmarks has
> several items in it. Does anybody know which ones have floating-point
> tests?
>
There are many floating-point tests. You choose by what you want to prove.
And an example -
[MN] sys-cluster/hpl (1.0-r2): HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers
Linpack is pretty standard, but only compare Linpack results to Linpack results. AIM5 and AIM7 are a
different set. SpecFP, yet a different set.
Floating point tests are meaningless outside of themselves. If your app happens to run
the same type of setup as a specific floating point test, then there is meaning. If your
app has a lot of other things going on, no floating point test is going to give you an idea of
how the app is going to perform.
> Tinfoil-hat-theory... have you noticed that Microsoft just loves to
> use Xeons, especially dual-Xeons, in their "get the facts" propaganda?
> I wonder if they've found a problem with gcc's optimizations for Xeon,
> and are exploiting that problem to bias all their comparisons.
>
No. nothing as creative as that. It's well known that Intel's C/C++ compiler is better at
some things than others. Microsoft, probably, just happens to use Intel's compiler
for WinXX while "forgetting" to use it in place of gcc.
If you want to prove that Opterons are faster than Xeons, you'll buy a copy of the PathScale
compiler for the Opterons and use Intel's compiler for the Xeons.
Bob
* Re: [gentoo-user] cpu flags / USE flags / compiler flags
From: Mark Knecht @ 2005-09-06 3:35 UTC
To: gentoo-user
On 9/5/05, Bob Sanders <rmsand@concentric.net> wrote:
> If you want to prove that Opterons are faster than Xeons, you'll buy a copy of the PathScale
> compiler for the Opterons and use Intel's compiler for the Xeons.
>
> Bob
Bob,
I don't think this was ever the point. The question was: "For this
specific machine what would be the best flags?"
I have a specific revision of the AMD64 processor. What flags should
I use? Possibly some sort of test could compile lots of things, look
at numbers, and allow me to make a quantitative decision instead of just
shooting in the dark.
- Mark
* Re: [gentoo-user] cpu flags / USE flags / compiler flags
From: Bob Sanders @ 2005-09-06 4:04 UTC
To: gentoo-user
On Mon, 5 Sep 2005 20:35:10 -0700
Mark Knecht <markknecht@gmail.com> wrote:
> Bob,
> I don't think this was ever the point. The question was: "For this
> specific machine what would be the best flags?"
>
You'll hate this - it depends on what your main apps do. Are they I/O intensive or
compute intensive? More integer work, or a specific FP instruction set? Small enough
to fit into L2 cache? Lots of branching? Multithreaded?
> I have a specific revision of the AMD64 processor. What flags should
> I use? Possibly some sort of test could compile lots of things, look
> at numbers, and allow me to make a quantitative decision instead of just
> shooting in the dark.
>
If you don't have a contained set of apps that represent a set of conditions that can
be specifically defined, no benchmark is going to give a correct answer. In other
words - if you're running a general purpose desktop, then there is no specific set of flags
that will optimize everything.
There is a set of AMD-optimized string routines that is being put into glibc. But it won't land for
a while - it does significant breakage to nano. And maybe to other apps. No specific
compiler flags will be required for this optimization to happen - it will double memcpy
speed. And that alone will provide a significant increase in performance with just a
recompile - more performance than is obtainable by a set of flags.
Finally, what the people dropping US$100K to US$1M do is take the app they
are going to run and test with that. They don't rely on benchmarks. Figure out what
will be your primary app. Find out how to get performance measurements on it -
run sar if you have to. Change the compiler flags and re-run. Look for the bottlenecks
and work to eliminate them. See -
[ N] app-admin/sysstat (5.0.5-r2): System performance tools for Linux
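The measure, change flags, re-run loop described above might look something like this as a small timing harness. It is a sketch under assumptions: the two commands are stand-ins for the same primary app built under two different CFLAGS, the labels are made up, and wall-clock time is only one of the numbers worth watching next to what sar reports.

```python
import subprocess
import sys
import time

def best_time(command, repeats=3):
    """Run command `repeats` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)

def fastest(results):
    """Given {label: seconds}, return the label with the smallest time."""
    return min(results, key=results.get)

if __name__ == "__main__":
    # Placeholder workloads: replace each command with your own app as built
    # under the flag set named by its (hypothetical) label.
    workload = [sys.executable, "-c", "sum(x * 0.5 for x in range(100000))"]
    builds = {"flags-a": workload, "flags-b": workload}
    results = {label: best_time(command) for label, command in builds.items()}
    print(results, "->", fastest(results))
```

Taking the minimum of several runs reduces noise from whatever else the machine is doing. Differences of a few percent between two builds are still within what caching and scheduling can cause, so they say little on their own.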
Bob