* [gentoo-science] [Fwd: [atlas-devel] 3.7.14 out]
@ 2006-08-18 4:13 M. Edward (Ed) Borasky
2006-08-18 12:33 ` Markus Dittrich
0 siblings, 1 reply; 2+ messages in thread
From: M. Edward (Ed) Borasky @ 2006-08-18 4:13 UTC
To: gentoo-science
[-- Attachment #1: Type: text/plain, Size: 0 bytes --]
[-- Attachment #2: [atlas-devel] 3.7.14 out --]
[-- Type: message/rfc822, Size: 8162 bytes --]
From: Clint Whaley <whaley@cs.utsa.edu>
To: math-atlas-devel@lists.sourceforge.net
Subject: [atlas-devel] 3.7.14 out
Date: Thu, 17 Aug 2006 21:16:29 -0500
Message-ID: <200608180216.k7I2GTOE011785@pandora2.cs.utsa.edu>
Guys,
OK, 3.7.14 is out. Here's the changelog:
* Fixes/updates to ATLAS config system:
- Improved cpu throttling probe
- Added compiler test so only compilers that work are chosen from defaults
- Added simple C interoperation test
- Fixed frontend/backend tmpnam collision prob (config[1,0].tmp)
- Re-enabled parallel make support
- Fixed buildinfo support
- Added clock speed probe to config
- Enabled "make time" to produce performance summary!
- Added "make check" as alias to "make test" to make more like gnu
--- OOOps, this isn't there, screwed it up, sorry Kate --- RCW
- Fixed error in -Si nof77 1, which caused config to die w/o f77 compiler
* Added new arch defaults for P4E[32,64]SSE3 and HAMMER64SSE3, which get
better performance for gcc 4.2 (perf should still be OK for gcc 3).
I'm guessing anybody with x86/linux or freebsd/OSX should be able to use
this guy, though I've only got arch defaults for the above 3 systems. Note
that if you get the newest gcc from SVN, you can get 93% of double precision
theoretical peak using the code generator and the x87 unit on an AMD64!
The big news there is that the final part of the new
install is in place -- "make time" works! Here's what happens if you do
an install yourself, or if your arch defaults don't have benchmark data:
*******************************************************************************
NAMING ABBREVIATIONS:
kSelMM : selected matmul kernel (may be hand-tuned)
kGenMM : generated matmul kernel
kMM_NT : worst no-copy kernel
kMM_TN : best no-copy kernel
BIG_MM : large GEMM timing (usually N=1600); estimate of asymptotic peak
kMV_N : NoTranspose matvec kernel
kMV_T : Transpose matvec kernel
kGER : GER (rank-1 update) kernel
Kernel routines are not called by the user directly, and their
performance is often somewhat different than the total
algorithm (eg, dGER perf may differ from dkGER)
Clock rate=2800Mhz
             single precision      double precision
           ********************  ********************
             real      complex     real      complex
Benchmark   % Clock    % Clock    % Clock    % Clock
=========  =========  =========  =========  =========
kSelMM       304.9      291.2      172.7      167.0
kGenMM        94.0       92.7       89.6       89.0
kMM_NT        75.2       75.6       67.9       70.5
kMM_TN        90.9       89.8       79.3       80.5
BIG_MM       290.1      275.0      164.6      162.2
kMV_N         33.8      129.3       32.7       43.1
kMV_T         49.3       46.4       30.8       44.7
kGER          34.3       66.2       17.0       32.9
*******************************************************************************
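A minimal sketch of how the "% Clock" figures can be read, assuming they express MFLOPS as a percentage of the clock rate in MHz (so 100 means one floating-point operation per cycle). That interpretation, the helper name `pct_clock_to_mflops`, and the choice of rows are assumptions for illustration; the input numbers are the double precision real column and the 2800 MHz clock rate reported above.

```python
def pct_clock_to_mflops(pct_clock, clock_mhz):
    """Convert a '% Clock' figure to MFLOPS.

    Assumes 100% of clock means one flop per cycle, i.e.
    MFLOPS = (pct_clock / 100) * clock rate in MHz.
    """
    return pct_clock / 100.0 * clock_mhz


CLOCK_MHZ = 2800  # "Clock rate" line from the summary above

# Double precision, real: a few rows copied from the table above.
double_real = {"kSelMM": 172.7, "kGenMM": 89.6, "BIG_MM": 164.6}

for name, pct in double_real.items():
    mflops = pct_clock_to_mflops(pct, CLOCK_MHZ)
    print(f"{name}: {pct}% of clock is about {mflops:.1f} MFLOPS")
```

Under that assumption, values above 100 simply mean the kernel retires more than one flop per cycle, which is expected for SIMD code.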
*******************************************************************************
Here's what happens if you have arch default data to compare against:
                  single precision                double precision
           ******************************* *******************************
                real           complex          real           complex
           --------------- --------------- --------------- ---------------
Benchmark  Refrenc Present Refrenc Present Refrenc Present Refrenc Present
=========  ======= ======= ======= ======= ======= ======= ======= =======
kSelMM       346.2   356.3   344.8   338.7   182.1   182.7   177.2   177.7
kGenMM       177.7   182.7   144.4   154.6   169.0   154.9   150.8   173.4
kMM_NT       138.6   134.5   144.4   145.4   118.4   114.9   131.5   134.4
kMM_TN       141.5   154.9   135.9   142.5   144.9   134.5   142.0   145.4
BIG_MM       328.9   334.8   320.4   320.7   169.9   171.4   174.3   175.6
kMV_N         54.2    54.7   144.0   143.7    46.5    47.3    90.1    92.5
kMV_T         59.2    63.5    74.3    75.4    41.9    42.8    54.1    53.9
kGER          52.3    52.9   110.3   110.8    26.7    26.7    54.3    53.7
*******************************************************************************
The first set of timings is from a P4E64SSE3, and the second from a HAMMER64SSE3.
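One way to read the reference-versus-present comparison is as a relative change per kernel. The sketch below is only an illustration: the helper names and the 5% threshold are hypothetical and are not something "make time" itself reports. The numbers are the double precision real columns from the comparison table above.

```python
def relative_change_pct(reference, present):
    """Percent change of the present timing relative to the reference."""
    return (present - reference) / reference * 100.0


def kernels_below_reference(rows, threshold_pct=5.0):
    """Names of kernels whose present figure trails the reference by more
    than threshold_pct percent (threshold is a hypothetical choice)."""
    return [
        name
        for name, (reference, present) in rows.items()
        if relative_change_pct(reference, present) < -threshold_pct
    ]


# Double precision, real: (Refrenc, Present) pairs from the table above.
double_real = {
    "kSelMM": (182.1, 182.7),
    "kGenMM": (169.0, 154.9),
    "kMM_NT": (118.4, 114.9),
    "kMM_TN": (144.9, 134.5),
    "BIG_MM": (169.9, 171.4),
    "kMV_N": (46.5, 47.3),
    "kMV_T": (41.9, 42.8),
    "kGER": (26.7, 26.7),
}

for name, (reference, present) in double_real.items():
    print(f"{name:7s} {relative_change_pct(reference, present):+6.2f}%")

print("more than 5% below reference:", kernels_below_reference(double_real))
```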
BTW, with the help of the gcc guys, I figured out why gcc could never vectorize
any code, and my initial timings show you can get some speedup now. For
AMD, the single precision speedup is significant. Intel gets only modest
speedup.
However, right now, ATLAS can't enable gcc's vectorization, because the gcc
guys have made -funsafe-math-optimizations a mandatory flag to enable it, and
this flag ruins IEEE compliance, which we can't do. I've got a bug report on
this at:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28684
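A minimal sketch of one reason such a flag matters, assuming the vectorizer needs to reassociate a floating-point reduction into per-lane partial sums. Floating-point addition is not associative, so the reordered sum can differ from the strictly sequential one. This models the effect in Python with IEEE double precision; it is not gcc's actual transformation, and the function names and input values are made up for illustration.

```python
def sequential_sum(values):
    """Sum left to right, in the order the source code specifies."""
    total = 0.0
    for v in values:
        total += v
    return total


def lane_sum(values, lanes):
    """Sum with `lanes` independent partial accumulators, then combine
    them. This mimics how a vectorized reduction reorders additions."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    total = 0.0
    for p in partial:
        total += p
    return total


# Chosen so that rounding differs between the two orderings:
# 1e16 + 1.0 rounds back to 1e16 in double precision.
xs = [1e16, 1.0, -1e16, 1.0]

print(sequential_sum(xs))  # 1.0
print(lane_sum(xs, 2))     # 2.0
```

Both results come from the same inputs and the same operations. Only the order of additions changed, which is the kind of difference a library promising IEEE-compliant results cannot introduce silently.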
Cheers,
Clint
_______________________________________________
Math-atlas-devel mailing list
Math-atlas-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
* Re: [gentoo-science] [Fwd: [atlas-devel] 3.7.14 out]
2006-08-18 4:13 [gentoo-science] [Fwd: [atlas-devel] 3.7.14 out] M. Edward (Ed) Borasky
@ 2006-08-18 12:33 ` Markus Dittrich
0 siblings, 0 replies; 2+ messages in thread
From: Markus Dittrich @ 2006-08-18 12:33 UTC
To: gentoo-science
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi,
Thanks for the note! "Unfortunately", starting with 3.7.12
upstream has significantly changed atlas' build system, which
probably means that a major ebuild overhaul is needed. Hence,
it might take a little until this guy hits portage. Would you
mind filing a bug for it so we can track any progress there?
Thanks,
Markus
- --
Markus Dittrich (markusle)
Gentoo Linux Developer
Scientific applications
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
iD8DBQFE5bO0xlRwCwb7k40RAmwhAJ9AwncmSK8fROWzDW709Buo1ZtHdgCghwuq
EyDo9/pZCpbST/igrdUFoT4=
=0f2M
-----END PGP SIGNATURE-----
--
gentoo-science@gentoo.org mailing list