On Wednesday 08 September 2004 14:03, Corvus Corax wrote:
> Am Wed, 8 Sep 2004 13:29:01 +0200
> > schrieb Paul de Vrieze :
> > ...
> >
> > > To do this for programs, one would need to have a realistic suite of
> > > "tests" that simulate the real world use of the application. Of
> > > course that also allows -fprofile_arcs to be used.
> > >
> > > Paul
>
> depending on the type of program - for easy command line converter
> tools, an easy "time" command would be sufficient (used that to
> determine potential (in code) optimizations for my motiontrack stuff)

Timing is not the issue; the issue is what you have the program do. Initializing and then exiting is not a truthful representation of normal behaviour.

> however for libraries like QT or gtk which affect on screen performance
> of gui programs - both run-time and load - this will get much harder.

Certainly, for GUIs you need some kind of scripting that simulates actual user actions. (Fortunately the interactive behaviour is not the bottleneck in most interactive applications, as most user actions lead to almost trivial computations.)

> Maybe one could program a test-suite for each library that fires each
> function once and times them, along with a flag saved before startup to
> determine load time - but it would have to be done for every huge
> library.

This is handwork, and the different functions also need different weights based on, among other things, how frequently they are called. In other words it is a very large amount of work, and completely specific to the library/application at hand.

> And finally the timing of interactive programs itself - well, usually
> most time goes while waiting for user input anyway, but there are
> timing critical tasks, too, imagine pattern searches or other big db
> operations - or file load/save in openoffice, picture effects in gimp
> and such.
> those could maybe be timed by doing them on a really huge data blob,
> where the single operation takes so long that the user can measure it
> manually with a stopwatch.

This is not interesting for Gentoo to offer. If the user wants to do manual timing he always can. What would be interesting is automated timing. That, unfortunately, is not really easy with current packages. It is easiest for applications with test suites: with relatively small effort these test suites could double as "representative application use" both for timing and for arc profiling (a newer option of gcc).

> If the operation takes 40 seconds, and you can gain 3 seconds by
> optimisations, it is a bluntly measurable improvement. However I don't
> like that idea; maybe one can time the operation by watching tmp files
> in the background or something like this.
>
> Or the maintainer could go into the code and insert some debug lines to
> print timing information to stderr or such. But this would be way too
> much work for most maintainers and most software, isn't it?

Well, I agree that there is much that can be done to improve application performance. However, even with CFLAGS (which in many cases do not make a huge difference), these are all application-specific optimizations. They are things that application providers should do, not packagers/distributors. We don't have the knowledge or the time to find out, for each specific CPU/application combination, what the "fastest" CFLAGS are.

The review at the website also still has its issues. While it is well known that -O3 is in many cases slower than -O2, it is also true that the internal architecture of CPUs matters. The fact is that gcc has had support and testing on amd64 machines for a lot longer than on Xeons with 64-bit extensions. Scheduling on the two different CPUs is likely to have different optimal strategies, and it is unlikely that the Xeon 64-bit scheduler is as optimized as the amd64 one.
The x86_64 compiler also defaults to generating amd64-optimal code (amd64 used to be the only such processor), so it is not strange that this code performs much faster on an Opteron than on a Xeon. This leads to the observation that while one can still base a buying decision on the benchmarks, one cannot actually say that the Opteron IS faster than the Xeon, only that the Opteron can execute Opteron-optimized code faster than a Xeon can. It is also known that the Pentium 4 architecture (which the Xeon shares) is highly dependent on good scheduling, so the observation is not really a surprise.

Paul

ps. This does not mean that the Opteron is not faster than the Xeon, just that this test does not give a reasonable indication of it.

--
Paul de Vrieze
Gentoo Developer
Mail: pauldv@gentoo.org
Homepage: http://www.devrieze.net