On Wednesday 08 September 2004 14:03, Corvus Corax wrote:
> Am Wed, 8 Sep 2004 13:29:01 +0200
> > schrieb Paul de Vrieze :
> > ...
> >
> > > To do this for programs, one would need to have a realistic suite of
> > > "tests" that simulate the real world use of the application. Of
> > > course that also allows -fprofile_arcs to be used.
> > >
> > > Paul
>
> depending on the type of program - for easy command line converter
> tools, an easy "time" command would be sufficient (used that to
> determine potential (in code) optimizations for my motiontrack stuff)

Timing is not the issue; the issue is what you have the program do. Initializing and then exiting is not a truthful representation of normal behaviour.

> however for libraries like QT or gtk which affect on screen performance
> of gui programs - both run-time and load - this will get much harder.

Certainly, for GUIs you need some kind of scripting that simulates actual user actions. (Fortunately the interactive behaviour is not the bottleneck in most interactive applications, as most user actions lead to almost trivial computations.)

> Maybe one could program a test-suite for each library that fires each
> function once and times them, along with a flag saved before startup to
> determine load time - but it would have to be done for every huge
> library.

This is handwork, and the different functions also need different weights based on, among other things, how frequently they are called. In other words it is a very large amount of work, and completely specific to the library/application at hand.

> And finally the timing of interactive programs itself - well, usually
> most time goes while waiting for user input anyway, but there are
> timing critical tasks, too, imagine pattern searches or other big db
> operations - or file load/save in openoffice, picture effects in gimp
> and such.
> those could maybe be timed by doing them on a really huge data blob,
> where the single operation takes so long that the user can measure it
> manually with a stopwatch.

This is not interesting for Gentoo to offer. If the user wants to do manual timing he always can. What would be interesting is automated timing. That, unfortunately, is not really easy with current packages. It is easiest for applications with test suites: with relatively small effort these test suites could double as "representative application use" both for timing and for arc profiling (a newer option of gcc).

> If the operation takes 40 seconds, and you can gain 3 seconds by
> optimisations, it is a bluntly measurable improvement. However I don't
> like that idea; maybe one can time the operation by watching tmp files
> in the background or something like this.
>
> Or the maintainer could go into the code and insert some debug lines to
> print timing information to stderr or such. But this would be way too
> much work for most maintainers and most software, isn't it?

Well, I agree that there is much that can be done to improve application performance. However, even with CFLAGS (which in many cases do not make a huge difference), these are all application-specific optimizations. They are things that application providers should do, not packagers/distributors. We don't have the knowledge or the time to find out, for each specific CPU/application combination, what the "fastest" CFLAGS are.

The review at the website also still has its issues. While it is well known that -O3 is in many cases slower than -O2, it is also true that the internal architecture of CPUs matters. The fact is that gcc has had support and testing on amd64 machines for a lot longer than on Xeons with 64-bit extensions. Scheduling on the two different CPUs is likely to have different optimal strategies, and it is unlikely that the Xeon 64-bit scheduler is as optimized as the amd64 one.
The x86_64 compiler also defaults to generating amd64-optimal code (amd64 used to be the only such processor), so it is not strange that this code performs much faster on an Opteron than on a Xeon. This leads to the observation that while one can still base a buying decision on the benchmarks, one cannot actually say that the Opteron IS faster than the Xeon, only that the Opteron can execute Opteron-optimized code faster than a Xeon can. It is also known that the Pentium 4 architecture (which the Xeon shares) is highly dependent on good scheduling, so the observation is not really a surprise.

Paul

ps. This does not mean that the Opteron is not faster than the Xeon, just that this test does not give a reasonable indication of it.

--
Paul de Vrieze
Gentoo Developer
Mail: pauldv@gentoo.org
Homepage: http://www.devrieze.net