From: Evan Powers
To: gentoo-dev@gentoo.org
Date: Sat, 17 May 2003 19:28:41 -0400
Subject: Re: [gentoo-dev] -fbranch-probabilities optimisation
Message-Id: <200305171928.41732.powers.161@osu.edu>
In-Reply-To: <87y91675tu.fsf@speakeasy.net>

On Friday 16 May 2003 08:52 pm, Anupam Kapoor wrote:
> i was under the assumption that most processors already perform branch
> prediction. no ? do you think that -fbranch-probabilities provides a
> more 'comprehensive' view of the executing program ?

Yes, modern processors do perform branch prediction. But you're right, the
compiler can do a much more 'comprehensive' job.

The reason is that CPU branch prediction algorithms tend to forget any
information they learn that occurred farther away in time or space than
certain thresholds. The Xeon, I think, retains information only about the
last 16 branching instructions and the direction of only the last four
branches on each of those instructions.

GCC, on the other hand, can retain information about all the branches, and
complete directional statistics over the entire run, but it can only make
static predictions.
(For example, GCC can't take advantage of knowing that
the branch will go one way for 200 consecutive times, then the other for
300--it has to predict the branch will go the second way all the time and
order accordingly.)

So you get the best performance when you combine the two approaches.

Evan

--
gentoo-dev@gentoo.org mailing list
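
The 200-then-300 case described above can be checked with a toy simulation.
This is only a rough sketch under stated assumptions: the dynamic predictor
is a textbook 2-bit saturating counter, which is not a claim about how the
Xeon or any other real CPU predicts branches, and all function names are
made up for the illustration. The static predictor simply guesses the
majority direction for the whole run, which is the best a single fixed
prediction can do.

```python
# Toy model of one branch that goes one way 200 times, then the other 300 times.
# All names here are hypothetical; nothing below models a real CPU or GCC itself.

def branch_outcomes(first_run=200, second_run=300):
    """Outcomes of a single branch: True for the first direction, False for the second."""
    return [True] * first_run + [False] * second_run


def static_hits(outcomes):
    """Static prediction: always guess the overall majority direction."""
    first_way = sum(outcomes)
    return max(first_way, len(outcomes) - first_way)


def two_bit_counter_hits(outcomes, state=0):
    """Dynamic prediction with a 2-bit saturating counter (states 0..3).

    States 2 and 3 predict True; states 0 and 1 predict False.
    Assumption: a textbook scheme, chosen only for illustration.
    """
    hits = 0
    for actual in outcomes:
        predicted = state >= 2
        hits += predicted == actual
        # Move one step toward the observed direction, saturating at 0 and 3.
        state = min(state + 1, 3) if actual else max(state - 1, 0)
    return hits


if __name__ == "__main__":
    outcomes = branch_outcomes()
    print("static :", static_hits(outcomes), "of", len(outcomes))
    print("dynamic:", two_bit_counter_hits(outcomes), "of", len(outcomes))
```

In this model the static guess is right 300 times out of 500, while the
counter is right 496 times, missing only while it adapts at the start and
at the switch. The model leaves out what the static side is good at, namely
laying out code using statistics for every branch over the entire run.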