Intel's Pentium 4 E: Prescott Arrives with Luggage
by Anand Lal Shimpi & Derek Wilson on February 1, 2004 3:06 PM EST
Posted in: CPUs
Prescott's New Crystal Ball: Branch Predictor Improvements
We’ve said it before: before you can build a longer pipeline or add more execution units, you need a powerful branch predictor. The branch predictor (more specifically, its accuracy) determines how many operations you can have working their way through the CPU before you hit a stall. Intel extended the basic integer pipeline by 11 stages, so it needs a corresponding increase in the accuracy of Prescott’s branch predictor; otherwise performance will inevitably tank.
Intel admits that the majority of the branch predictor unit remains unchanged in Prescott, but there have been some key modifications to help balance performance.
For those of you who aren’t familiar with the term, the role of a branch predictor in a processor is to predict the path code will take. If you’ve ever written code before, it boils down to predicting which way a conditional statement (if-then, loops, etc.) will go. Present day branch predictors work on a simple principle: if a branch was taken in the past, it is likely to be taken in the future. So the branch predictor keeps track of the code being executed on the CPU and increments counters that record how often branches at particular addresses were taken. Once enough data has accumulated in these counters, the branch predictor can predict branches as taken or not taken with relatively high accuracy, assuming it is given enough room to store all of this data.
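To make the counter idea concrete, here is a minimal sketch of a textbook scheme: a table of 2-bit saturating counters indexed by branch address. It is not a description of Prescott's actual predictor; the class name, table size, starting counter value, and the sample trace are all assumptions chosen for illustration.

```python
class CounterPredictor:
    """Table of 2-bit saturating counters indexed by branch address (textbook scheme)."""

    def __init__(self, entries=1024):
        # 'entries' is a hypothetical table size, not a Prescott figure.
        self.entries = entries
        # Every counter starts at 1 ("weakly not taken").
        self.counters = [1] * entries

    def _index(self, address):
        # Branches whose addresses collide modulo the table size share a counter.
        return address % self.entries

    def predict(self, address):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Fraction of branches in 'trace' that the predictor guesses correctly."""
    correct = 0
    for address, taken in trace:
        if predictor.predict(address) == taken:
            correct += 1
        predictor.update(address, taken)
    return correct / len(trace)


# Hypothetical trace: one loop branch at address 0x40, taken 9 times and then
# not taken once (the loop exit), with the whole loop executed 100 times.
loop_trace = ([(0x40, True)] * 9 + [(0x40, False)]) * 100
loop_accuracy = accuracy(CounterPredictor(), loop_trace)
print(f"accuracy on the loop trace: {loop_accuracy:.3f}")
```

After a short warm-up the predictor only misses the loop exit on each pass, which is why a simple history of past outcomes works so well for loop-heavy code.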
One way of improving the accuracy of a branch predictor, as you may guess, is to give the unit more space to keep track of previously taken (or not taken) branches. AMD improved the accuracy of the Opteron's branch predictor by increasing the amount of space available to store branch data; Intel has not chosen to do so with Prescott. Prescott’s Branch Target Buffer remains unchanged at 4K entries, and it doesn’t look like Intel has increased the size of the Global History Counter either. Instead, Intel focused on tuning the efficiency of its branch predictor using methods that consume less die space.
Loops are very common in code; they are useful for zeroing data structures, printing characters, or simply as part of a larger algorithm. Although you may not think of them as branches, loops are inherently filled with branches: before you start a loop, and again on every iteration, you must find out whether you should continue executing the loop. Luckily, these types of branches are relatively easy to predict. You can generally assume that if a branch would take you to an earlier point in the code (called a backwards branch), you are dealing with a loop and the branch predictor should predict taken.
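The rule just described is often called "backward taken, forward not taken". A minimal sketch, assuming the predictor sees only the address of the branch and the address of its target (the function name and the example addresses are hypothetical):

```python
def predict_static(branch_address, target_address):
    """Static rule: predict taken only when the target lies before the branch."""
    return target_address < branch_address


# Hypothetical loop: the branch at 0x1040 jumps back to the loop top at 0x1000.
loop_branch_taken = predict_static(branch_address=0x1040, target_address=0x1000)

# Hypothetical forward branch, such as skipping over an if-body.
forward_branch_taken = predict_static(branch_address=0x1040, target_address=0x1080)

print(loop_branch_taken, forward_branch_taken)
```

The rule needs no history at all, which makes it a natural fallback when the dynamic predictor has not yet seen a branch.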
As you would expect, not all backwards branches should be taken, because not all of them are at the end of a loop. Backwards branches that aren’t loop-ending branches are sometimes the result of error handling in code: if an error is generated, you back up and start over again. If no error is generated in the application, the prediction should be not-taken. But how do you specify this while keeping the hardware simple?
It turns out that loop-ending branches and these error branches, both backwards branches, differ in the amount of code that separates the branch from its target. Loops are generally small, so only a handful of instructions separate the branch from its target; error handling branches generally send the CPU back across many more lines of code. The depiction below should illustrate this a bit better:
Code Fragment A | Code Fragment B |
---|---|
Line 10: while (i < 10) do | Line 10: A; |
Line 14 is a backwards branch at the end of a loop - should be taken! | Line 80 is a backwards branch not at the end of a loop - should not be taken! |
Prescott includes a new algorithm that looks at how far the branch target is from the branch instruction itself and uses that distance to better decide whether to predict the branch as taken. These enhancements apply to static branch prediction, which recognizes certain scenarios and always makes the same prediction when they occur. Prescott also includes improvements to its dynamic branch prediction.
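The distance idea can be sketched as a small refinement of the backwards-branch rule. Intel has not published the details here, so everything specific in this example is an assumption: the threshold value, the use of byte addresses, and the function name are illustrative only.

```python
# Hypothetical cutoff: backwards branches that jump farther than this are
# assumed not to be loop-ending branches. The real threshold is not given
# in the article.
MAX_LOOP_DISTANCE = 64


def predict_static_with_distance(branch_address, target_address,
                                 max_loop_distance=MAX_LOOP_DISTANCE):
    """Predict taken only for backwards branches with a nearby target."""
    if target_address >= branch_address:
        # Forward branch: predict not taken.
        return False
    distance = branch_address - target_address
    # Short backwards jump: probably the end of a loop, so predict taken.
    # Long backwards jump: probably error handling, so predict not taken.
    return distance <= max_loop_distance


# Hypothetical addresses for a tight loop and for a long "start over" jump.
tight_loop = predict_static_with_distance(0x1040, 0x1020)
error_retry = predict_static_with_distance(0x5000, 0x1000)
print(tight_loop, error_retry)
```

A single comparison against a fixed distance keeps the hardware cost low, which matches the article's point about improving accuracy without spending die space on larger tables.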
104 Comments
Stlr22 - Sunday, February 1, 2004 - link
post*
Stlr22 - Sunday, February 1, 2004 - link
KristopherKubicki
Earlier you said that I should read the article.
What was your point? What was it about my first pot that you disagreed with?
KristopherKubicki - Sunday, February 1, 2004 - link
#7: I agree 100% with Anand and Derek. This processor will be a non-event until we get in the 3.6GHz range. Similar to Northwood's launch.
#10: Check out our price engine. We have already been listing the processor for a week!
http://www.anandtech.com/guides/priceguide.htm
http://www.monarchcomputer.com/Merchant2/merchant....
cliffa3 - Sunday, February 1, 2004 - link
In the table on page 14 it shows that the 90nm P4@2.8 will have a 533 MHz FSB, but is that the case? I did some quick Google research and can't find anything to support that...please confirm or correct, thanks.
NFactor - Sunday, February 1, 2004 - link
Yes, I must agree this is an amazing article, one of the best I have ever read. Thanks.
Xentropy - Sunday, February 1, 2004 - link
VERY interesting article. Thank you Anand and Derek! One of the best I've read on Anandtech, and I consider yours the best hardware site on the net!
One correction, on page 7, you say, "if you want to multiply a number in binary by 2 you can simply shift the bits of the number to the right by 1 bit," but don't you mean shift to the left one bit (and place a zero at the end)? It's much like multiplying a decimal number by ten for obvious reasons.
Anyway, it looks like the Prescott is somewhat of a non-event at this time. Just new cores that perform fundamentally the same as the current ones at current speeds. The real news will come later; Intel has just positioned itself for one hell of a speed ramp to come. Northwood was clearly at the end of the line. One analogy, I suppose, would be that Intel didn't fire any shots in the CPU war today, but they loaded their guns in preparation to fire.
The coming year will be an exciting one for us hardware geeks. I'm interested in seeing how higher clocked Prescotts play out as well as whether anything 64-bit shows up before 2005 to support AMD's stance that we need it NOW.
Again, thanks for a very thorough article!
Stlr22 - Sunday, February 1, 2004 - link
KristopherKubicki
So what's your take on these new Prescotts?
KristopherKubicki - Sunday, February 1, 2004 - link
Anand scolded me for not reading the article :( I only read the conclusion and the graphs. Turns out the decision making isn't as clearcut as it sounds.
As for the thing with the Inquirer. Well, lots of people had Prescotts. We had one back in August I believe. The thing is they were horribly slow - 533FSB 2.8GHz. Everyone drew the conclusion that these were purposely slowed processors that were just for engineering purposes. While the Inq benched this processor, most people didn't just because they were under the impression this was not to be the final production model. Hope that clears up some discrepancy about the validity.
Cheers,
Kristopher
wicktron - Sunday, February 1, 2004 - link
Hehe, I guess the Inq was right about this one. Where are all the Inq bashers and their claim of "fake" benchies? Haha, I laugh.
Stlr22 - Sunday, February 1, 2004 - link
KristopherKubicki - "read the article..."
lol that might be a good idea, as I only browsed it and read the conclusion. :D