Linus wisdom on microarchitectures
"I personally have a very simple rule: if performance changes this drastically from compiler tweaking, the uarch is broken.
P4 was broken. And BD is broken."
"I think POWER6 was indeed fairly broken, but it damn well made up for it with high frequencies and ultimately good performance. So it did the whole P4 thing, but without quite as much fragility.
But yes, I definitely think POWER7 is the better uarch. POWER5 I thought was quite reasonable, although iirc it still had that odd 2-cycle integer latency from POWER4.
...
But if it's a matter of "pretty ho-hum performance, but then sometimes if things line up just right you look quite good", that's a bad, bad, bad thing. I'd much rather take something reliably fast, than something that I have to waste my time tweaking for.
Because then my binary may run really well on one uarch, but two years down the line I will have wasted all my time and need to tweak all over again for the next fragile piece of crap."
"Intel decoders are still less symmetric than the AMD ones, but since the uops have become more powerful, that is much less noticeable. And Intel uarchs appear to be much less fragile when it comes to pretty much everything else, particularly the memory pipeline.
...
I do agree that compilers obviously tend to prefer the most common setup by developers, and Intel does get preferential treatment for that reason. But the compiler rules for modern Intel CPU's aren't crazy."