I do not want to start a debate about this issue. I am just interested in the architectural differences between Fermi and ATI. Are there any reports benchmarking their GPGPU capability based on the architecture of the chips, specifically benchmarks that target particular architectural capabilities on each chip?
Both NVIDIA and AMD have two current architectures, and in each case the two are rather different internally: GF100 (32 cores per multiprocessor) and GF104 (48 cores per multiprocessor) for NVIDIA, VLIW4 and VLIW5 for AMD. Which ones are you interested in?
Basically all of them. I am also interested in cache behavior, concurrent kernel execution, out-of-order execution, warp behavior, coalesced vs. random access, etc., and in which of the cards performs better in each area. Kirk and Hwu, for example, have done some excellent work explaining the basics, but they missed GF100 and later, which is natural since Fermi wasn't out yet. Are there any case studies? To be clear, I am not deciding which card to buy; I have already chosen a GTX 480. My research into these aspects is about optimizing the code I want to write and documenting in detail why I code in a specific way on my GTX 480. So GF100 is basically my base. The value of this kind of report is that it gives you ideas for heavily optimizing your code for the specific architecture you choose.
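On the coalescing vs. random access point, a small microbenchmark can already show the effect on a given card. The following is only a minimal sketch, not a rigorous benchmark: the kernel name `strided_copy`, the helper `time_copy_ms`, the array size, and the stride values are all assumptions chosen for illustration. A stride of 1 lets a warp read consecutive floats, while a stride of 32 makes each thread of a warp touch a different 128-byte segment.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Thread i reads in[(i * stride) % n] and writes out[i].
// stride == 1: neighbouring threads read neighbouring floats (coalesced).
// stride == 32: each thread in a warp reads from a different 128-byte segment.
__global__ void strided_copy(const float *in, float *out, size_t n, size_t stride)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[(i * stride) % n];
}

// Hypothetical helper: time one launch with CUDA events, in milliseconds.
float time_copy_ms(size_t n, size_t stride)
{
    float *in = NULL, *out = NULL;
    cudaMalloc((void **)&in, n * sizeof(float));
    cudaMalloc((void **)&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256;
    int blocks = (int)((n + threads - 1) / threads);

    // Warm-up launch so the timed run excludes one-time start-up costs.
    strided_copy<<<blocks, threads>>>(in, out, n, stride);

    cudaEventRecord(start);
    strided_copy<<<blocks, threads>>>(in, out, n, stride);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return ms;
}

int main()
{
    const size_t n = 1 << 22;  // 4M floats (16 MB), an arbitrary choice
    printf("stride  1: %.3f ms\n", time_copy_ms(n, 1));
    printf("stride 32: %.3f ms\n", time_copy_ms(n, 32));
    return 0;
}
```

A single timed launch is noisy, so averaging over several runs and sweeping more stride values would give a clearer picture. On Fermi the L1/L2 caches also change the result compared with older parts, which is exactly the kind of architectural difference worth measuring.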
Are you taking into account the new bit manipulation instructions in Fermi? These probably aren't exposed very well in CUDA, but they seem like they would provide decent benefits for applications that perform such operations regularly.
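For reference, here is a minimal sketch of how some bit operations can be reached from CUDA C. The intrinsics `__popc`, `__clz`, `__brev`, and `__ffs` are part of CUDA; the bit field extract is reached here through inline PTX (`bfe.u32`), which assumes a toolkit with inline PTX support and a compute capability 2.0 or later target. The names `bfe_u32`, `bit_ops`, and `run_bit_ops` are made up for this example, and the sketch makes no claim about how each operation maps to hardware instructions on a particular chip.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: extract `len` bits of x starting at bit `pos`
// using the PTX bit field extract instruction.
__device__ unsigned int bfe_u32(unsigned int x, unsigned int pos, unsigned int len)
{
    unsigned int r;
    asm("bfe.u32 %0, %1, %2, %3;" : "=r"(r) : "r"(x), "r"(pos), "r"(len));
    return r;
}

__global__ void bit_ops(unsigned int x, unsigned int *out)
{
    out[0] = __popc(x);         // number of set bits
    out[1] = __clz(x);          // count of leading zero bits
    out[2] = __brev(x);         // bit order reversed
    out[3] = __ffs(x);          // 1-based position of the lowest set bit
    out[4] = bfe_u32(x, 4, 8);  // bits 4..11 of x
}

// Hypothetical host wrapper: run the kernel once and copy back the 5 results.
void run_bit_ops(unsigned int x, unsigned int out[5])
{
    unsigned int *d_out = NULL;
    cudaMalloc((void **)&d_out, 5 * sizeof(unsigned int));
    bit_ops<<<1, 1>>>(x, d_out);
    cudaMemcpy(out, d_out, 5 * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}

int main()
{
    unsigned int out[5];
    run_bit_ops(0x00F0u, out);
    // For x = 0x00F0: popc = 4, clz = 24, brev = 0x0F000000, ffs = 5, bfe = 0x0F
    printf("popc=%u clz=%u brev=0x%08X ffs=%u bfe=0x%X\n",
           out[0], out[1], out[2], out[3], out[4]);
    return 0;
}
```

On older toolkits this would need to be compiled for a Fermi target (for example with `-arch=sm_20`), since `bfe` is not available for earlier architectures.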