I have some questions about compute capability 2.0, as I haven't had a chance to experiment with these cases myself.
I think the most important bottleneck in CUDA is register usage. Under what conditions is local memory used instead of registers? Can you give examples, perhaps involving the `-maxrregcount` compiler option? What is the performance effect of register spilling on compute capability 2.0?
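To make this concrete, here is a hypothetical kernel sketch (names made up by me) where I would guess the array ends up in local memory rather than registers, because it is indexed with a runtime value:

```cuda
// Sketch: the compiler cannot keep `buf` in registers because it is
// indexed dynamically, so it should be placed in local memory.
__global__ void spill_example(const float *in, float *out)
{
    float buf[32];
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = 0; i < 32; ++i)
        buf[i] = in[tid * 32 + i];
    int j = tid % 32;        // index known only at runtime
    out[tid] = buf[j];       // dynamic indexing -> local memory
}
```

Is compiling with `nvcc -Xptxas -v` (to report per-thread register and local-memory usage) and lowering the cap with `-maxrregcount=N` the right way to observe spilling?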
What is the performance penalty for strided and non-coalesced accesses to global memory on compute capability 2.0, given the global cache? For example, on 1.x devices, texture fetches can be used to avoid non-coalesced accesses to global memory, since texture memory is cached.
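Here is the kind of strided pattern I mean (a hypothetical sketch):

```cuda
// Each thread reads element tid * stride. For stride > 1, the reads of a
// warp no longer fall within one contiguous segment, so they cannot be
// coalesced into a single memory transaction.
__global__ void strided_read(const float *in, float *out, int stride)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = in[tid * stride];
}
```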
What are the per-multiprocessor sizes of the on-chip global and local caches on compute capability 2.0? What is the total amount of on-chip memory, and how is it distributed? For example, is the register file 48 KB?
Branch predication avoids divergence and serialization of instructions. Can you give examples of cases where branch predication is applied? Also, `#pragma unroll` is not explained in detail in the documentation.
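What I have in mind is something like the sketch below: a short conditional that I assume the compiler turns into predicated instructions rather than an actual branch, plus a loop with the unroll pragma (both examples are my own guesses, not from the documentation):

```cuda
__global__ void predication_example(float *x)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    // Short if/else body: instead of branching, both paths can be issued
    // with per-thread predicates, so the warp never actually diverges.
    if (x[tid] > 0.0f)
        x[tid] *= 2.0f;
    else
        x[tid] = 0.0f;
}

__global__ void unroll_example(float *x)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float sum = 0.0f;
    // Ask the compiler to unroll the loop body four times:
    #pragma unroll 4
    for (int i = 0; i < 16; ++i)
        sum += x[tid * 16 + i];
    x[tid] = sum;
}
```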
My other questions are about hardware:
What is the difference between the GeForce, Tesla, and Quadro series? Although the GeForce series is aimed at gaming, the GeForce GTX 480 looks higher-performance than the Tesla C2050.
There are graphics cards that have two GPUs, such as the GeForce GTX 295. What is the performance gain (2x?) of these cards over single-GPU cards like the GeForce GTX 280? Are these cards programmed using multi-GPU principles? Is GPU-to-GPU communication possible without going through the host, given that the GPUs sit on the same board with the same off-chip memory?
This is mentioned in the Fermi whitepaper, I believe. Each multiprocessor has 64 kB of memory that can be split 16 kB shared memory / 48 kB L1 cache, or vice versa. The L2 cache is for the entire GPU and has a size of 768 kB. (The exception is the GTX 460, where the L2 cache is cut down in proportion to the reduced number of memory channels. Presumably, this will be documented in the next CUDA release.)
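The shared memory / L1 split can be requested per kernel with `cudaFuncSetCacheConfig`; a minimal sketch (the kernel name is made up):

```cuda
// Prefer 48 kB L1 cache / 16 kB shared memory for this kernel:
cudaFuncSetCacheConfig(my_kernel, cudaFuncCachePreferL1);
// Or prefer 48 kB shared memory / 16 kB L1 cache instead:
cudaFuncSetCacheConfig(my_kernel, cudaFuncCachePreferShared);
```

It is a preference, not a guarantee; the driver may ignore it if, say, the kernel needs more shared memory than the requested configuration allows.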
Short version: GeForce has no ECC, less device memory, and its double-precision throughput is 1/8 of the single-precision rate, whereas for Tesla the factor is only 1/2.
The performance gain can be linear, but depends on your problem. The two GPUs appear as two distinct CUDA devices each with separate device memory. NVIDIA implements these dual GPU devices with a PCI-Express switch on the board, so transfers between system memory and device memory have to share the same channel, which can slow things down if you need to perform large simultaneous transfers to both GPUs. There is currently no GPU-GPU direct communication mechanism through the PCI-Express bus, and since the device memories are distinct, there can be no communication through that route either.
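In practice you drive the two GPUs of a GTX 295 like any two separate devices, e.g. one host thread per device. A sketch (error checking omitted, names made up):

```cuda
// One host thread per GPU: each thread binds to its own device id and
// then works entirely against that device's memory.
void worker(int dev, const float *host_in, float *host_out, size_t bytes)
{
    cudaSetDevice(dev);
    float *d_buf;
    cudaMalloc((void **)&d_buf, bytes);
    cudaMemcpy(d_buf, host_in, bytes, cudaMemcpyHostToDevice);
    /* ... launch kernels on this device ... */
    cudaMemcpy(host_out, d_buf, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_buf);
}

// Any GPU-to-GPU transfer has to be staged through host memory:
//   cudaMemcpy(host_tmp, d_a, bytes, cudaMemcpyDeviceToHost);  // device 0
//   cudaMemcpy(d_b, host_tmp, bytes, cudaMemcpyHostToDevice);  // device 1
```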