This may be an old and basic topic, but the forum search did not turn up anything useful, so I am posting it as a new question. I am a beginner with GPUs and CUDA and have started out on an inexpensive card, a GeForce 9800 GTX+. I would like to know the following for this GPU model:
- Number of stream processors (SM) = 128 (the specifications list this as the number of processor cores).
- Amount of shared memory per SM? (The specifications give the Standard Memory Config as 512 MB. My understanding is that the shared memory per SM should be (512 - x)/128 MB, where “x” accounts for the other memories such as global, texture, constant and registers. What is the actual shared memory per SM?)
- Number of thread blocks per SM?
- Number of threads in a block?
- Number of threads in a warp?
- Number of (32-bit?) registers per SM?
- Peak performance in GFLOPS?
I went through a lot of literature on GPUs and came across many figures for the items above, but nowhere did I find a definitive answer for the GPU model that I am using. So far I have only run an embarrassingly parallel matrix multiplication on this GPU. I tried changing the shared memory parameters in that code, but I would not vouch for the results of those experiments.
How can I find the above figures for the GPU that I am using (the GeForce 9800 GTX+, or any other for that matter)? Is there a standard way, or an NVIDIA document that covers this (I could not find one despite searching)? I would be very thankful if someone could shed some light on this.
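For reference, one standard route is the CUDA runtime call cudaGetDeviceProperties, which reports several of these figures for whatever card is installed. Below is a minimal sketch, not a definitive tool: it assumes the CUDA toolkit is installed, that the card of interest is device 0, and that the file is built with nvcc (the file name query.cu is only an illustration). The struct reports shared memory and registers per block rather than per SM, and it does not cover the number of resident blocks per SM or the peak GFLOPS.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;  // assumption: the card of interest is device 0

    // Ask the runtime for the properties of the chosen device.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // Print the fields that correspond to the questions in this post.
    std::printf("Device name:                %s\n", prop.name);
    std::printf("Compute capability:         %d.%d\n", prop.major, prop.minor);
    std::printf("Multiprocessors (SMs):      %d\n", prop.multiProcessorCount);
    std::printf("Global memory:              %zu MB\n",
                prop.totalGlobalMem / (1024 * 1024));
    std::printf("Shared memory per block:    %zu bytes\n", prop.sharedMemPerBlock);
    std::printf("32-bit registers per block: %d\n", prop.regsPerBlock);
    std::printf("Max threads per block:      %d\n", prop.maxThreadsPerBlock);
    std::printf("Warp size:                  %d\n", prop.warpSize);
    return 0;
}
```

Something like `nvcc query.cu -o query && ./query` should build and run it. The CUDA SDK also ships a deviceQuery sample that prints a similar report.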
Thanks & regards,