Understanding the (unspecified) GPU specifications

Hi all,

Perhaps this is a very old and basic topic, but the forum search was not much help, so I am posting my question here. I am a beginner with GPUs and CUDA and have started out on a cheaper card, a GeForce 9800 GTX+. I would like to know the following for this GPU model:

  1. No. of stream processors (SM) = 128 (this was in the listed specifications as the number of processor cores).
  2. Amount of shared memory per SM? (The specifications list the Standard Memory Config as 512 MB. My understanding is that the shared memory should be some (512 - x)/128 MB per SM, where “x” accounts for the other memories: global, texture, constant and registers. What is the shared memory per SM?)
  3. Number of thread blocks per SM?
  4. Number of threads in a block?
  5. Number of threads in a warp?
  6. Number of (32bit?) registers per SM?
  7. Peak performance in GFLOPS?

I went through a lot of literature on GPUs and came across many figures for the above questions, but nowhere did I find a definitive answer for the GPU model that I am using. So far I have only run an embarrassingly parallel matrix multiplication application on this GPU. I tried playing around with the code by changing some shared memory parameters, but I wouldn’t vouch for my experiments.

How can I find out the above figures for the GPU I am using (GeForce 9800 GTX+, or any other for that matter)? Is there a standard way, or some NVIDIA document that I failed to find despite my efforts? I would be very thankful if someone could shed some light on this.

Thanks & regards,


All of this is in the programming guide. Appendix A, I think?

Yeah, I read that (programming guide ver 2.0) but a couple of things are confusing me in there:

  1. The 9800 GTX+ does not even appear in the list of CUDA-enabled GPUs (A.1), although it is very much CUDA-enabled. How else do I find the compute capability for this GPU model? It’s not mentioned anywhere.

  2. Anyway, the 9800 GTX is not very different, so maybe I can assume the GTX+ has the same compute capability. Going by that, the table lists its number of multiprocessors as 16, while I think it should have been 128.

What should I use for these values?

Thanks & regards,


Run the deviceQuery sample in the SDK. And no, it has 128 stream processors (SPs) and 16 streaming multiprocessors (SMs), where each SM has 8 SPs.
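
For anyone who wants these numbers from their own code rather than from the SDK sample, here is a minimal sketch that reads them through the CUDA runtime's cudaGetDeviceProperties call. It is not the deviceQuery source, just an illustration of the same idea. The helper total_sps is a hypothetical name, and the figure of 8 SPs per SM is taken from the reply above; it holds for cards of this generation and should not be assumed for other architectures. Note that shared memory is reported in bytes per block and is separate from the 512 MB of device memory, which shows up as total global memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: total stream processors = SMs * SPs per SM.
// Assumption: sps_per_sm is 8 for this GPU generation (from the reply above);
// the runtime does not report this ratio, so the caller has to supply it.
static int total_sps(int sm_count, int sps_per_sm) {
    return sm_count * sps_per_sm;
}

int main() {
    int device_count = 0;
    cudaError_t err = cudaGetDeviceCount(&device_count);
    if (err != cudaSuccess || device_count == 0) {
        std::fprintf(stderr, "No usable CUDA device (%s)\n",
                     cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < device_count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "Device %d: %s\n", dev,
                         cudaGetErrorString(err));
            continue;
        }

        std::printf("Device %d: %s\n", dev, prop.name);
        std::printf("  Compute capability:      %d.%d\n", prop.major, prop.minor);
        std::printf("  Multiprocessors (SMs):   %d\n", prop.multiProcessorCount);
        std::printf("  Total global memory:     %zu bytes\n", prop.totalGlobalMem);
        std::printf("  Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
        std::printf("  Registers per block:     %d\n", prop.regsPerBlock);
        std::printf("  Warp size:               %d threads\n", prop.warpSize);
        std::printf("  Max threads per block:   %d\n", prop.maxThreadsPerBlock);

        // Only meaningful if the 8-SPs-per-SM assumption holds for this device.
        std::printf("  SPs (assuming 8 per SM): %d\n",
                    total_sps(prop.multiProcessorCount, 8));
    }
    return 0;
}
```

Saved as, say, query.cu, this should build with something like "nvcc query.cu -o query". The maximum number of resident blocks per SM is not among the fields printed here, so for that limit the compute capability table in the programming guide is still the place to look.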

Thanks… this was very useful!!