Global, shared memory, latency - GPU list


I am trying to make a list of all Nvidia CUDA GPUs with almost all of their specifications.
On Nvidia's official site I found a list of graphics cards, but the specification tables do not include global memory, shared memory, or latency.
Is there any other website or document where I can find these specifications for almost every Nvidia graphics card?

I would appreciate any help

For hardware specs, I like the Wikipedia page:

It doesn’t give the compute capability of each card directly, although one can deduce it from the “Code Name” column with some outside information about the capabilities of each chip.

In terms of other specifications, this table in the CUDA C Programming Guide is also useful:

Other aspects of the architecture, like memory latencies, are generally not officially reported and have to be deduced with microbenchmarks. I know of no comprehensive resource for such information.

Thank you for your answer.

Is there any way to check or calculate the global memory and shared memory size of a GPU without querying it from code?

Will you publish your list on the internet? I guess more people would like to know all the facts about all cards…

Yes. After I finish, I will post the Excel file here on the forum.
But now I need some help with global memory. Is there a way to calculate it, or must I own the GPU and run deviceQuery?

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: “GeForce GT 640”
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 4095 MBytes (4294246400 bytes)
( 2) Multiprocessors x (192) CUDA Cores/MP: 384 CUDA Cores
GPU Clock rate: 902 MHz (0.90 GHz)
Memory Clock rate: 667 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 262144 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D= (4096,4096,4096)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 5 / 0
Compute Mode:
< Exclusive Process (many threads in one process is able to use ::cudaSetDevice() with this device) >

Even deviceQuery doesn’t give global memory latency; you would need to write your own benchmark for that. For the published theoretical peak bandwidth values, see the Wikipedia entry Seibert has pointed you to.
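As a rough cross-check of such published bandwidth figures, the theoretical peak can be estimated from the memory clock and bus width that deviceQuery prints. This is only a sketch: the function name `peak_bandwidth_gb_s` is made up for illustration, and the factor of 2 assumes double data rate memory and that the reported clock is the base clock rather than the effective rate.

```python
def peak_bandwidth_gb_s(memory_clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Estimate theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 is an assumption (double data rate memory).
    """
    bytes_per_transfer = bus_width_bits / 8
    bytes_per_second = memory_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer
    return bytes_per_second / 1e9


# Values taken from the GeForce GT 640 deviceQuery output above:
# Memory Clock rate 667 MHz, Memory Bus Width 128-bit.
gt640_bandwidth = peak_bandwidth_gb_s(667, 128)
print(f"{gt640_bandwidth:.1f} GB/s")
```

Real, measured bandwidth will be lower than this estimate, and it says nothing about latency.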

You don’t need to have a device to know how much global memory it has. Just check the specs. Size of the memory is one of the key selling points, e.g. when you see EVGA GeForce GTX 680 2048MB GDDR5 this means you have 2GB of global memory.

Shared memory sizes are listed in Table 10, “Technical Specifications per Compute Capability”, of the CUDA C Programming Guide:

Which latency are you interested in? (And what would you do with it if you knew it?)

Are you sure?
Look at the deviceQuery output for the “GeForce GT 640” that I posted and compare it with

4095 MBytes is not equal to 2048 MB.

There is a reason why the specs say “Standard Memory Config”: vendors are free to ship cards with memory configurations different from the one Nvidia proposed. These cards are often called “special edition” by the vendor and usually have more than the standard amount of memory, to stand out in a very uniform market.

That means you have a GeForce GT 640 4GB, not a GeForce GT 640 2GB. As tera pointed out, different configurations are possible. Either way, you don’t need to have the device to know the number; you can find it elsewhere, such as on the packaging.
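To see why the reported figure corresponds to a 4 GB card, the byte count from the deviceQuery output can be converted by hand. A small sketch, assuming deviceQuery's "MBytes" means 2^20 bytes (the helper names are made up for illustration):

```python
def bytes_to_mib(num_bytes):
    """Convert a byte count to units of 2**20 bytes."""
    return num_bytes / 2**20


def bytes_to_gib(num_bytes):
    """Convert a byte count to units of 2**30 bytes."""
    return num_bytes / 2**30


# "Total amount of global memory" from the GeForce GT 640 output above.
total_global_memory = 4294246400

mib = bytes_to_mib(total_global_memory)
gib = bytes_to_gib(total_global_memory)
print(f"{mib:.4f} MiB, {gib:.4f} GiB")
```

The result is slightly under 4096 MB, so it is far closer to a 4 GB configuration than to the 2048 MB standard configuration.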

In the deviceQuery output for the GeForce GT 640…

Maximum sizes of each dimension of a block: 1024 x 1024 x 64

Does this mean that I can use a block of size 1024 x 1024 x 64?

Can anybody answer this?


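No. These are per-dimension limits only. If you have dim3 threadsperblock, the first two components cannot be bigger than 1024 and the last component cannot be bigger than 64, but you must still have threadsperblock.x * threadsperblock.y * threadsperblock.z <= 1024 (the "Maximum number of threads per block").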
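The two conditions can be written out as a small check. This is just a sketch of the rule using the limits from the GeForce GT 640 output above; `is_valid_block` and the constant names are hypothetical and not part of any CUDA API.

```python
# Limits copied from the deviceQuery output above (GeForce GT 640).
MAX_BLOCK_DIM = (1024, 1024, 64)   # per-dimension limits for x, y, z
MAX_THREADS_PER_BLOCK = 1024       # limit on x * y * z


def is_valid_block(x, y, z):
    """Return True if a block of x * y * z threads satisfies both limits."""
    dims = (x, y, z)
    if any(d < 1 for d in dims):
        return False
    # Each component must respect its own per-dimension maximum.
    if any(d > limit for d, limit in zip(dims, MAX_BLOCK_DIM)):
        return False
    # The total thread count must also fit in one block.
    return x * y * z <= MAX_THREADS_PER_BLOCK


print(is_valid_block(1024, 1024, 64))  # every dimension is at its maximum
print(is_valid_block(32, 32, 1))       # 1024 threads in total
```

The first call fails the total thread count check even though each dimension is within its own limit, while the second passes both checks.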

Thanks for the help!

In the deviceQuery output for the GeForce GT 640…

Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535

Does this mean I can launch a grid of size 2147483647 x 65535 x 65535?

Can anyone answer?