Memory size: how can I know the size of the different memories?

I am a newbie with CUDA and I am looking for information about the sizes of the
global memory, texture memory, texture cache, constant memory, constant cache,
local memory and shared memory of my GeForce 8800 GTX.

How can I find out these sizes?

I have collected in other posts the following data:
texture cache: 8KB
constant cache: 8KB
constant memory: 64KB
shared memory per multiprocessor: 16KB
device memory: 768MB
global memory: ??
texture memory: ??
local memory: ??

Thanks in advance.

You can use deviceQuery from the SDK. For example, my Tesla C1060 shows:

[codebox]Device 2: "Tesla C1060"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 4294705152 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.30 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)[/codebox]

Then you can read the Programming Guide for a detailed description.

Local memory is physically part of global memory; “local” means that its scope is restricted to a single thread inside a kernel function.

Thank you LSChien :rolleyes: .

As far as I understand, texture fetches are cached. I’m thinking about utilizing this feature in my code. Unfortunately, I cannot find the size of the cache in the device spec. The output of deviceQuery, kindly provided in this thread, does NOT list this number. Neither does the output of the same program run on my box (SDK 2.2).

Does anybody know the size of the texture cache in the GT200 generation of devices?

I think it would be useful to incorporate this data point into deviceQuery.


Is there any further information in this thread?

The GeForce 8800 GTX has compute capability 1.0. You can find this information and its consequences in Appendix A of the Programming Guide. This tells you that:

The number of registers per multiprocessor: 8192 (you haven’t asked for this but this could be useful as well)
The amount of shared memory available per multiprocessor: 16 KB
The cache working set for constant memory is 8 KB per multiprocessor
The cache working set for texture memory varies between 6 and 8 KB per multiprocessor

Since the GeForce 8800 GTX has 16 multiprocessors, you can multiply the above values by 16 if you want the totals for the whole chip, but I doubt that would be of any use to you, since data in those memory components cannot be shared between multiprocessors.
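
To make the multiplication concrete, here is a minimal Python sketch that scales the per-multiprocessor figures quoted above by the 16 multiprocessors. The dictionary keys and the `chip_totals` helper are names made up for this illustration, not part of any CUDA API, and the texture cache is left out because it is only given as a range.

```python
# Per-multiprocessor figures quoted above for compute capability 1.0.
# The key names are illustrative only.
PER_MULTIPROCESSOR = {
    "registers": 8192,         # a count of registers, not bytes
    "shared_memory_kb": 16,
    "constant_cache_kb": 8,
}
NUM_MULTIPROCESSORS = 16       # GeForce 8800 GTX, as stated above


def chip_totals(per_mp, num_mp):
    """Multiply each per-multiprocessor figure by the multiprocessor count."""
    return {name: value * num_mp for name, value in per_mp.items()}


totals = chip_totals(PER_MULTIPROCESSOR, NUM_MULTIPROCESSORS)
for name, value in totals.items():
    print(f"{name}: {value}")
```

These totals are bookkeeping only: a block still sees just the shared memory and registers of the one multiprocessor it runs on.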

The total amount of constant memory is 64 KB
The total amount of local memory per thread is 16 KB
For a one-dimensional texture reference bound to a CUDA array, the maximum width is 2^13 (that is, an array of 2^13 elements, if I understand this correctly)
For a one-dimensional texture reference bound to linear memory, the maximum width is 2^27
For a two-dimensional texture reference bound to linear memory or a CUDA array, the maximum width is 2^16 and the maximum height is 2^15
For a three-dimensional texture reference bound to a CUDA array, the maximum width is 2^11, the maximum height is 2^11, and the maximum depth is 2^11
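
As a rough illustration of how the texture limits in this list could be checked before allocating, the following Python sketch encodes them in a table. The `TEXTURE_LIMITS` table, its keys and the `fits_texture` function are hypothetical names for this example; the numbers are the compute capability 1.0 limits listed above, counted in elements per dimension.

```python
# Maximum texture reference sizes for compute capability 1.0, as listed above.
# Keys are (dimensionality, backing storage); values are per-dimension limits.
TEXTURE_LIMITS = {
    ("1d", "cuda_array"): (2**13,),
    ("1d", "linear"): (2**27,),
    ("2d", "cuda_array"): (2**16, 2**15),
    ("2d", "linear"): (2**16, 2**15),
    ("3d", "cuda_array"): (2**11, 2**11, 2**11),
}


def fits_texture(kind, backing, dims):
    """Return True if dims (width, height, depth) respect the listed limits."""
    limits = TEXTURE_LIMITS.get((kind, backing))
    if limits is None or len(dims) != len(limits):
        return False  # combination not listed above, or wrong number of dims
    return all(0 < d <= limit for d, limit in zip(dims, limits))


print(fits_texture("1d", "cuda_array", (8192,)))
print(fits_texture("1d", "cuda_array", (8193,)))
print(fits_texture("2d", "linear", (4096, 4096)))
```

A large one-dimensional data set that exceeds 2^13 elements would, under these limits, have to be bound to linear memory rather than a CUDA array.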

The names “device memory” and “global memory” refer to exactly the same memory :)
“Device memory” is usually used when dealing with host code, to distinguish global GPU memory from the host’s RAM.
“Global memory” is usually used when dealing with device (GPU) code and kernels, to distinguish it from the shared, constant, local and other types of GPU memory.

There is no special “texture memory”. When we say texture memory, we mean a portion of global memory bound to a texture reference. It is used to access that global memory in a better way and to use the on-chip cache to hide some latency. Therefore texture memory is limited by the amount of global memory and by the additional restrictions on texture references which I described above.

Amount of local memory: 16 KB per thread. You can have 768 x 16 = 12288 active threads (768 per multiprocessor on 16 multiprocessors), which makes 192 MB of local memory in total. You should know, however, that local memory physically lies in global memory and consumes its space.
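
The local memory arithmetic can be written out as a short Python sketch. All names are invented for this illustration; the inputs are the figures quoted in this thread (16 KB per thread, 768 active threads on each of 16 multiprocessors, 768 MB of device memory), and the result is a worst case that assumes every active thread uses its full local memory allowance.

```python
LOCAL_MEMORY_PER_THREAD_KB = 16
ACTIVE_THREADS_PER_MULTIPROCESSOR = 768
NUM_MULTIPROCESSORS = 16
DEVICE_MEMORY_MB = 768  # GeForce 8800 GTX, from the first post

# Threads that can be active on the whole chip at once.
active_threads = ACTIVE_THREADS_PER_MULTIPROCESSOR * NUM_MULTIPROCESSORS

# Worst case: every active thread uses its full 16 KB of local memory.
total_local_kb = active_threads * LOCAL_MEMORY_PER_THREAD_KB
total_local_mb = total_local_kb // 1024  # 1 MB taken as 1024 KB

# Local memory lives in global memory, so it comes out of the same 768 MB.
remaining_mb = DEVICE_MEMORY_MB - total_local_mb

print(f"active threads:        {active_threads}")
print(f"local memory (KB):     {total_local_kb}")
print(f"local memory (MB):     {total_local_mb}")
print(f"left for other data:   {remaining_mb} MB")
```

With 1 MB taken as 1024 KB, 196608 KB comes to 192 MB. In practice most kernels use far less local memory than this upper bound.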

To sum it up: all your global memory, texture memory and local memory must fit in the 768 MB your graphics card provides.

@yangxin: Your question is also answered in the specification of CC 1.0: the texture cache is between 6 and 8 KB per multiprocessor. This hasn’t changed (so far) with the newer generations of cards.

Thank you, Cygnus X1. I missed these data.