Hi.
I want to know the default configured shared memory size per block on a GeForce GTX 480…
Also, how can I verify this in code (e.g. with a function call)?
Thanks :)
Hi,
By default, the GTX 480, like all compute capability 2.x devices, has 48 KB of shared memory per SM and 16 KB of L1 cache.
You can check this by calling cudaGetDeviceProperties() and reading cudaDeviceProp::sharedMemPerBlock.
Are you sure that the devices have 48 KB per SM by default? Or is 48 KB just the maximum size? I have read somewhere on this forum that the devices use only 16 KB of shared memory, and that if I want to switch to 48 KB I need to call cudaFuncSetCacheConfig() with the cudaFuncCachePreferShared argument… Is this true?
Is this stated formally by Nvidia anywhere?
How can I use cudaDeviceProp::sharedMemPerBlock to read the current values / sizes?
If you compile for sm_2x, the default is 48 KB of shared memory and 16 KB of L1 cache. You can change the setting with cudaFuncSetCacheConfig.
See page 145 of the CUDA C Programming Guide.
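A minimal sketch of what that call looks like, assuming a kernel named myKernel (the kernel name and launch configuration here are placeholders, not from the thread):

```cuda
#include <cuda_runtime.h>

__global__ void myKernel() { /* ... */ }  // hypothetical kernel

int main(void)
{
    // Ask the runtime to favor 48 KB shared memory / 16 KB L1 for this kernel.
    // This is a preference, not a guarantee; the driver may override it.
    cudaFuncSetCacheConfig(myKernel, cudaFuncCachePreferShared);
    myKernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

The setting applies per kernel, so you can prefer L1 (cudaFuncCachePreferL1) for cache-heavy kernels and shared memory for others in the same program.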
Great! Thank you mfatica.
One more thing. I compile my programs with nvcc without specifying any sm_xx architecture flag. What is the default architecture in that case? Do the defaults above apply to my case?
Also, Gilles_C, could you please give me a code example of how to use cudaDeviceProp::sharedMemPerBlock to read the current values / sizes (because I'm programming in C)?
Default compute capability is 1.0 regardless of the machine you are compiling on.
Here you are:
#include <stdio.h>
#include <cuda_runtime.h>

int dev = 0; // to be adjusted or queried according to your needs
cudaDeviceProp props;
cudaGetDeviceProperties(&props, dev);
// sharedMemPerBlock is a size_t, so print it with %zu (not %d)
printf("Amount of shared memory per block is %zu bytes\n", props.sharedMemPerBlock);
That’s ok! Thank you very much, all!