ekon
Hello,
I’m using CUDA 2.0. I have written a kernel which uses shared memory. All shared memory variables are declared as follows:
[codebox]
__shared__ float errork[4], tmpData[1000];[/codebox]
However, nvcc reports the following, which makes it look as if the tmpData variable has been allocated at double its size:
[codebox]
1>ptxas info : Compiling entry function '_globfunc__Z5kTestjPfS_jjfS'
1>ptxas info : Used 6 registers, 4060+4044 bytes smem, 16 bytes cmem[1]
[/codebox]
If I change the declaration to:
[codebox]
__shared__ float errork[2], tmpData[1000];
[/codebox]
Then the nvcc output magically seems fine:
[codebox]
1>ptxas info : Compiling entry function '_globfunc__Z5kTestjPfS_jjfS'
1>ptxas info : Used 6 registers, 4052+52 bytes smem, 16 bytes cmem[1]
[/codebox]
Hi,
I’ve asked this question in the past, but couldn’t get an “official” answer :)
In both cases, though, the amount of shared memory used is the number to the left of the + sign (~4060 bytes in your case). The second number (i.e. the one after the + sign) seems to be something internal, or a subset of the number to the left of the + sign.
You can ignore it.
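As a rough sanity check on the numbers above, the declared arrays can be added up by hand. The sketch below is plain host-side C++ (it should also compile as CUDA host code) and assumes a 4-byte float; the helper names are made up for illustration. It shows that the two arrays account for 4016 and 4008 bytes, which leaves the same 44-byte remainder against the left-hand figure in both builds. My guess, and it is only an assumption since ptxas does not say so here, is that the remainder covers kernel arguments and built-in variables, which devices of this generation also keep in shared memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Bytes taken by `__shared__ float errork[errorkLen], tmpData[tmpDataLen];`
// Assumes sizeof(float) == 4, as on CUDA devices.
constexpr std::size_t declaredSmemBytes(std::size_t errorkLen, std::size_t tmpDataLen) {
    return (errorkLen + tmpDataLen) * sizeof(float);
}

// Part of the left-hand ptxas figure that the declared arrays do not explain.
constexpr std::size_t remainderBytes(std::size_t reportedLeft, std::size_t declared) {
    return reportedLeft - declared;
}

// Hypothetical helper: prints the breakdown for one build.
inline void printBreakdown(std::size_t reportedLeft, std::size_t errorkLen, std::size_t tmpDataLen) {
    const std::size_t declared = declaredSmemBytes(errorkLen, tmpDataLen);
    assert(reportedLeft >= declared);  // guard against unsigned underflow
    std::printf("declared %zu bytes, remainder %zu bytes\n",
                declared, remainderBytes(reportedLeft, declared));
}
```

Calling `printBreakdown(4060, 4, 1000)` and `printBreakdown(4052, 2, 1000)` from `main` gives declared sizes of 4016 and 4008 bytes respectively, with a remainder of 44 bytes each time, so tmpData is not being doubled in the left-hand figure.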
eyal