My configuration is: Win7 64bit + VS2008 SP1 + 2 Tesla C2050 + CUDA 4.0 + Nsight 2.0
I want to use the following kernel to convert a float array from log-version to exp-version.
__global__ void log2exp(float *d_Le)
{
    const unsigned int xIndex = blockDim.x * blockIdx.x + threadIdx.x;
    d_Le[xIndex] = 2.0/(1.0+__expf(d_Le[xIndex]))-1.0;
}
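For reference, here is a minimal self-contained sketch of how a kernel like this might be launched and checked. The names log2exp_guarded, n, threads and blocks, the array length, the block size and the all-zero test input are assumptions made for illustration. The length guard is also an addition. It does not attempt to explain or fix the local-memory report.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical variant of the kernel with a length guard; `n` is an assumed parameter.
__global__ void log2exp_guarded(float *d_Le, unsigned int n)
{
    const unsigned int xIndex = blockDim.x * blockIdx.x + threadIdx.x;
    if (xIndex < n) {
        // Same expression as in the kernel above.
        d_Le[xIndex] = 2.0/(1.0+__expf(d_Le[xIndex]))-1.0;
    }
}

int main()
{
    const unsigned int n = 1u << 20;        // assumed array length
    const unsigned int threads = 256;       // assumed block size
    const unsigned int blocks = (n + threads - 1) / threads;  // round up to cover all n elements

    // Host input: all zeros (an arbitrary test value).
    float *h_Le = new float[n];
    for (unsigned int i = 0; i < n; ++i) h_Le[i] = 0.0f;

    float *d_Le = NULL;
    cudaMalloc((void **)&d_Le, n * sizeof(float));
    cudaMemcpy(d_Le, h_Le, n * sizeof(float), cudaMemcpyHostToDevice);

    log2exp_guarded<<<blocks, threads>>>(d_Le, n);

    // Wait for the kernel and report any launch or execution error.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_Le, d_Le, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("first element after conversion: %f\n", h_Le[0]);

    cudaFree(d_Le);
    delete[] h_Le;
    return 0;
}
```

The grid size is rounded up so that every element is covered, and the guard keeps the extra threads in the last block from writing past the end of the array.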
It runs very slowly. Using Nsight, I found that the data are stored in local memory.
Can anyone tell me why the data are in local memory, and how I can avoid having them placed there?
Thanks a lot.