Guys, I need to allocate an array in shared memory, but its size is not known at compile time.
I tried to create it dynamically, but nvcc complained about it…
The method is covered in the documentation:
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#shared
Many of the CUDA sample codes also use it.
You’ll want to follow this form:
extern __shared__ float shared[];
And make sure to allocate enough storage for it (in bytes) in the execution configuration (i.e. the kernel launch configuration <<<…>>>). The size goes in the third parameter of the execution configuration, after the grid and block dimensions. The doc section linked above has a link to the description of the execution configuration. The value you pass there (the size of the dynamically allocated shared array in bytes) can be a runtime variable. It does not need to be known at compile time.
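Here is a minimal sketch of how the pieces fit together, in case it helps. The kernel name reverseInBlock, the single-block launch, and reading the size from the command line are all assumptions made up for illustration; only the extern __shared__ declaration and the third launch parameter come from the method described above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: reverses n floats in place using one block.
// The shared array has no size here; it is set by the launch.
__global__ void reverseInBlock(float *data, int n)
{
    extern __shared__ float shared[];

    int i = threadIdx.x;
    if (i < n) shared[i] = data[i];
    __syncthreads();  // all loads must finish before any thread reads back
    if (i < n) data[i] = shared[n - 1 - i];
}

int main(int argc, char **argv)
{
    // n is only known at run time (assumed to come from argv here).
    int n = (argc > 1) ? atoi(argv[1]) : 256;
    if (n < 1 || n > 1024) {  // one thread per element, one block
        fprintf(stderr, "n must be between 1 and 1024\n");
        return 1;
    }

    size_t bytes = n * sizeof(float);
    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d = NULL;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    // Third launch parameter: dynamic shared memory size in bytes.
    reverseInBlock<<<1, n, bytes>>>(d, n);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    printf("h[0] = %.0f, expected %d\n", h[0], n - 1);

    cudaFree(d);
    free(h);
    return 0;
}
```

Note that the size is given in bytes, not elements, so n * sizeof(float) is passed rather than n.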
Thanks txbob :)