I need to write a device function that assumes shared memory has been made available by the kernel that calls it. I ran a quick test with the following code using CUB (please ignore what the code does and focus only on the syntax):
__device__ void block_reduce_test(const int* data, const int num)
{
    extern __shared__ typename cub::BlockReduce<int, REDUCTION_BLOCK_SIZE>::TempStorage temp_storage_1;
    int thread_data[2];
    int sum = cub::BlockReduce<int, REDUCTION_BLOCK_SIZE>(temp_storage_1).Sum(thread_data);
    (void)sum;
}

__global__ void test_knl(const int* data, const int num)
{
    __shared__ typename cub::BlockReduce<int, REDUCTION_BLOCK_SIZE>::TempStorage temp_storage_1;
    block_reduce_test(data, num);
}
I got the following error:
variable "temp_storage_1" was declared but never referenced
If I am not mistaken, shared memory is compiled into a static symbol visible to the whole translation unit, so extern should be needed to see it from the device function. I must have made a mistake somewhere, either in my understanding of how shared memory is compiled or in how I declare CUB's shared memory. When I develop a device function independently, how can I assume that some shared memory exists and use it inside the device function? In particular, I want the device function to use CUB.
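For context, here is a minimal sketch of one commonly used pattern: the kernel declares the TempStorage in shared memory and passes it to the device function by reference, so no extern declaration is involved. In CUDA, extern __shared__ refers to dynamically sized shared memory supplied at launch, not to a __shared__ variable declared inside a kernel, which would explain why the kernel's temp_storage_1 is reported as never referenced. The value 128 for REDUCTION_BLOCK_SIZE, the out parameter, the loading of thread_data from data, and the host helper run_block_sum are assumptions added for illustration; they are not part of the original code.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Assumed value; the original snippet does not show how this is defined.
constexpr int REDUCTION_BLOCK_SIZE = 128;
using BlockReduce = cub::BlockReduce<int, REDUCTION_BLOCK_SIZE>;

// The device function receives the scratch space from its caller
// instead of trying to look it up by name.
__device__ int block_reduce_test(BlockReduce::TempStorage& temp_storage,
                                 const int* data, const int num)
{
    // Each thread loads two consecutive items, padding with 0 past the end.
    int thread_data[2];
    for (int i = 0; i < 2; ++i) {
        const int idx = threadIdx.x * 2 + i;
        thread_data[i] = (idx < num) ? data[idx] : 0;
    }
    // The returned sum is only valid in thread 0 of the block.
    return BlockReduce(temp_storage).Sum(thread_data);
}

__global__ void test_knl(const int* data, const int num, int* out)
{
    // Statically allocated shared memory, owned by the kernel.
    __shared__ BlockReduce::TempStorage temp_storage;
    const int sum = block_reduce_test(temp_storage, data, num);
    if (threadIdx.x == 0) {
        *out = sum;
    }
}

// Hypothetical host helper: sums `num` ones (num <= 2 * REDUCTION_BLOCK_SIZE)
// with a single block and returns the result.
int run_block_sum(const int num)
{
    std::vector<int> host(num, 1);
    int* d_data = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_data, num * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_data, host.data(), num * sizeof(int), cudaMemcpyHostToDevice);

    test_knl<<<1, REDUCTION_BLOCK_SIZE>>>(d_data, num, d_out);

    int result = -1;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_out);
    return result;
}
```

Calling run_block_sum from a host main function launches one block and returns the block-wide sum. Passing the storage by reference keeps the device function independent of how the caller allocates it, so the same function should also work if the caller carves the TempStorage out of dynamic shared memory.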