Declare __shared__ in kernel and Use it in Device Function with CUB

I need to write a device function that assumes some shared memory has already been made available by the kernel that calls it. I ran a quick test using the following code with cub (please ignore what the code does and focus on the syntax):

__device__ void block_reduce_test(const int* data, const int num)
{
  extern __shared__ typename cub::BlockReduce<int, REDUCTION_BLOCK_SIZE>::TempStorage temp_storage_1;

  int thread_data[2];
  int sum = cub::BlockReduce<int, REDUCTION_BLOCK_SIZE>(temp_storage_1).Sum(thread_data);
  (void)sum;
}

__global__ void test_knl(const int* data, const int num)
{
  __shared__ typename cub::BlockReduce<int, REDUCTION_BLOCK_SIZE>::TempStorage temp_storage_1;
  block_reduce_test(data, num);
}

I got the following error:

variable "temp_storage_1" was declared but never referenced

If I am not mistaken, shared memory is compiled into a static symbol visible to the whole translation unit, so extern should be needed to see it. I must have made a mistake somewhere, either in my understanding of how shared memory is compiled or in how I declared the cub shared memory. When I am developing a device function independently, how can I assume that some shared memory exists and use it inside the device function? In particular, I want the device function to be developed with cub.

Many forum questions include cub code that shows how to do a shared allocation for cub usage.

It would typically be done as a static shared allocation, not a dynamic (extern) one.

It’s not really clear to me what you want to do. If you want to do a block reduce in cub, your definition here:

__shared__ typename cub::BlockReduce<int, REDUCTION_BLOCK_SIZE>::TempStorage temp_storage_1;

is sufficient.

If you want to pass that to a __device__ function, do it explicitly via function parameter.
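A minimal sketch of that suggestion follows: the kernel owns a static __shared__ TempStorage and hands it to the device function by reference. The value of REDUCTION_BLOCK_SIZE, the extra out parameter, and the host helper run_static_example are assumptions added for illustration and are not part of the original post.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Assumed block size; the original post does not state its value.
constexpr int REDUCTION_BLOCK_SIZE = 128;
using BlockReduceT = cub::BlockReduce<int, REDUCTION_BLOCK_SIZE>;

// The device function does not declare any shared memory itself.
// It receives the caller's TempStorage through a reference parameter.
__device__ int block_reduce_test(BlockReduceT::TempStorage& temp_storage,
                                 const int* data, const int num)
{
  const int idx = blockIdx.x * blockDim.x + threadIdx.x;
  const int thread_data = (idx < num) ? data[idx] : 0;
  // The returned sum is only valid in thread 0 of the block.
  return BlockReduceT(temp_storage).Sum(thread_data);
}

__global__ void test_knl(const int* data, const int num, int* out)
{
  // Static shared allocation, declared once in the kernel.
  __shared__ BlockReduceT::TempStorage temp_storage;
  const int sum = block_reduce_test(temp_storage, data, num);
  if (threadIdx.x == 0) out[blockIdx.x] = sum;
}

// Hypothetical host helper: sums the first `num` entries of an array of ones
// with a single block and returns the result, or -1 on failure.
int run_static_example(int num)
{
  if (num < 0 || num > REDUCTION_BLOCK_SIZE) return -1;
  const size_t bytes = REDUCTION_BLOCK_SIZE * sizeof(int);
  std::vector<int> h_data(REDUCTION_BLOCK_SIZE, 1);
  int* d_data = nullptr;
  int* d_out = nullptr;
  int h_out = -1;
  if (cudaMalloc(&d_data, bytes) != cudaSuccess) return -1;
  if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) { cudaFree(d_data); return -1; }
  cudaMemcpy(d_data, h_data.data(), bytes, cudaMemcpyHostToDevice);
  test_knl<<<1, REDUCTION_BLOCK_SIZE>>>(d_data, num, d_out);
  if (cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) h_out = -1;
  cudaFree(d_data);
  cudaFree(d_out);
  return h_out;
}
```

Passing the storage explicitly keeps the device function independent of the name the kernel chose for its shared variable, since a __shared__ variable declared inside a function is scoped to that function.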


Quick question. The documentation at the following link states: "The block/example_block_reduce_dyn_smem.cu example illustrates usage of dynamically shared memory with BlockReduce and how to re-purpose the same memory region." Where is the file block/example_block_reduce_dyn_smem.cu?

https://nvidia.github.io/cccl/cub/api/classcub_1_1BlockReduce.html#re-using-dynamically-allocating-shared-memory
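For reference, the dynamic shared memory approach that the linked section describes can be sketched roughly as below. This is not the contents of that example file, just an illustration of the general pattern: an extern __shared__ byte array is reinterpreted as the TempStorage type, and its size is supplied at launch. The value of REDUCTION_BLOCK_SIZE, the 16-byte alignment, and the names block_reduce_dyn, test_knl_dyn, and run_dynamic_example are assumptions.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Assumed block size; the original post does not state its value.
constexpr int REDUCTION_BLOCK_SIZE = 128;
using BlockReduceT = cub::BlockReduce<int, REDUCTION_BLOCK_SIZE>;

// Assumption: 16-byte alignment of the dynamic buffer is enough for TempStorage.
static_assert(alignof(BlockReduceT::TempStorage) <= 16,
              "dynamic shared buffer alignment is too small for TempStorage");

__device__ int block_reduce_dyn(const int* data, const int num)
{
  // Unsized extern array: refers to the dynamic shared memory of the launch.
  extern __shared__ __align__(16) unsigned char smem[];
  auto& temp_storage = *reinterpret_cast<BlockReduceT::TempStorage*>(smem);

  const int idx = blockIdx.x * blockDim.x + threadIdx.x;
  const int thread_data = (idx < num) ? data[idx] : 0;
  // The returned sum is only valid in thread 0 of the block.
  return BlockReduceT(temp_storage).Sum(thread_data);
}

__global__ void test_knl_dyn(const int* data, const int num, int* out)
{
  const int sum = block_reduce_dyn(data, num);
  if (threadIdx.x == 0) out[blockIdx.x] = sum;
}

// Hypothetical host helper: returns the block sum, or -1 on failure.
int run_dynamic_example(int num)
{
  if (num < 0 || num > REDUCTION_BLOCK_SIZE) return -1;
  const size_t bytes = REDUCTION_BLOCK_SIZE * sizeof(int);
  std::vector<int> h_data(REDUCTION_BLOCK_SIZE, 1);
  int* d_data = nullptr;
  int* d_out = nullptr;
  int h_out = -1;
  if (cudaMalloc(&d_data, bytes) != cudaSuccess) return -1;
  if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) { cudaFree(d_data); return -1; }
  cudaMemcpy(d_data, h_data.data(), bytes, cudaMemcpyHostToDevice);
  // The third launch parameter is the dynamic shared memory size in bytes.
  test_knl_dyn<<<1, REDUCTION_BLOCK_SIZE, sizeof(BlockReduceT::TempStorage)>>>(d_data, num, d_out);
  if (cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) h_out = -1;
  cudaFree(d_data);
  cudaFree(d_out);
  return h_out;
}
```

With this variant the device function relies on the launch providing at least sizeof(BlockReduceT::TempStorage) bytes of dynamic shared memory, which is a contract the caller has to honor at every launch site.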