How to use cudaOccupancyMaxActiveBlocksPerMultiprocessor with a template kernel?

I want to use cudaOccupancyMaxActiveBlocksPerMultiprocessor with a kernel function template.

However, I am having difficulty passing the kernel as an argument to this function.

template<int blocksize>
__global__ void kernel(...){
	// ...
}

void launchkernel(...){

	// Queries the occupancy for one compile-time blocksize.
	#define getBlocksPerSM(blocksize) {\
		cudaOccupancyMaxActiveBlocksPerMultiprocessor(&max_blocks_per_SM, \
		                                              kernel<(blocksize)>, \
		                                              (blocksize), smem); }

	const std::size_t smem = calculateSmem();

	int blocksize = calculateBlocksize();

	int max_blocks_per_SM = 1;

	// Map the runtime blocksize onto the matching template instantiation.
	switch(blocksize){
	case 32: getBlocksPerSM(32); break;
	case 64: getBlocksPerSM(64); break;
	case 128: getBlocksPerSM(128); break;
	case 256: getBlocksPerSM(256); break;
	default: throw std::runtime_error("Illegal blocksize");
	}

	// ...
}


This gives the error: no instance of overloaded function “cudaOccupancyMaxActiveBlocksPerMultiprocessor” matches the argument list
argument types are: (int *, , int, const std::size_t)

Could somebody tell me how to do it correctly?

I once had to pass a template function as an argument to another function (I don’t quite remember why, and I’m not at home ATM), and it looked like this:

function_that_wants_template(template_function <> (), OTHER_ARGS);

Yes, no argument inside <> and (), because the function didn’t require the template to be instantiated.
Can you try calling just cudaOccupancyMaxActiveBlocksPerMultiprocessor, without the macro, and passing the rest of the arguments of your kernel function (if any), or the empty () if none?

cudaOccupancyMaxActiveBlocksPerMultiprocessor(&max_blocks_per_SM, kernel <256> (), 256, smem);

And see what else it gives you?

Hi, thank you for your response.

In the end, it was a simple mistake on my side. My kernel has multiple template arguments, some of which are inferred from the parameter list. Since the occupancy calculation receives only the kernel itself and no parameter list, nothing can be inferred there, so I had to specify the otherwise inferred template types explicitly.
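
For anyone who runs into the same error, here is a minimal host-only sketch of that fix. It is not the original code. queryBlocksPerSM is a hypothetical stand-in for cudaOccupancyMaxActiveBlocksPerMultiprocessor (like the templated overload in cuda_runtime.h, it takes the kernel as a deduced template parameter), the value it returns is a placeholder, and the kernel signature and the choice of float are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for the occupancy query. The kernel is received as a
// deduced template parameter, so the argument must name exactly one function.
template <class Func>
int queryBlocksPerSM(int* numBlocks, Func func, int blockSize, std::size_t smemBytes)
{
    (void)func;
    (void)smemBytes;
    *numBlocks = 2048 / blockSize;  // placeholder figure, not a real occupancy model
    return 0;
}

// Hypothetical kernel: blocksize is always spelled out, while T is normally
// deduced from the arguments at the launch site. A real kernel would be __global__.
template <int blocksize, typename T>
void kernel(T* data, int n)
{
    for (int i = 0; i < n; ++i) data[i] *= static_cast<T>(blocksize);
}

int blocksPerSM(int blocksize, std::size_t smem)
{
    int max_blocks_per_SM = 1;
    switch (blocksize) {
    // kernel<32> alone would not compile here: no kernel arguments are present,
    // so T cannot be deduced. Naming T as well selects a single instantiation.
    case 32:  queryBlocksPerSM(&max_blocks_per_SM, kernel<32, float>, 32, smem); break;
    case 64:  queryBlocksPerSM(&max_blocks_per_SM, kernel<64, float>, 64, smem); break;
    case 128: queryBlocksPerSM(&max_blocks_per_SM, kernel<128, float>, 128, smem); break;
    case 256: queryBlocksPerSM(&max_blocks_per_SM, kernel<256, float>, 256, smem); break;
    default: throw std::runtime_error("Illegal blocksize");
    }
    return max_blocks_per_SM;
}
```

With the real API, the call would presumably read cudaOccupancyMaxActiveBlocksPerMultiprocessor(&max_blocks_per_SM, kernel<32, float>, 32, smem), with float replaced by whatever type the actual launch would have deduced.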