Using cudaOccupancyMaxActiveBlocksPerMultiprocessor with function acquired with cuModuleGetFunction

dmcmobile5 · July 20, 2021, 2:12pm

I am trying to use cudaOccupancyMaxActiveBlocksPerMultiprocessor() with a function/kernel pointer I got from cuModuleGetFunction(), but I am getting an "invalid device function" error.

I have reproduced the same error/behaviour using the vectorAddDrv sample.

I added the following line (plus an include of cuda_runtime_api.h):

assert(cudaOccupancyMaxActiveBlocksPerMultiprocessor ( &blocks, vecAdd_kernel, 256, 0 ) == cudaSuccess);

Right after the existing cuModuleGetFunction call (lines 114-115 in my sources):

checkCudaErrors(cuModuleGetFunction(&vecAdd_kernel, cuModule, "VecAdd_kernel"));

Running with cuda-gdb, I get the following line:

warning: Cuda API error detected: cudaOccupancyMaxActiveBlocksPerMultiprocessor returned (0x62)

The code corresponds to cudaErrorInvalidDeviceFunction, which is strange as the kernel/function is valid and runnable.

As the documentation does not mention any corner cases, I assumed I could use cudaOccupancyMaxActiveBlocksPerMultiprocessor with a function/kernel pointer returned from cuModuleGetFunction. Is this not the case?

(I am using CUDA 10.1)

Thanks in advance

Robert_Crovella · July 20, 2021, 2:35pm

Why not use the driver API function for this? vectorAddDrv is a driver API code. Runtime API functions start with cuda…, while driver API functions start with cu… (but not cuda, of course).

https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__OCCUPANCY.html#group__CUDA__OCCUPANCY
