CudaAPIError: [1] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE with Numba

I’m getting the error: “numba.cuda.cudadrv.driver.CudaAPIError: [1] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE” when trying to run my code.

I’m new to working with CUDA, but as far as I can see the code stays within the limits for threads per block (at most 1024) and blocks per grid (at most 65535).
I’ve tried running this on 4 different GPUs (GTX 1070, GTX 1080, GTX 1080 Ti, TITAN X (Pascal)), all with the same error.
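
For reference, the relevant limits can also be queried directly in Numba; a quick sketch (assuming these attribute names are the ones Numba exposes on the device object):

from numba import cuda

dev = cuda.get_current_device()
# hardware limits for the currently selected device
print("max threads per block:", dev.MAX_THREADS_PER_BLOCK)
print("max block dims:", dev.MAX_BLOCK_DIM_X, dev.MAX_BLOCK_DIM_Y, dev.MAX_BLOCK_DIM_Z)
print("max grid dims:", dev.MAX_GRID_DIM_X, dev.MAX_GRID_DIM_Y, dev.MAX_GRID_DIM_Z)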

The error only occurs when the parameters ‘dec’ and ‘fsize’ are larger than 32 and 40 respectively.

Could anyone help me figure out why exactly this error occurs for these parameter values? Or is there something I am doing completely wrong here?

Many thanks.

My Code:

import numpy as np
import time
from numba import cuda


@cuda.jit("void(float32[:,:], float32[:,:,:])")
def ANAM(f, A):
    # 3D thread index: x -> row of f, tau -> lag (offset by 1), i -> sample position
    x, tau, i = cuda.grid(3)
    tau += 1
    n = f.shape[1]
    if x < A.shape[0] and tau < A.shape[1]+1 and i < n-tau and i >= tau:
        out = float(0)
        denom = (tau+1)**2/(n-2*tau)
        # accumulate absolute differences over a (tau+1) x (tau+1) window around sample i
        for j in range(0, tau+1):
            for l in range(0, tau+1):
                out += abs(f[x,i+j]-f[x,i-l])
        A[x, tau-1, i] = out/denom


cuda.select_device(3)  # id 3 = GTX 1080 Ti (see numba.cuda.detect() output below)
gpu = cuda.get_current_device()
print(gpu.name)

flength = 832 # fixed value

dec = 32 # larger value than this causes error
fsize = 40 # value larger than 36 causes error

# example data for testing
fs = np.tile(np.sin(np.linspace(-10,10,flength), dtype=np.float32), (fsize,1))


threadsperblock = (4, 4, 64)
blockspergrid = ((fsize + (threadsperblock[0] - 1)) // threadsperblock[0], 
                  (dec + (threadsperblock[1] - 1)) // threadsperblock[1], 
                  (fs.shape[1] + (threadsperblock[2] - 1)) // threadsperblock[2])

outbuf = np.zeros((fsize, dec-1, fs.shape[1]), dtype=np.float32)

'''
fs = cuda.to_device(fs)
outbuf = cuda.to_device(outbuf)
'''

t1 = time.perf_counter()
ANAM[threadsperblock, blockspergrid](fs, outbuf)
cuda.synchronize()

print(time.perf_counter() - t1)


'''
outbuf = outbuf.copy_to_host()
'''


outbuf = outbuf.sum(axis=2)
log_out = np.log(outbuf)
log_taus = np.log(np.arange(1, dec))
lin_regress_denom = ((log_taus**2).mean() - (log_taus.mean())**2)

print(2-((log_taus*log_out).mean(axis=1)-log_taus.mean()*log_out.mean(axis=1))/lin_regress_denom)

Error message:

Traceback (most recent call last):
  File "numbatest.py", line 54, in <module>
    ANAM[threadsperblock, blockspergrid](fs, outbuf)
  File "/home/robin.vanderlaag/pythonEnvironments/lib64/python3.6/site-packages/numba/cuda/compiler.py", line 822, in __call__
    self.stream, self.sharedmem)
  File "/home/robin.vanderlaag/pythonEnvironments/lib64/python3.6/site-packages/numba/cuda/compiler.py", line 966, in call
    kernel.launch(args, griddim, blockdim, stream, sharedmem)
  File "/home/robin.vanderlaag/pythonEnvironments/lib64/python3.6/site-packages/numba/cuda/compiler.py", line 699, in launch
    cooperative=self.cooperative)
  File "/home/robin.vanderlaag/pythonEnvironments/lib64/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 2100, in launch_kernel
    None)
  File "/home/robin.vanderlaag/pythonEnvironments/lib64/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 300, in safe_cuda_api_call
    self._check_error(fname, retcode)
  File "/home/robin.vanderlaag/pythonEnvironments/lib64/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 335, in _check_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [1] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE

Output of numba.cuda.detect():

Found 4 CUDA devices
id 0    b'NVIDIA GeForce GTX 1070'                              [SUPPORTED]
                      compute capability: 6.1
                           pci device id: 0
                              pci bus id: 2
id 1    b'NVIDIA GeForce GTX 1080'                              [SUPPORTED]
                      compute capability: 6.1
                           pci device id: 0
                              pci bus id: 3
id 2    b'NVIDIA TITAN X (Pascal)'                              [SUPPORTED]
                      compute capability: 6.1
                           pci device id: 0
                              pci bus id: 129
id 3    b'NVIDIA GeForce GTX 1080 Ti'                              [SUPPORTED]
                      compute capability: 6.1
                           pci device id: 0
                              pci bus id: 130
Summary:
        4/4 devices are supported

Output of nvidia-smi:

Sun Jan 16 11:52:13 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
| 22%   28C    P0    33W / 151W |      0MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 22%   28C    P0    40W / 180W |      0MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA TITAN X ...  Off  | 00000000:81:00.0 Off |                  N/A |
| 18%   32C    P0    56W / 250W |      0MiB / 12196MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:82:00.0 Off |                  N/A |
| 18%   26C    P0    55W / 250W |      0MiB / 11178MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The launch configuration is reversed. Numba expects the grid dimensions first and the block dimensions second:

ANAM[blockspergrid, threadsperblock](fs, outbuf)

The code instead calls ANAM[threadsperblock, blockspergrid], so blockspergrid is used as the block shape. With dec = 32 and fsize = 40, blockspergrid comes out to (10, 8, 13), i.e. 1040 threads per block, which exceeds the 1024-thread limit, so cuLaunchKernel rejects the launch with CUDA_ERROR_INVALID_VALUE. Smaller values of dec and fsize happen to keep that product at or below 1024, which is why the error only appears for the larger parameter values. See the kernel invocation section of the Numba CUDA documentation.
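
A minimal sketch of the corrected launch, assuming the rest of the script stays as posted (the assert is only an extra sanity check, not something Numba requires):

from numba import cuda

threadsperblock = (4, 4, 64)
blockspergrid = ((fsize + threadsperblock[0] - 1) // threadsperblock[0],
                 (dec + threadsperblock[1] - 1) // threadsperblock[1],
                 (fs.shape[1] + threadsperblock[2] - 1) // threadsperblock[2])

# a block may contain at most 1024 threads in total
assert threadsperblock[0] * threadsperblock[1] * threadsperblock[2] <= 1024

# grid dimensions first, block dimensions second
ANAM[blockspergrid, threadsperblock](fs, outbuf)
cuda.synchronize()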