I’m getting the error: “numba.cuda.cudadrv.driver.CudaAPIError: [1] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE” when trying to run my code.
I’m new to working with CUDA, but as far as I can see the code is within the limitations for the threads per block (less than 1024) and blocks per grid (less than 65535).
I’ve tried running this on 4 different GPUs (GTX 1070, GTX 1080, GTX 1080 Ti, TITAN X (Pascal)) all with the same error.
The error only occurs when the parameters ‘dec’ and ‘fsize’ are larger than 32 and 40 respectively.
Could anyone help me with figuring out why exactly this error occurs for these parameter values? Or is there something I am doing completely wrong here?
Many thanks.
My Code:
import numpy as np
import time
from numba import cuda

@cuda.jit("void(float32[:,:], float32[:,:,:])")
def ANAM(f, A):
    x, tau, i = cuda.grid(3)
    tau += 1
    n = f.shape[1]
    if x < A.shape[0] and tau < A.shape[1]+1 and i < n-tau and i >= tau:
        out = float(0)
        denom = (tau+1)**2/(n-2*tau)
        for j in range(0, tau+1):
            for l in range(0, tau+1):
                out += abs(f[x,i+j]-f[x,i-l])
        A[x, tau-1, i] = out/denom

cuda.select_device(3)
gpu = cuda.get_current_device()
print(gpu.name)

flength = 832 # fixed value
dec = 32 # larger value than this causes error
fsize = 40 # value larger than 36 causes error

# example data for testing
fs = np.tile(np.sin(np.linspace(-10,10,flength), dtype=np.float32), (fsize,1))

threadsperblock = (4, 4, 64)
blockspergrid = ((fsize + (threadsperblock[0] - 1)) // threadsperblock[0],
                 (dec + (threadsperblock[1] - 1)) // threadsperblock[1],
                 (fs.shape[1] + (threadsperblock[2] - 1)) // threadsperblock[2])

outbuf = np.zeros((fsize, dec-1, fs.shape[1]), dtype=np.float32)
'''
fs = cuda.to_device(fs)
outbuf = cuda.to_device(outbuf)
'''
t1 = time.perf_counter()
ANAM[threadsperblock, blockspergrid](fs, outbuf)
cuda.synchronize()
print(time.perf_counter() - t1)
'''
outbuf = outbuf.copy_to_host()
'''
outbuf = outbuf.sum(axis=2)
log_out = np.log(outbuf)
log_taus = np.log(np.arange(1, dec))
lin_regress_denom = ((log_taus**2).mean() - (log_taus.mean())**2)
print(2-((log_taus*log_out).mean(axis=1)-log_taus.mean()*log_out.mean(axis=1))/lin_regress_denom)
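
As a sanity check on the limits mentioned above, the launch dimensions from the code can be recomputed on the host without a GPU. This is only a sketch under stated assumptions: `launch_dims` and `block_is_valid` are hypothetical helper names, the limits are hard-coded to the documented values for compute capability 6.1 rather than queried from the device, and it relies on Numba's documented launch ordering `kernel[griddim, blockdim]`, so both tuples are tested as if they were the block shape.

```python
# Documented per-block limits for compute capability 6.1
# (hard-coded assumption; not queried from the device).
MAX_THREADS_PER_BLOCK = 1024
MAX_BLOCK_DIM = (1024, 1024, 64)


def launch_dims(fsize, dec, flength, threadsperblock=(4, 4, 64)):
    """Recompute the two launch tuples the same way the posted code does."""
    shape = (fsize, dec, flength)
    blockspergrid = tuple((s + t - 1) // t for s, t in zip(shape, threadsperblock))
    return threadsperblock, blockspergrid


def block_is_valid(blockdim):
    """True if a tuple would be acceptable as a block shape."""
    total = blockdim[0] * blockdim[1] * blockdim[2]
    within_dims = all(d <= m for d, m in zip(blockdim, MAX_BLOCK_DIM))
    return within_dims and total <= MAX_THREADS_PER_BLOCK


if __name__ == "__main__":
    for fsize, dec in [(36, 32), (40, 32), (36, 33)]:
        tpb, bpg = launch_dims(fsize, dec, 832)
        print(f"fsize={fsize} dec={dec}")
        print(f"  threadsperblock={tpb} valid as block: {block_is_valid(tpb)}")
        print(f"  blockspergrid={bpg} valid as block: {block_is_valid(bpg)}")
```

If the second tuple in `ANAM[threadsperblock, blockspergrid]` is what ends up being used as the block shape, then (9, 8, 13) holds 936 threads and fits, while (10, 8, 13) and (9, 9, 13) hold 1040 and 1053 and exceed 1024. That would be consistent with the thresholds noted in the code comments (dec = 32, fsize = 36), though it is an inference from the arithmetic rather than something confirmed on the hardware.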
Error message:
Traceback (most recent call last):
  File "numbatest.py", line 54, in <module>
    ANAM[threadsperblock, blockspergrid](fs, outbuf)
  File "/home/robin.vanderlaag/pythonEnvironments/lib64/python3.6/site-packages/numba/cuda/compiler.py", line 822, in __call__
    self.stream, self.sharedmem)
  File "/home/robin.vanderlaag/pythonEnvironments/lib64/python3.6/site-packages/numba/cuda/compiler.py", line 966, in call
    kernel.launch(args, griddim, blockdim, stream, sharedmem)
  File "/home/robin.vanderlaag/pythonEnvironments/lib64/python3.6/site-packages/numba/cuda/compiler.py", line 699, in launch
    cooperative=self.cooperative)
  File "/home/robin.vanderlaag/pythonEnvironments/lib64/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 2100, in launch_kernel
    None)
  File "/home/robin.vanderlaag/pythonEnvironments/lib64/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 300, in safe_cuda_api_call
    self._check_error(fname, retcode)
  File "/home/robin.vanderlaag/pythonEnvironments/lib64/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 335, in _check_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [1] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE
Output of numba.cuda.detect():
Found 4 CUDA devices
id 0    b'NVIDIA GeForce GTX 1070'    [SUPPORTED]
    compute capability: 6.1
    pci device id: 0
    pci bus id: 2
id 1    b'NVIDIA GeForce GTX 1080'    [SUPPORTED]
    compute capability: 6.1
    pci device id: 0
    pci bus id: 3
id 2    b'NVIDIA TITAN X (Pascal)'    [SUPPORTED]
    compute capability: 6.1
    pci device id: 0
    pci bus id: 129
id 3    b'NVIDIA GeForce GTX 1080 Ti'    [SUPPORTED]
    compute capability: 6.1
    pci device id: 0
    pci bus id: 130
Summary:
    4/4 devices are supported
Output of nvidia-smi:
Sun Jan 16 11:52:13 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
| 22%   28C    P0    33W / 151W |      0MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 22%   28C    P0    40W / 180W |      0MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA TITAN X ...  Off  | 00000000:81:00.0 Off |                  N/A |
| 18%   32C    P0    56W / 250W |      0MiB / 12196MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:82:00.0 Off |                  N/A |
| 18%   26C    P0    55W / 250W |      0MiB / 11178MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+