Error running CUDA Python code in Jupyter Notebook after installing NVIDIA drivers

korobotchkin · July 14, 2023, 3:05pm

Greetings to the programming community,

As a beginner, I recently started studying the “Accelerated Computing with CUDA Python” course offered by NVIDIA. I have successfully installed all the necessary drivers from the NVIDIA website to configure my environment.

Currently, I am attempting to execute the following code in a Jupyter Notebook:

import numba
import numpy as np
from numba import vectorize
from numba import cuda

@vectorize(['int64(int64, int64)'], target='cuda')  # Type signature and target are required for the GPU
def add_ufunc(x, y):
    return x + y

c = add_ufunc(a, b)


However, I'm encountering an error with the following diagnostics:


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 10
      6 @vectorize(['int64(int64, int64)'], target='cuda') # Type signature and target are required for the GPU
      7 def add_ufunc(x, y):
      8     return x + y
---> 10 c = add_ufunc(a,b)

File c:\Users\korob\anaconda3\lib\site-packages\numba\cuda\vectorizers.py:36, in CUDAUFuncDispatcher.__call__(self, *args, **kws)
     25 def __call__(self, *args, **kws):
     26     """
     27     *args: numpy arrays or DeviceArrayBase (created by cuda.to_device).
     28            Cannot mix the two types in one call.
   (...)
     34                   the input arguments.
     35     """
---> 36     return CUDAUFuncMechanism.call(self.functions, args, kws)

File c:\Users\korob\anaconda3\lib\site-packages\numba\np\ufunc\deviceufunc.py:250, in UFuncMec

I am working on Windows 11, and my computer is equipped with an NVIDIA GeForce GTX 1060 with Max-Q Design graphics card.

I would greatly appreciate any assistance, as I have already spent two days trying to resolve this issue. What could be causing this error, and what steps or modifications should I take to successfully run this code in Jupyter Notebook?

Thank you sincerely for any help or guidance you can provide.

Robert_Crovella · July 14, 2023, 3:58pm

You haven’t defined a and b anywhere.

You will need basic Python skills to be able to make good use of that course content.

korobotchkin · July 14, 2023, 5:23pm

Thanks, and sorry, that was my mistake. I changed the original code before sending the message, and that introduced the bug.
Below, with a and b defined, we can observe the same result.

import numba
import numpy as np
from numba import vectorize
from numba import cuda

@vectorize(['int64(int64, int64)'], target='cuda')
def add_ufunc(x, y):
    return x + y

a = np.arange(1.0, 10.0)
b = np.arange(1.0, 10.0)
c = add_ufunc(a,b)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 12
     10 a = np.arange(1.0, 10.0)
     11 b = np.arange(1.0, 10.0)
---> 12 c = add_ufunc(a,b)
     13 print(c)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\cuda\vectorizers.py:28, in CUDAUFuncDispatcher.__call__(self, *args, **kws)
     17 def __call__(self, *args, **kws):
     18     """
     19     *args: numpy arrays or DeviceArrayBase (created by cuda.to_device).
     20            Cannot mix the two types in one call.
   (...)
     26                   the input arguments.
     27     """
---> 28     return CUDAUFuncMechanism.call(self.functions, args, kws)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:254, in UFuncMechanism.call(cls, typemap, args, kws)
    252 # Begin call resolution
    253 cr = cls(typemap, args)
--> 254 args = cr.get_arguments()
    255 resty, func = cr.get_function()
    257 outshape = args[0].shape

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:204, in UFuncMechanism.get_arguments(self)
    202 self._fill_arrays()
    203 self._fill_argtypes()
--> 204 self._resolve_signature()
    205 arys = self._get_actual_args()
    206 return self._broadcast(arys)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:144, in UFuncMechanism._resolve_signature(self)
    141     matches.append(formaltys)
    143 if not matches:
--> 144     raise TypeError("No matching version. GPU ufunc requires array "
    145                     "arguments to have the exact types. This behaves "
    146                     "like regular ufunc with casting='no'.")
    148 if len(matches) > 1:
    149     raise TypeError("Failed to resolve ufunc due to ambiguous "
    150                     "signature. Too many untyped scalars. "
    151                     "Use numpy dtype object to type tag.")

TypeError: No matching version. GPU ufunc requires array arguments to have the exact types. This behaves like regular ufunc with casting='no'.

Robert_Crovella · July 14, 2023, 5:36pm

Why are you sending floating-point values to a function whose signature uses only int64?

Paying careful attention to types matters quite a bit more here than it does for a Python 101 level of activity.
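
The error text compares the GPU ufunc to a regular NumPy ufunc called with casting='no'. As a minimal CPU-only sketch of that rule, the snippet below uses plain NumPy rather than Numba, so it only mimics the type check and is not the GPU code path. It assumes that requesting the int64 loop with `dtype=np.int64` is a fair stand-in for the `int64(int64, int64)` signature.

```python
import numpy as np

# Float literals make arange return a float64 array.
a = np.arange(1.0, 10.0)
b = np.arange(1.0, 10.0)
print(a.dtype)  # float64

try:
    # Ask for the int64 loop and forbid any casting of the inputs.
    np.add(a, b, dtype=np.int64, casting="no")
    rejected = False
except TypeError as exc:
    rejected = True
    print("rejected:", exc)

# Inputs that are already int64 pass the same check.
ai = np.arange(1, 10, dtype=np.int64)
bi = np.arange(1, 10, dtype=np.int64)
c = np.add(ai, bi, dtype=np.int64, casting="no")
print(c)  # [ 2  4  6  8 10 12 14 16 18]
```

Printing `a.dtype` and `b.dtype` just before the failing call is usually the quickest way to see which side of the signature does not match.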

korobotchkin · July 14, 2023, 5:36pm

Dear Robert, thanks a lot. Sorry, the code I sent was not the original: I modified it before sending, introduced a new bug, and didn’t check it.
The cause of the error is something else. I took this sample from the course.
Below are the corrected code and the error diagnostics.

import numba
import numpy as np
from numba import vectorize
from numba import cuda

@vectorize(['int64(int64, int64)'], target='cuda')  # Type signature and target are required for the GPU
def add_ufunc(x, y):
    return x + y

a = np.arange(1.0, 10.0)
b = np.arange(1.0, 10.0)
c = add_ufunc(a,b)
print(c)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 12
     10 a = np.arange(1.0, 10.0)
     11 b = np.arange(1.0, 10.0)
---> 12 c = add_ufunc(a,b)
     13 print(c)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\cuda\vectorizers.py:28, in CUDAUFuncDispatcher.__call__(self, *args, **kws)
     17 def __call__(self, *args, **kws):
     18     """
     19     *args: numpy arrays or DeviceArrayBase (created by cuda.to_device).
     20            Cannot mix the two types in one call.
   (...)
     26                   the input arguments.
     27     """
---> 28     return CUDAUFuncMechanism.call(self.functions, args, kws)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:254, in UFuncMechanism.call(cls, typemap, args, kws)
    252 # Begin call resolution
    253 cr = cls(typemap, args)
--> 254 args = cr.get_arguments()
    255 resty, func = cr.get_function()
    257 outshape = args[0].shape

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:204, in UFuncMechanism.get_arguments(self)
    202 self._fill_arrays()
    203 self._fill_argtypes()
--> 204 self._resolve_signature()
    205 arys = self._get_actual_args()
    206 return self._broadcast(arys)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:144, in UFuncMechanism._resolve_signature(self)
    141     matches.append(formaltys)
    143 if not matches:
--> 144     raise TypeError("No matching version. GPU ufunc requires array "
    145                     "arguments to have the exact types. This behaves "
    146                     "like regular ufunc with casting='no'.")
    148 if len(matches) > 1:
    149     raise TypeError("Failed to resolve ufunc due to ambiguous "
    150                     "signature. Too many untyped scalars. "
    151                     "Use numpy dtype object to type tag.")

TypeError: No matching version. GPU ufunc requires array arguments to have the exact types. This behaves like regular ufunc with casting='no'.

korobotchkin · July 14, 2023, 5:44pm

I am sincerely grateful for your help.
As always, you are right…
Nevertheless, the error has not gone away.

import numba
import numpy as np
from numba import vectorize
from numba import cuda

@vectorize(['int64(int64, int64)'], target='cuda')  # Type signature and target are required for the GPU
def add_ufunc(x, y):
    return x + y

a = np.arange(1, 10)
b = np.arange(1, 10)
c = add_ufunc(a,b)
print(c)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 12
     10 a = np.arange(1, 10)
     11 b = np.arange(1, 10)
---> 12 c = add_ufunc(a,b)
     13 print(c)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\cuda\vectorizers.py:28, in CUDAUFuncDispatcher.__call__(self, *args, **kws)
     17 def __call__(self, *args, **kws):
     18     """
     19     *args: numpy arrays or DeviceArrayBase (created by cuda.to_device).
     20            Cannot mix the two types in one call.
   (...)
     26                   the input arguments.
     27     """
---> 28     return CUDAUFuncMechanism.call(self.functions, args, kws)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:254, in UFuncMechanism.call(cls, typemap, args, kws)
    252 # Begin call resolution
    253 cr = cls(typemap, args)
--> 254 args = cr.get_arguments()
    255 resty, func = cr.get_function()
    257 outshape = args[0].shape

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:204, in UFuncMechanism.get_arguments(self)
    202 self._fill_arrays()
    203 self._fill_argtypes()
--> 204 self._resolve_signature()
    205 arys = self._get_actual_args()
    206 return self._broadcast(arys)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:144, in UFuncMechanism._resolve_signature(self)
    141     matches.append(formaltys)
    143 if not matches:
--> 144     raise TypeError("No matching version. GPU ufunc requires array "
    145                     "arguments to have the exact types. This behaves "
    146                     "like regular ufunc with casting='no'.")
    148 if len(matches) > 1:
    149     raise TypeError("Failed to resolve ufunc due to ambiguous "
    150                     "signature. Too many untyped scalars. "
    151                     "Use numpy dtype object to type tag.")

TypeError: No matching version. GPU ufunc requires array arguments to have the exact types. This behaves like regular ufunc with casting='no'.

Robert_Crovella · July 14, 2023, 5:49pm

try:

a = np.arange(1, 10, dtype=np.int64)
b = np.arange(1, 10, dtype=np.int64)

korobotchkin · July 14, 2023, 5:50pm

Here are the code and the diagnostics.

import numba
import numpy as np
from numba import vectorize
from numba import cuda

@vectorize(['int64(int64, int64)'], target='cuda')  # Type signature and target are required for the GPU
def add_ufunc(x, y):
    return x + y

a = np.arange(1, 10 )
b = np.arange(1, 10)
c = add_ufunc(a,b)
print(c)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 12
     10 a = np.arange(1, 10 )
     11 b = np.arange(1, 10)
---> 12 c = add_ufunc(a,b)
     13 print(c)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\cuda\vectorizers.py:28, in CUDAUFuncDispatcher.__call__(self, *args, **kws)
     17 def __call__(self, *args, **kws):
     18     """
     19     *args: numpy arrays or DeviceArrayBase (created by cuda.to_device).
     20            Cannot mix the two types in one call.
   (...)
     26                   the input arguments.
     27     """
---> 28     return CUDAUFuncMechanism.call(self.functions, args, kws)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:254, in UFuncMechanism.call(cls, typemap, args, kws)
    252 # Begin call resolution
    253 cr = cls(typemap, args)
--> 254 args = cr.get_arguments()
    255 resty, func = cr.get_function()
    257 outshape = args[0].shape

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:204, in UFuncMechanism.get_arguments(self)
    202 self._fill_arrays()
    203 self._fill_argtypes()
--> 204 self._resolve_signature()
    205 arys = self._get_actual_args()
    206 return self._broadcast(arys)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:144, in UFuncMechanism._resolve_signature(self)
    141     matches.append(formaltys)
    143 if not matches:
--> 144     raise TypeError("No matching version. GPU ufunc requires array "
    145                     "arguments to have the exact types. This behaves "
    146                     "like regular ufunc with casting='no'.")
    148 if len(matches) > 1:
    149     raise TypeError("Failed to resolve ufunc due to ambiguous "
    150                     "signature. Too many untyped scalars. "
    151                     "Use numpy dtype object to type tag.")

TypeError: No matching version. GPU ufunc requires array arguments to have the exact types. This behaves like regular ufunc with casting='no'.

Robert_Crovella · July 14, 2023, 5:52pm

Sorry, your replies don’t make any sense to me.

Robert_Crovella · July 14, 2023, 6:53pm

Here is what I see:

$ cat t68.py
import numba
import numpy as np
from numba import vectorize
from numba import cuda

@vectorize(['int64(int64, int64)'], target='cuda') # Type signature and target are required for the GPU
def add_ufunc(x, y):
    return x + y
a = np.arange(1, 10 )
b = np.arange(1, 10)
c = add_ufunc(a,b)
print(c)
$ python t68.py
[ 2  4  6  8 10 12 14 16 18]
$

It’s possible that a newer Python or Numba version doesn’t like that code. The only other suggestion I have is to specify the int64 dtype explicitly in the arange calls (although arange is supposed to infer the type from the supplied arguments, and AFAIK the default int type in Python is 64 bits). I’m not able to reproduce the trouble.
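
One possible explanation for the difference, offered as an assumption rather than a confirmed diagnosis for this thread: on Windows, NumPy releases before 2.0 use a 32-bit default integer, so `np.arange(1, 10)` can come back as int32 there while the same call gives int64 on Linux. The sketch below checks what the local installation produces and shows the explicit-dtype form suggested earlier in the thread.

```python
import numpy as np

# The default integer dtype depends on the platform and the NumPy version.
a = np.arange(1, 10)
print(np.__version__, a.dtype)

# Being explicit removes the dependence on the platform default.
a64 = np.arange(1, 10, dtype=np.int64)

# An existing array can be converted; copy=False skips the copy
# when the array is already int64.
a_conv = a.astype(np.int64, copy=False)
print(a64.dtype, a_conv.dtype)  # int64 int64
```

If the first print reports int32, the arrays do not match an `int64(int64, int64)` signature exactly, which would be consistent with the "No matching version" error.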

korobotchkin · July 14, 2023, 7:42pm

Thanks for trying!

korobotchkin · August 5, 2023, 3:29pm

Greetings to the programming community,
I am writing to seek your assistance and expertise on a topic related to CUDA programming. I have been exploring the CUDA platform and am currently working with an NVIDIA GeForce GTX 1060 with Max-Q Design GPU on my laptop.

Specifically, I am curious to know if there is a maximum limit on the number of threads that can be used in a CUDA program or total memory in GPU. I am wondering if there exists a function or command that allows me to determine this thread limit or memory limit. I have been experimenting with CUDA programming and have encountered situations where understanding this limit would be incredibly helpful.

For illustration purposes, I have included a code snippet below that I have been running:

import numpy as np
from numba import jit, njit
import math
from numba import cuda

@cuda.jit
def kernel_plus_1(d_inp, d_out):    
    nr = cuda.blockIdx.x
    nc = cuda.threadIdx.x    
    d_out[nr][nc] = d_inp[nr][nc] + 1

qr = 1024 * 5
qc = 1024 * 512
arr = np.arange(qr * qc, dtype='int64').reshape(qr, qc)
rez0 = arr + 1
d_inp = cuda.to_device(arr)
out = np.zeros_like(arr)
d_out = cuda.to_device(out)

blocks = int(arr.shape[0])
threads = int(arr.shape[1])
kernel_plus_1[blocks, threads](d_inp, d_out)
out = d_out.copy_to_host()
b = np.array_equal(rez0, out)
print(b)

This code produces the following error:
Exception has occurred: CudaAPIError
[1] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE
  File "D:\py-repo\fdmt_cu.py", line 366, in solver_fdmt_cu_v2
    FDMT_iteration_cu_v2[blocks,threads_per_block](d_input,maxDT,F,f_min,f_max,i_t, d_Output)
  File "D:\py-repo\fdmt_cu.py", line 229, in FDMT
    State_fdmt_cu_v2 = solver_fdmt_cu_v2(State_fdmt_cu_v0,maxDT,F,f_min,f_max,dataType, Verbose)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\py-repo\fdmt_cu.py", line 79, in FDMT_test_curve
    DM0 = np.real(FUNC(np.ones(XX.shape,dataType),f_min,f_max,maxDT,dataType,Verbose))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\py-repo\fdmt_cu.py", line 456, in
    inp, out, out1 = FDMT_test_curve()
    ^^^^^^^^^^^^^^^^^
numba.cuda.cudadrv.driver.CudaAPIError: [1] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE

But if I reduce the number of threads, everything works, for instance:
qr = 512
qc = 512
I would greatly appreciate any insights, explanations, or guidance you could provide regarding the maximum thread or memory limits of the device, and any functions or commands that can help determine them.
Yury

Robert_Crovella · August 7, 2023, 2:10pm

CUDA is limited to 1024 threads per block. This is your threads variable:

qc = 1024 * 512
arr = np.arange(qr * qc, dtype='int64').reshape(qr, qc)
....
threads = int(arr.shape[1])
kernel_plus_1[blocks, threads](d_inp, d_out)

so effectively you are setting threads to be equal to qc.

When qc is larger than 1024, you will get that error.
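
For readers who hit the same launch error, here is a hedged sketch of one way to stay under the per-block limit and to look up the limits asked about in the question. The helper names `launch_config`, `array_gib` and `device_limits` are hypothetical, invented for this illustration. The 1024 figure is the one quoted in the reply above. The device query assumes Numba's `cuda.get_current_device()` and `cuda.current_context().get_memory_info()`, and is guarded so the snippet still runs on a machine without Numba or a GPU.

```python
import math

MAX_THREADS_PER_BLOCK = 1024  # per-block limit quoted in the reply above


def launch_config(n_rows, n_cols, threads_per_block=MAX_THREADS_PER_BLOCK):
    """Return (grid, block): one grid row per array row, columns split into blocks."""
    if not 1 <= threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block must be between 1 and 1024")
    blocks_per_row = math.ceil(n_cols / threads_per_block)
    return (n_rows, blocks_per_row), threads_per_block


def array_gib(n_rows, n_cols, itemsize=8):
    """Size in GiB of an n_rows x n_cols array (itemsize=8 for int64)."""
    return n_rows * n_cols * itemsize / 2**30


def device_limits():
    """Query the current GPU through Numba; return None if no GPU is usable."""
    try:
        from numba import cuda
    except ImportError:
        return None
    if not cuda.is_available():
        return None
    dev = cuda.get_current_device()
    free_bytes, total_bytes = cuda.current_context().get_memory_info()
    return {
        "max_threads_per_block": dev.MAX_THREADS_PER_BLOCK,
        "max_grid_dim_x": dev.MAX_GRID_DIM_X,
        "free_bytes": free_bytes,
        "total_bytes": total_bytes,
    }


qr, qc = 1024 * 5, 1024 * 512  # sizes from the question
grid, block = launch_config(qr, qc)
print(grid, block)             # (5120, 512) 1024
print(array_gib(qr, qc))       # 20.0
print(device_limits())

# With this configuration the kernel would index columns as:
#   nr = cuda.blockIdx.x
#   nc = cuda.blockIdx.y * cuda.blockDim.x + cuda.threadIdx.x
#   if nc < d_out.shape[1]:
#       d_out[nr, nc] = d_inp[nr, nc] + 1
# and be launched as kernel_plus_1[grid, block](d_inp, d_out).
```

The bounds check on `nc` matters whenever the column count is not a multiple of the block size. The size arithmetic also suggests a second thing to check: at int64, the 5120 x 524288 array from the question occupies 20 GiB per copy, which is worth comparing against the `total_bytes` value reported for the device.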
