Error running CUDA Python code in Jupyter Notebook after installing NVIDIA drivers

Greetings to the programming community,

As a beginner, I recently started studying the “Accelerated Computing with CUDA Python” course offered by NVIDIA. I have successfully installed all the necessary drivers from the NVIDIA website to configure my environment.

Currently, I am attempting to execute the following code in a Jupyter Notebook:

import numba
import numpy as np
from numba import vectorize
from numba import cuda

@vectorize(['int64(int64, int64)'], target='cuda')  # Type signature and target are required for the GPU
def add_ufunc(x, y):
    return x + y

c = add_ufunc(a, b)


However, I'm encountering an error with the following diagnostics:


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 10
      6 @vectorize(['int64(int64, int64)'], target='cuda') # Type signature and target are required for the GPU
      7 def add_ufunc(x, y):
      8     return x + y
---> 10 c = add_ufunc(a,b)

File c:\Users\korob\anaconda3\lib\site-packages\numba\cuda\vectorizers.py:36, in CUDAUFuncDispatcher.__call__(self, *args, **kws)
     25 def __call__(self, *args, **kws):
     26     """
     27     *args: numpy arrays or DeviceArrayBase (created by cuda.to_device).
     28            Cannot mix the two types in one call.
   (...)
     34                   the input arguments.
     35     """
---> 36     return CUDAUFuncMechanism.call(self.functions, args, kws)

File c:\Users\korob\anaconda3\lib\site-packages\numba\np\ufunc\deviceufunc.py:250, in UFuncMec

I am working on Windows 11, and my computer is equipped with an NVIDIA GeForce GTX 1060 with Max-Q Design graphics card.

I would greatly appreciate any assistance, as I have already spent two days trying to resolve this issue. What could be causing this error, and what steps or modifications should I take to successfully run this code in Jupyter Notebook?

Thank you sincerely for any help or guidance you can provide.

You haven't defined a and b anywhere.

You will need basic Python skills to make good use of that course content.

Thank you, and sorry: there was a bug. I changed the original code before sending the message, and that introduced the bug.
Below, the result is the same.

import numba
import numpy as np
from numba import vectorize
from numba import cuda

@vectorize(['int64(int64, int64)'], target='cuda')
def add_ufunc(x, y):
    return x + y

a = np.arange(1.0, 10.0)
b = np.arange(1.0, 10.0)
c = add_ufunc(a,b)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 12
     10 a = np.arange(1.0, 10.0)
     11 b = np.arange(1.0, 10.0)
---> 12 c = add_ufunc(a,b)
     13 print(c)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\cuda\vectorizers.py:28, in CUDAUFuncDispatcher.__call__(self, *args, **kws)
     17 def __call__(self, *args, **kws):
     18     """
     19     *args: numpy arrays or DeviceArrayBase (created by cuda.to_device).
     20            Cannot mix the two types in one call.
   (...)
     26                   the input arguments.
     27     """
---> 28     return CUDAUFuncMechanism.call(self.functions, args, kws)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:254, in UFuncMechanism.call(cls, typemap, args, kws)
    252 # Begin call resolution
    253 cr = cls(typemap, args)
--> 254 args = cr.get_arguments()
    255 resty, func = cr.get_function()
    257 outshape = args[0].shape

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:204, in UFuncMechanism.get_arguments(self)
    202 self._fill_arrays()
    203 self._fill_argtypes()
--> 204 self._resolve_signature()
    205 arys = self._get_actual_args()
    206 return self._broadcast(arys)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:144, in UFuncMechanism._resolve_signature(self)
    141         matches.append(formaltys)
    143 if not matches:
--> 144     raise TypeError("No matching version. GPU ufunc requires array "
    145                     "arguments to have the exact types. This behaves "
    146                     "like regular ufunc with casting='no'.")
    148 if len(matches) > 1:
    149     raise TypeError("Failed to resolve ufunc due to ambiguous "
    150                     "signature. Too many untyped scalars. "
    151                     "Use numpy dtype object to type tag.")

TypeError: No matching version. GPU ufunc requires array arguments to have the exact types. This behaves like regular ufunc with casting='no'.

Why are you sending floating-point values to a function whose signature uses only int64?

Paying careful attention to types matters quite a bit more here than it does for Python 101-level activity.
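
The error text compares the GPU ufunc to a regular ufunc with casting='no'. A minimal sketch of what that rule implies, using NumPy's np.can_cast on the CPU rather than Numba itself; the helper name matches_signature is hypothetical and not part of either library:

```python
import numpy as np

def matches_signature(arr, expected_dtype):
    """Hypothetical helper: True only if arr's dtype needs no conversion at all."""
    return np.can_cast(arr.dtype, np.dtype(expected_dtype), casting="no")

floats = np.arange(1.0, 10.0)               # float64, as in the failing call
ints64 = np.arange(1, 10, dtype=np.int64)   # explicitly int64
ints32 = np.arange(1, 10, dtype=np.int32)   # a narrower integer type

# Under casting="no", only an exact dtype match is accepted.
for name, arr in [("floats", floats), ("ints64", ints64), ("ints32", ints32)]:
    print(name, arr.dtype, matches_signature(arr, np.int64))
```

Under this rule, only the explicitly typed int64 array matches an 'int64(int64, int64)' signature. Neither float64 nor int32 input is converted automatically.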

Dear Robert, thanks a lot. Sorry, the code I sent was not the original: I modified it before sending, introduced a new bug, and didn't check it.
The cause of the bug is different. I took this sample from the course.
Below are the corrected code and the diagnostics.

import numba
import numpy as np
from numba import vectorize
from numba import cuda

@vectorize(['int64(int64, int64)'], target='cuda')  # Type signature and target are required for the GPU
def add_ufunc(x, y):
    return x + y

a = np.arange(1.0, 10.0)
b = np.arange(1.0, 10.0)
c = add_ufunc(a,b)
print(c)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 12
     10 a = np.arange(1.0, 10.0)
     11 b = np.arange(1.0, 10.0)
---> 12 c = add_ufunc(a,b)
     13 print(c)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\cuda\vectorizers.py:28, in CUDAUFuncDispatcher.__call__(self, *args, **kws)
     17 def __call__(self, *args, **kws):
     18     """
     19     *args: numpy arrays or DeviceArrayBase (created by cuda.to_device).
     20            Cannot mix the two types in one call.
   (...)
     26                   the input arguments.
     27     """
---> 28     return CUDAUFuncMechanism.call(self.functions, args, kws)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:254, in UFuncMechanism.call(cls, typemap, args, kws)
    252 # Begin call resolution
    253 cr = cls(typemap, args)
--> 254 args = cr.get_arguments()
    255 resty, func = cr.get_function()
    257 outshape = args[0].shape

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:204, in UFuncMechanism.get_arguments(self)
    202 self._fill_arrays()
    203 self._fill_argtypes()
--> 204 self._resolve_signature()
    205 arys = self._get_actual_args()
    206 return self._broadcast(arys)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:144, in UFuncMechanism._resolve_signature(self)
    141         matches.append(formaltys)
    143 if not matches:
--> 144     raise TypeError("No matching version. GPU ufunc requires array "
    145                     "arguments to have the exact types. This behaves "
    146                     "like regular ufunc with casting='no'.")
    148 if len(matches) > 1:
    149     raise TypeError("Failed to resolve ufunc due to ambiguous "
    150                     "signature. Too many untyped scalars. "
    151                     "Use numpy dtype object to type tag.")

TypeError: No matching version. GPU ufunc requires array arguments to have the exact types. This behaves like regular ufunc with casting='no'.

Fri, 14 Jul 2023 at 18:59, Robert_Crovella via NVIDIA Developer Forums <notifications@nvidia.discoursemail.com>:

I am sincerely grateful to you for your help.
As always, you are right.
Nevertheless, the bug did not go away.

import numba
import numpy as np
from numba import vectorize
from numba import cuda

@vectorize(['int64(int64, int64)'], target='cuda')  # Type signature and target are required for the GPU
def add_ufunc(x, y):
    return x + y

a = np.arange(1, 10)
b = np.arange(1, 10)
c = add_ufunc(a,b)
print(c)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 12
     10 a = np.arange(1, 10)
     11 b = np.arange(1, 10)
---> 12 c = add_ufunc(a,b)
     13 print(c)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\cuda\vectorizers.py:28, in CUDAUFuncDispatcher.__call__(self, *args, **kws)
     17 def __call__(self, *args, **kws):
     18     """
     19     *args: numpy arrays or DeviceArrayBase (created by cuda.to_device).
     20            Cannot mix the two types in one call.
   (...)
     26                   the input arguments.
     27     """
---> 28     return CUDAUFuncMechanism.call(self.functions, args, kws)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:254, in UFuncMechanism.call(cls, typemap, args, kws)
    252 # Begin call resolution
    253 cr = cls(typemap, args)
--> 254 args = cr.get_arguments()
    255 resty, func = cr.get_function()
    257 outshape = args[0].shape

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:204, in UFuncMechanism.get_arguments(self)
    202 self._fill_arrays()
    203 self._fill_argtypes()
--> 204 self._resolve_signature()
    205 arys = self._get_actual_args()
    206 return self._broadcast(arys)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:144, in UFuncMechanism._resolve_signature(self)
    141         matches.append(formaltys)
    143 if not matches:
--> 144     raise TypeError("No matching version. GPU ufunc requires array "
    145                     "arguments to have the exact types. This behaves "
    146                     "like regular ufunc with casting='no'.")
    148 if len(matches) > 1:
    149     raise TypeError("Failed to resolve ufunc due to ambiguous "
    150                     "signature. Too many untyped scalars. "
    151                     "Use numpy dtype object to type tag.")

TypeError: No matching version. GPU ufunc requires array arguments to have the exact types. This behaves like regular ufunc with casting='no'.

Fri, 14 Jul 2023 at 20:37, Robert_Crovella via NVIDIA Developer Forums <notifications@nvidia.discoursemail.com>:

try:

a = np.arange(1, 10, dtype=np.int64)
b = np.arange(1, 10, dtype=np.int64)

Here are the code and diagnostics.

import numba
import numpy as np
from numba import vectorize
from numba import cuda

@vectorize(['int64(int64, int64)'], target='cuda')  # Type signature and target are required for the GPU
def add_ufunc(x, y):
    return x + y

a = np.arange(1, 10 )
b = np.arange(1, 10)
c = add_ufunc(a,b)
print(c)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 12
     10 a = np.arange(1, 10 )
     11 b = np.arange(1, 10)
---> 12 c = add_ufunc(a,b)
     13 print(c)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\cuda\vectorizers.py:28, in CUDAUFuncDispatcher.__call__(self, *args, **kws)
     17 def __call__(self, *args, **kws):
     18     """
     19     *args: numpy arrays or DeviceArrayBase (created by cuda.to_device).
     20            Cannot mix the two types in one call.
   (...)
     26                   the input arguments.
     27     """
---> 28     return CUDAUFuncMechanism.call(self.functions, args, kws)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:254, in UFuncMechanism.call(cls, typemap, args, kws)
    252 # Begin call resolution
    253 cr = cls(typemap, args)
--> 254 args = cr.get_arguments()
    255 resty, func = cr.get_function()
    257 outshape = args[0].shape

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:204, in UFuncMechanism.get_arguments(self)
    202 self._fill_arrays()
    203 self._fill_argtypes()
--> 204 self._resolve_signature()
    205 arys = self._get_actual_args()
    206 return self._broadcast(arys)

File d:\weizmann\my_Scripts\.venv\Lib\site-packages\numba\np\ufunc\deviceufunc.py:144, in UFuncMechanism._resolve_signature(self)
    141         matches.append(formaltys)
    143 if not matches:
--> 144     raise TypeError("No matching version. GPU ufunc requires array "
    145                     "arguments to have the exact types. This behaves "
    146                     "like regular ufunc with casting='no'.")
    148 if len(matches) > 1:
    149     raise TypeError("Failed to resolve ufunc due to ambiguous "
    150                     "signature. Too many untyped scalars. "
    151                     "Use numpy dtype object to type tag.")

TypeError: No matching version. GPU ufunc requires array arguments to have the exact types. This behaves like regular ufunc with casting='no'.

Fri, 14 Jul 2023 at 20:46, Yury Korobotchkin <korobotchkin@gmail.com>:

Sorry, your replies don’t make any sense to me.

Here is what I see:

$ cat t68.py
import numba
import numpy as np
from numba import vectorize
from numba import cuda

@vectorize(['int64(int64, int64)'], target='cuda') # Type signature and target are required for the GPU
def add_ufunc(x, y):
    return x + y
a = np.arange(1, 10 )
b = np.arange(1, 10)
c = add_ufunc(a,b)
print(c)
$ python t68.py
[ 2  4  6  8 10 12 14 16 18]
$

It’s possible that a newer python or numba version doesn’t like that code. The only other suggestion I have is to decorate the int64 type on the arange definitions (but: arange is supposed to infer types from the supplied arguments, and AFAIK the default int type in python is 64 bits.). I’m not able to reproduce the trouble.
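
One possible explanation for the difference, offered as an assumption and not confirmed anywhere in this thread: on Windows, NumPy releases before 2.0 use a 32-bit default integer, so np.arange(1, 10) would be int32 there and int64 on Linux. A small sketch for checking this on a given machine; environment_report is a hypothetical helper name:

```python
import platform
import sys

import numpy as np

def environment_report():
    """Hypothetical helper: collect the versions and default integer dtype in use."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.system(),
        "numpy": np.__version__,
        # dtype that np.arange infers from plain Python ints on this machine
        "default_int_dtype": str(np.arange(1, 10).dtype),
    }
    try:
        import numba
        report["numba"] = numba.__version__
    except ImportError:
        report["numba"] = None  # Numba not installed in this environment
    return report

report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")

# Requesting the dtype explicitly removes the platform dependence.
a = np.arange(1, 10, dtype=np.int64)
b = np.arange(1, 10, dtype=np.int64)
print(a.dtype, b.dtype)
```

If default_int_dtype prints int32, the arrays built without an explicit dtype would not match an 'int64(int64, int64)' signature, and the dtype=np.int64 form suggested earlier would be the relevant change.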

Thanks for trying!

Fri, 14 Jul 2023, 21:55, Robert_Crovella via NVIDIA Developer Forums <notifications@nvidia.discoursemail.com>:

Greetings to the programming community,
I am writing to seek your assistance and expertise on a topic related to CUDA programming. I have been exploring the CUDA platform and am currently working with an NVIDIA GeForce GTX 1060 with Max-Q Design GPU on my laptop.

Specifically, I would like to know whether there is a maximum limit on the number of threads that can be used in a CUDA program, or on the total memory available on the GPU. Is there a function or command that lets me determine the thread limit or the memory limit? I have been experimenting with CUDA programming and have run into situations where knowing these limits would be very helpful.

For illustration purposes, I have included a code snippet below that I have been running:


import numpy as np
from numba import jit, njit
import math
from numba import cuda

@cuda.jit
def kernel_plus_1(d_inp, d_out):    
    nr = cuda.blockIdx.x
    nc = cuda.threadIdx.x    
    d_out[nr][nc] = d_inp[nr][nc] + 1

qr = 1024 * 5
qc = 1024 * 512
arr = np.arange(qr * qc, dtype='int64').reshape(qr, qc)
rez0 = arr + 1
d_inp = cuda.to_device(arr)
out = np.zeros_like(arr)
d_out = cuda.to_device(out)

blocks = int(arr.shape[0])
threads = int(arr.shape[1])
kernel_plus_1[blocks, threads](d_inp, d_out)
out = d_out.copy_to_host()
b = np.array_equal(rez0, out)
print(b)

This code produces the following diagnostics:
Exception has occurred: CudaAPIError
[1] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE
File "D:\py-repo\fdmt_cu.py", line 366, in solver_fdmt_cu_v2
FDMT_iteration_cu_v2[blocks,threads_per_block](d_input,maxDT,F,f_min,f_max,i_t, d_Output)
File "D:\py-repo\fdmt_cu.py", line 229, in FDMT
State_fdmt_cu_v2 = solver_fdmt_cu_v2(State_fdmt_cu_v0,maxDT,F,f_min,f_max,dataType, Verbose)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\py-repo\fdmt_cu.py", line 79, in FDMT_test_curve
DM0 = np.real(FUNC(np.ones(XX.shape,dataType),f_min,f_max,maxDT,dataType,Verbose))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\py-repo\fdmt_cu.py", line 456, in
inp, out, out1 = FDMT_test_curve()
^^^^^^^^^^^^^^^^^
numba.cuda.cudadrv.driver.CudaAPIError: [1] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE

But if I reduce the number of threads, everything works, for instance:
qr = 512
qc = 512
I would greatly appreciate any insights, explanations, or guidance you could provide regarding the maximum thread or memory limits on the device, and any functions or commands that can help determine them.
Yury
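
On the question of how to read these limits programmatically, here is a minimal sketch assuming Numba's CUDA device API (cuda.get_current_device() attributes and cuda.current_context().get_memory_info()). The helper name query_gpu_limits is hypothetical, and the function returns None when Numba or a usable GPU is not present:

```python
def query_gpu_limits():
    """Hypothetical helper: return a dict of device limits, or None without a usable GPU."""
    try:
        from numba import cuda
        if not cuda.is_available():
            return None
    except Exception:
        # Numba is not installed, or the CUDA driver could not be initialized
        return None

    dev = cuda.get_current_device()
    free_bytes, total_bytes = cuda.current_context().get_memory_info()
    return {
        "name": dev.name,
        "max_threads_per_block": dev.MAX_THREADS_PER_BLOCK,
        "max_block_dim_x": dev.MAX_BLOCK_DIM_X,
        "max_grid_dim_x": dev.MAX_GRID_DIM_X,
        "free_memory_bytes": free_bytes,
        "total_memory_bytes": total_bytes,
    }

limits = query_gpu_limits()
if limits is None:
    print("No usable CUDA device found")
else:
    for key, value in limits.items():
        print(f"{key}: {value}")
```

The values depend on the installed GPU and driver, so no output is shown here.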

CUDA is limited to 1024 threads per block. This is your threads variable:

qc = 1024 * 512
arr = np.arange(qr * qc, dtype='int64').reshape(qr, qc)
....
threads = int(arr.shape[1])
kernel_plus_1[blocks, threads](d_inp, d_out)

so effectively you are setting threads to be equal to qc.

When qc is larger than 1024, you will get that error.
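
One way to stay under that limit is to fix the block size and let the number of blocks grow with the data. The sketch below is an assumption about how the posted kernel could be restructured, not code from the course. launch_config, array_bytes, and add_one_on_gpu are hypothetical names, and the GPU part is only defined here, not executed:

```python
import math

import numpy as np

MAX_THREADS_PER_BLOCK = 1024  # per-block limit stated in the reply above

def launch_config(n_elements, threads_per_block=256):
    """Return (blocks, threads_per_block) for a 1D launch covering n_elements."""
    if not 1 <= threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block must be between 1 and 1024")
    blocks = (n_elements + threads_per_block - 1) // threads_per_block  # ceiling division
    return blocks, threads_per_block

def array_bytes(shape, dtype):
    """Size in bytes of one array with the given shape and dtype."""
    return math.prod(shape) * np.dtype(dtype).itemsize

def add_one_on_gpu(arr, threads_per_block=256):
    """Add 1 to every element on the GPU; requires Numba and a CUDA device."""
    from numba import cuda  # imported lazily so the helpers above work without Numba

    @cuda.jit
    def kernel_plus_1(d_inp, d_out):
        i = cuda.grid(1)        # global 1D thread index
        if i < d_inp.size:      # guard the last, partially filled block
            d_out[i] = d_inp[i] + 1

    flat = np.ascontiguousarray(arr).ravel()
    d_inp = cuda.to_device(flat)
    d_out = cuda.device_array_like(d_inp)
    blocks, threads = launch_config(flat.size, threads_per_block)
    kernel_plus_1[blocks, threads](d_inp, d_out)
    return d_out.copy_to_host().reshape(arr.shape)

qr, qc = 1024 * 5, 1024 * 512                  # sizes from the failing example
print(launch_config(qr * qc))                  # blocks and threads for a 1D launch
print(array_bytes((qr, qc), "int64") / 2**30)  # GiB needed per array
```

With these sizes, each int64 array takes 20 GiB, so the input and output arrays may not fit in device memory even once the block size is valid. Whether they fit depends on the card.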