Facing an issue working with Numba and CuPy

I am trying to learn GPU acceleration using Numba and CuPy for my research work. In this process, I need to use SciPy routines along with Numba. I would like to know if it is possible to combine Numba’s loop parallelization with the usage of SciPy and CuPy functions. I would appreciate any suggestions on how to address these issues. The code is given below:

import numpy as np
import numba
from numba import cuda
import cupy as cp
@cuda.jit
def test(i, p):
    """
    Square the input number and store it in the output array.
    Inputs:
        i: input number to be squared
        p: output array to store the squared result
    """
    p[i] = i**2
    
@cuda.jit 
def main_gpu(p):
    i = cuda.grid(1)
    if i < p.shape[0]:
      row  = [0, 0, 1, 3, 1, 0, 0]
      col  = [0, 2, 1, 3, 1, 0, 0]
      data = [1, 1, 1, 1, 1, 1, 1]
      matrix=cp.sparse.coo_matrix((data, (row, col)), shape=(4, 4))
      test(i, p)

tnum = 50
p = np.zeros(tnum)
threadsperblock = 32
blockspergrid = (tnum + (threadsperblock - 1)) // threadsperblock

main_gpu[blockspergrid, threadsperblock](p)

The error from this code is:

TypingError: Failed in cuda mode pipeline (step: nopython frontend)
Unknown attribute 'coo_matrix' of type Module(<module 'cupy.sparse' from '/usr/local/lib/python3.10/dist-packages/cupy/sparse/__init__.py'>)

File "<ipython-input-14-120ea31d09be>", line 23:
def main_gpu(p):
    <source elided>
      data = [1, 1, 1, 1, 1, 1, 1]
      cp.sparse.coo_matrix((data, (row, col)), shape=(4, 4))
      ^

During: typing of get attribute at <ipython-input-14-120ea31d09be> (23)

File "<ipython-input-14-120ea31d09be>", line 23:
def main_gpu(p):
    <source elided>
      data = [1, 1, 1, 1, 1, 1, 1]
      cp.sparse.coo_matrix((data, (row, col)), shape=(4, 4))
      ^

What you can use in a numba cuda.jit decorated region is documented in the Numba documentation. No, you generally can’t use other library calls like CuPy and SciPy there.
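For illustration, here is a minimal sketch of the posted kernel reduced to operations that a cuda.jit region supports, with the CuPy call removed. It rests on a few assumptions that are not in the original post: the helper is renamed `square` and marked `device=True` so that it can be called from the kernel, the `run` wrapper and the `HAVE_GPU` flag are hypothetical names, and a NumPy fallback is included only so that the snippet still runs on a machine without a CUDA GPU.

```python
import numpy as np

try:
    from numba import cuda
    HAVE_GPU = cuda.is_available()
except Exception:
    # Numba is missing or no CUDA driver is usable (assumption: fall back to CPU).
    cuda = None
    HAVE_GPU = False

if HAVE_GPU:
    @cuda.jit(device=True)
    def square(i):
        # Device function: callable from a kernel, not launched on its own.
        return i * i

    @cuda.jit
    def main_gpu(p):
        # One thread per output element; no CuPy or SciPy calls in here.
        i = cuda.grid(1)
        if i < p.shape[0]:
            p[i] = square(i)


def run(tnum=50, threadsperblock=32):
    """Return the squares of 0..tnum-1, computed on the GPU when one is available."""
    if not HAVE_GPU:
        return np.arange(tnum, dtype=np.float64) ** 2  # CPU fallback
    d_p = cuda.to_device(np.zeros(tnum))
    blockspergrid = (tnum + threadsperblock - 1) // threadsperblock
    main_gpu[blockspergrid, threadsperblock](d_p)
    return d_p.copy_to_host()


result = run()
```

Any sparse matrix construction would then happen outside the kernel, in ordinary host-side Python.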

Thank you. In my work, I have a parameterized matrix that depends on a loop variable. I would like to parallelize the loop using GPU cores so that matrix operations, such as eigenvalue calculation, can be performed in parallel on each core. Is it possible to achieve this using GPU acceleration using numba? I would greatly appreciate any suggestions and guidance.

Have you looked at the Numba CUDA documentation to see what operations are legal in a numba cuda.jit decorated region? If so, what is your opinion of being able to do matrix operations, such as eigenvalue calculation, there?

This type of question comes up frequently on forums. The viewpoint that matrix operations such as eigenvalue calculation “can be performed in parallel on each core” is not consistent with how a Numba CUDA kernel parallelizes work. You would not ever attempt to perform an eigenvalue calculation on each CUDA core. Instead, the CUDA cores would work together collectively to perform a single eigenvalue calculation, by parallelizing the work at the data level, so that matrix operations like addition, subtraction, multiplication, and division across rows are parallelized. You would then combine such lower-level operations to perform a higher-level operation like eigenvalue calculation.

Consistent with that, you cannot call library functions (for the most part) from a numba cuda kernel. I’m just restating what I’ve already said there.

I suggest more study of numba cuda kernels (there are plenty of examples, questions, and tutorials) so that you learn what can and what can’t be done, and how an operation like a matrix eigenvalue calculation would be done in a CUDA kernel.

As a general principle, I would never suggest that folks try to write their own matrix eigenvalue calculation in low-level CUDA code, whether in Numba or not. Use a library implementation. If you want to use Python, CuPy might be worth a look. But no, you cannot call CuPy routines from a numba cuda.jit region. Instead you would use CuPy directly.
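As a rough sketch of what "use CuPy directly" could look like for a matrix that depends on a loop variable: the loop stays in ordinary Python on the host, and each eigenvalue calculation is handed to the library, which parallelizes the work for that one matrix on the GPU. The matrix here is an assumption made purely for illustration (a symmetric tridiagonal matrix with the parameter on the diagonal), and `build_matrix`, `eigenvalues_for_parameters`, and the NumPy fallback are hypothetical additions, not something from the original post.

```python
import numpy as np

try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device
except Exception:
    xp = np  # assumption: fall back to NumPy so the sketch runs without a GPU


def build_matrix(t, n=4):
    """Hypothetical parameterized matrix: t on the diagonal, 1 on the off-diagonals."""
    main = xp.diag(xp.full(n, float(t)))
    upper = xp.diag(xp.ones(n - 1), k=1)
    lower = xp.diag(xp.ones(n - 1), k=-1)
    return main + upper + lower


def to_host(a):
    """Copy a result back to a NumPy array if it lives on the GPU."""
    return a if xp is np else xp.asnumpy(a)


def eigenvalues_for_parameters(params, n=4):
    """Host-side loop over parameters; each eigvalsh call runs in the library."""
    rows = [xp.linalg.eigvalsh(build_matrix(t, n)) for t in params]
    return to_host(xp.stack(rows))


vals = eigenvalues_for_parameters([0.0, 1.0, 2.0])
```

Each row of `vals` holds the eigenvalues for one parameter value. `eigvalsh` assumes a symmetric (Hermitian) matrix; a general matrix would need a different routine.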

I have tried implementing this using CuPy, and it has provided a huge acceleration. Thank you so much for your assistance. To improve my understanding, I will also thoroughly read the documentation.
