Unifying the CUDA Python Ecosystem

Originally published at: Unifying the CUDA Python Ecosystem | NVIDIA Developer Blog

Python plays a key role within the science, engineering, data analytics, and deep learning application ecosystem. NVIDIA has long been committed to helping the Python ecosystem leverage the accelerated massively parallel performance of GPUs to deliver standardized libraries, tools, and applications. Today, we’re introducing another step towards simplification of the developer experience with improved Python…

So even after the CUDA Python release, we still have to write kernel code in C++? If so, PyCUDA already does the same, right?

Hi Maney,

Yes, this release requires writing kernels as C++ in a string, to be compiled by NVRTC. This release builds the foundation by providing wrappers for the CUDA Driver and Runtime APIs. There will be more functionality and flexibility to come. Currently, partner products such as Numba and CuPy write their own CUDA layer. This release removes the need for them to do so and provides an industry standard.
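To make "C++ in a string, compiled by NVRTC" concrete, here is a minimal sketch of that flow. It assumes the cuda-python package is installed; the import path has changed between releases, so both layouts are tried. The names SAXPY_SOURCE, check, and compile_to_ptx are hypothetical helpers for illustration and are not part of the package.

```python
# Sketch only: compile a C++ kernel held in a Python string to PTX with NVRTC.
# Assumes the cuda-python package; the module path differs between releases.
try:
    from cuda.bindings import nvrtc      # newer cuda-python layout
except ImportError:
    try:
        from cuda import nvrtc           # layout used by earlier releases
    except ImportError:
        nvrtc = None                     # cuda-python is not installed

# The kernel itself is still CUDA C++, just stored as a string.
SAXPY_SOURCE = r"""
extern "C" __global__
void saxpy(float a, const float *x, const float *y, float *out, size_t n)
{
    size_t tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        out[tid] = a * x[tid] + y[tid];
    }
}
"""


def check(result):
    """Unpack the (error, value...) tuple the bindings return; raise on failure."""
    err, *values = result
    if err != nvrtc.nvrtcResult.NVRTC_SUCCESS:
        raise RuntimeError(f"NVRTC call failed: {err}")
    return values[0] if values else None


def compile_to_ptx(source, name=b"saxpy.cu", options=()):
    """Compile a CUDA C++ source string to PTX bytes (hypothetical helper)."""
    if nvrtc is None:
        raise RuntimeError("cuda-python is not installed")
    prog = check(nvrtc.nvrtcCreateProgram(source.encode(), name, 0, [], []))
    check(nvrtc.nvrtcCompileProgram(prog, len(options), list(options)))
    ptx_size = check(nvrtc.nvrtcGetPTXSize(prog))
    ptx = b" " * ptx_size                # buffer that NVRTC fills in place
    check(nvrtc.nvrtcGetPTX(prog, ptx))
    return ptx
```

Calling compile_to_ptx(SAXPY_SOURCE) should return the PTX as bytes, which can then be loaded and launched through the Driver API wrappers. In practice you would usually pass options such as a target GPU architecture, given as a sequence of bytes strings.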

If you’re interested in writing “Python” kernels, please take a look at Numba (Writing CUDA Kernels — Numba 0.50.1 documentation) and CuPy (User-Defined Kernels — CuPy 9.0.0 documentation) functionality.


Will PyCUDA be useless with this new unified system?


I wouldn’t consider PyCUDA useless. PyCUDA will have the ability to utilize our new infrastructure.

A couple of questions: If I have a program that is currently implemented in PyCUDA, how difficult do you anticipate the switch to CUDA Python will be? Also, will CUDA Python have support for GPU arrays (specifically for reductions and generating arrays of random numbers)?

Super excited to have an in-house, NVIDIA-vetted CUDA Python package, and even more excited to try it out! Thanks

Hi morgjack,

I’m very glad you’re excited. I haven’t used PyCUDA in a while, but IIRC the conversion should be straightforward. You would need to use the corresponding Driver/Runtime API calls.

As far as question 2, I’m not sure at the moment. I’m pretty sure that functionality currently exists in CuPy.

This is great news. I have a lot of legacy code for Python extensions that I wished to port to PyCUDA, but this now provides an officially supported approach to integrating advanced kernels. Is there an ETA for the official release, or for when a preview will be available?

The best timeline I can give you at the moment is 2H’21.

I’m trying to use the new platform, but it fails as soon as I include a library header. For example, when I run the vectorAddDrv.py program and just add #include <cuComplex.h> to the CUDA string, the compiler fails:
/cuda-python-main/examples/0_Simple$ CUDA_HOME=/opt/cuda/11.4 python3 vectorAddDrv.py
Vector Addition (API Driver)
sourceCode.cu(10): catastrophic error: cannot open source file "cuComplex.h"

1 catastrophic error detected in the compilation of "sourceCode.cu".
Compilation terminated.


Hi Emanuel,

When that program passes the string to NVRTC, it sets --include-path=$CUDA_HOME/include. If the resulting directory doesn’t have cuComplex.h then we expect an error.

Check that /opt/cuda/11.4/include/cuComplex.h exists. Take a look at how this program uses the KernelHelper (link) to pass the --include-path option.
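If it helps with debugging, the check described above can be sketched in a few lines of Python. This is not the KernelHelper from the examples; nvrtc_include_option is a hypothetical helper that only assumes the header is expected under $CUDA_HOME/include, as in the failing run.

```python
import os


def nvrtc_include_option(header, cuda_home=None):
    """Return the --include-path option as bytes, or raise if the header is missing.

    Hypothetical helper: assumes headers live in <CUDA_HOME>/include.
    """
    cuda_home = cuda_home or os.environ.get("CUDA_HOME")
    if not cuda_home:
        raise EnvironmentError("CUDA_HOME is not set")
    include_dir = os.path.join(cuda_home, "include")
    header_path = os.path.join(include_dir, header)
    if not os.path.isfile(header_path):
        # This is the situation that leads NVRTC to report
        # "cannot open source file" for the header.
        raise FileNotFoundError(
            f"{header} not found in {include_dir}; check CUDA_HOME"
        )
    # NVRTC options are passed to the bindings as bytes.
    return f"--include-path={include_dir}".encode()
```

Running nvrtc_include_option("cuComplex.h") before compiling turns a wrong CUDA_HOME into an immediate Python error rather than a compile failure inside NVRTC.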

Hi, vzhurba,

Thanks, I managed to solve it. The error was in my CUDA_HOME path, and now it’s working perfectly. I took a program that I’m developing in pure CUDA C for my Master’s thesis and translated it to the new platform. It still doesn’t achieve the same performance as the CUDA C version, but the new platform proved to be much faster than PyCUDA, which left me very pleased. Congratulations.