I have a chunk of code that I need to speed up considerably. I’ve achieved about a 30-40x speedup just by using Numba, but it still needs to be faster. I’ve written up the kernel in PyCUDA, but I’m running into some issues, and there’s just not great documentation, it seems. I’m trying to figure out if it’s even worth working with PyCUDA or if I should just go straight to CUDA. I took a few courses on CUDA (3 years ago), so I know it somewhat, but I spend 90% of my time in Python these days. To those practitioners who use CUDA and Python: how do you integrate the two?
When you use pycuda, you write the kernel code in CUDA C++. Therefore, from the perspective of a single kernel, there really shouldn’t be much difference in effort or performance between pycuda and CUDA C++. Obviously the wrapper code/binding is different, but you should be able to do most things in pycuda that you can do in CUDA. There might be a few outliers like CUDA cooperative groups, etc.
For people who have a bunch of numpy code, I would suggest taking a look at cupy.
For people who only want to look at pythonic code, numba is a good choice.
Pycuda is a nice middle ground between numba and CUDA C++ in terms of flexibility.
You should be able to achieve any speed in pycuda that you can in “normal” CUDA - only the host code differs. You’d be writing the same kernel code.
But pyopencl is more actively maintained, and I would highly recommend it. (The actual device-side code for CUDA and OpenCL is identical up to spelling differences.) Both pycuda and pyopencl alleviate a lot of the pain of GPU programming (especially on the host side), being able to integrate with python is great, and the Array classes (numpy array emulator) are wonderful for prototyping/simple operations - so yes, I would say it is highly worth it. Plus, with pyopencl you can conda install pocl and bam, you can run your program on your laptop/any CPU.