PyCUDA Pros and Cons


Could someone please share their experience with PyCUDA (good and bad)? I have seen a few posts where people ask for someone to translate their PyCUDA code into CUDA C, which raises the question: what are PyCUDA's limitations as opposed to using CUDA C with Thrust? I can program in both Python and C, so I am wondering which would be the better choice.


My experience with PyCUDA has been very good. I gave a talk on this at GTC2013:

(Slides 23 and 24 are probably the only relevant things.)

Other details that didn’t make it into that talk:

  • You should only use PyCUDA if you care about making your host code shorter, easier to read, and faster to write. PyCUDA device code is still written in CUDA C, and PyCUDA host code will be slower than the equivalent C host code. If most of your compute time is spent in device code, the extra overhead will not be a problem.

  • CUDA errors are automatically translated into Python exceptions, which makes your Python code much cleaner than the C equivalent.

  • PyCUDA has pretty comprehensive coverage of the CUDA API, so I never found myself unable to use a particular CUDA feature I needed. (This is in contrast to things like NumbaPro, which is still very young and hides many parts of CUDA.)

  • PyCUDA’s gpuarray class makes it very convenient to allocate and transfer data to and from the GPU. I find it easiest to use PyCUDA if I can always work with 1D arrays of simple types.

  • I have never tried making an array of structs, so I have no idea how that would work. I have made a struct of arrays (the better layout for most CUDA kernels anyway), but the resulting code is quite ugly in PyCUDA. I often wish for a better solution here.

  • The gpuarray class is not a complete replacement for the numpy ndarray. In fact, it is missing nearly all of numpy's more sophisticated algorithms, so you frequently end up copying data back to the host for anything beyond elementwise operations and reductions.

  • Thrust is good if you want to use its collection of parallel algorithms. I am not aware of any easy way to use Thrust from PyCUDA.