Running the New Toolkit on Python Efficiently

Hello! Hope you’re having a great day! It’s 1 AM here :)

I’m currently building a high-performance, optimized library of gradient aggregation rules for very large tensors/vectors in our ML application, and it’s already speeding up our work quite well. However, some parts of the code are still in PyTorch: running CUDA through a C++ script integrated into Python often doesn’t give the best results, since it requires frequent data transfers between the CPU and GPU.

However, I still want to utilize CUDA more. I’m wondering, with the new Toolkit, is there a way to efficiently run CUDA in python? And if so, how to do that?

Thanks for the answer in advance!


There are a few frameworks, such as Numba, that can help you run CUDA code on the GPU from Python, either through just-in-time compilation of C/C++ code (usually supplied in a separate file or as a Python string), or by compiling Python functions directly into CUDA kernels with decorators.

This is an area we’re actively investigating, though, so stay tuned for more!


Thanks for the answer!

I’m excited to hear about the development on that side; I think it could be really beneficial for the world of AI!


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.