Running the New Toolkit in Python Efficiently

Hello! Hope you’re having a great day; it’s 1 AM here :)

I’m currently building a high-performance, optimized library of gradient aggregation rules for very large tensors/vectors in our ML application, and it’s already speeding up our work quite well. However, some parts of the code are still in PyTorch: running CUDA through a C++ script integrated into Python often doesn’t give the best results, since it requires a lot of data transfer between the CPU and GPU on almost every call.

However, I still want to use CUDA more. I’m wondering: with the new Toolkit, is there a way to run CUDA efficiently from Python? And if so, how?

Thanks in advance for the answer!
Serhan

There are a few frameworks, such as Numba, that can help you run CUDA code on the GPU from Python, either through just-in-time compilation of CUDA C/C++ code (usually provided in a separate file or as a Python string) or through compilation of Python functions directly with decorators.
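
For illustration, the decorator approach with Numba looks roughly like this. This is a minimal sketch, assuming Numba and a CUDA-capable GPU are available; the `add_vectors` kernel, the sizes, and the launch configuration are illustrative, not part of any particular toolkit API:

```python
import numpy as np
from numba import cuda

@cuda.jit
def add_vectors(a, b, out):
    i = cuda.grid(1)      # absolute index of this thread across the grid
    if i < out.size:      # guard against threads past the end of the array
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)

# Copy inputs to the GPU once and keep the result on-device until it's
# needed, avoiding the per-call CPU<->GPU transfers you mention.
d_a = cuda.to_device(a)
d_b = cuda.to_device(b)
d_out = cuda.device_array_like(d_a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
add_vectors[blocks, threads_per_block](d_a, d_b, d_out)

result = d_out.copy_to_host()  # one transfer back at the end
```

The string-JIT route is similar; as one example, CuPy’s RawKernel compiles CUDA C/C++ source supplied as a Python string (again a sketch with illustrative names):

```python
import cupy as cp

add_kernel = cp.RawKernel(r'''
extern "C" __global__
void add_vectors(const float* a, const float* b, float* out, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}
''', 'add_vectors')

n = 1_000_000
a = cp.random.rand(n, dtype=cp.float32)  # allocated directly on the GPU
b = cp.random.rand(n, dtype=cp.float32)
out = cp.empty_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
add_kernel((blocks,), (threads_per_block,), (a, b, out, cp.int32(n)))
```

In both cases the arrays stay on the device between kernel launches, so you only pay for CPU-GPU transfer when you explicitly copy results back, which should help with the overhead you described.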

This is an area we’re actively investigating, though, so stay tuned for more!

Thanks for the answer!

I’m excited to hear about the developments on that front; I think they could be really beneficial in the world of AI!
