Copy/Compute Overlap in Python

Hi all, I’m working on a problem involving the Longstaff-Schwartz algorithm for gas storage. I need to work on many large matrices, and I was hoping to use copy/compute overlap (CC overlap) to speed up the process. I would like to use Python, but I have found no documentation about CC overlap there. Do you know if it is possible to use this technique with Numba, PyCUDA, or any other Python library?

Thank you for your help.

(If this is not the most relevant section, please move it to a more appropriate one.)

It’s possible. For PyCUDA, a number of the basic concepts (streams, async copy operations, etc.) are illustrated here. I’m sure there are other examples. For Numba CUDA, this seems to be a reasonable write-up. For a general treatment of CUDA concurrency, which covers all the necessary elements (streams, pinned host data, async operations), see here, section 7.
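To give a rough idea of how those three elements fit together, here is a minimal Numba CUDA sketch. It is only an illustration under assumptions: the helper names `chunk_bounds` and `scale_overlapped` are made up for this example, and the element-wise scaling kernel is a placeholder rather than anything specific to Longstaff-Schwartz. The data is split into chunks, each chunk gets its own stream, and the host buffers are pinned so the copies can run asynchronously. Whether copies and kernels really overlap depends on the GPU and should be checked with a profiler.

```python
def chunk_bounds(n_items, n_chunks):
    """Split range(n_items) into n_chunks contiguous (start, stop) pairs."""
    base, extra = divmod(n_items, n_chunks)
    bounds, start = [], 0
    for i in range(n_chunks):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def scale_overlapped(values, factor, n_chunks=4):
    """Multiply a 1-D array by `factor` on the GPU, one stream per chunk.

    Hypothetical example function; needs NumPy, Numba and a CUDA GPU.
    """
    # Imported here so chunk_bounds stays usable on a machine without a GPU.
    import numpy as np
    from numba import cuda

    @cuda.jit
    def scale_kernel(x, f):
        i = cuda.grid(1)
        if i < x.shape[0]:
            x[i] *= f

    n = values.shape[0]

    # Pinned (page-locked) host buffers: needed for truly asynchronous copies.
    host_in = cuda.pinned_array(n, dtype=np.float64)
    host_out = cuda.pinned_array(n, dtype=np.float64)
    host_in[:] = values

    streams = [cuda.stream() for _ in range(n_chunks)]
    device_chunks = []  # keep device arrays alive until all streams finish
    threads = 256

    for stream, (start, stop) in zip(streams, chunk_bounds(n, n_chunks)):
        if stop == start:
            continue
        # Host-to-device copy, kernel, and device-to-host copy are all queued
        # on the same stream, so they run in order within a chunk while
        # different chunks are free to overlap with each other.
        d_chunk = cuda.to_device(host_in[start:stop], stream=stream)
        blocks = (stop - start + threads - 1) // threads
        scale_kernel[blocks, threads, stream](d_chunk, factor)
        d_chunk.copy_to_host(host_out[start:stop], stream=stream)
        device_chunks.append(d_chunk)

    cuda.synchronize()  # wait for every stream before reading host_out
    return np.array(host_out)
```

On a machine with a CUDA GPU, something like `scale_overlapped(np.arange(1_000_000, dtype=np.float64), 2.0)` should return the input doubled. For a 2-D matrix the same idea applies by chunking along rows of a C-contiguous array, so that each slice stays contiguous in memory.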

Hi Robert, thank you very much for your reply. The references you posted seem to be very promising.
Thank you for your attention.