Why does multi-GPU cuFFT use cudaDeviceSynchronize() by default?

llodds · July 19, 2022, 10:12pm

Hi, is there any function call that makes multi-GPU cuFFT use some form of stream synchronization instead of its automatic device synchronization? I want to overlap cuFFT computation with asynchronous H2D/D2H memory copies. Thanks.