Is it possible to run cufftXtMemcpy asynchronously on multiple devices?

mab370 · May 17, 2022, 10:57pm

I am using the cuFFT library on several GPUs.
It looks like the copy from host to devices is synchronous, so a large amount of time is spent in the host-to-device and device-to-host copy operations.

Is it possible to run cufftXtMemcpy asynchronously on multiple devices?

mnicely · May 18, 2022, 4:12pm

Not at this time. Could you provide more insight into your use case?

As of CUDA 11.2 (cuFFT 10.4.0), cufftSetStream() is supported in multiple GPU cases. However, calls to cufftXtMemcpy() are still synchronous across multiple GPUs when using streams. In previous versions of cuFFT, cufftSetStream() returns an error in the multiple GPU case. Likewise, calling certain multi-GPU functions such as cufftXtSetCallback() after setting a stream with cufftSetStream() will result in an error (see API functions for more details).

Also, you might try pinning memory to speed up transfers.
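For reference, here is a rough sketch of what pinning the host buffer could look like with a two-GPU plan. The GPU IDs (0 and 1), the transform size, and the 1D C2C plan are assumptions for illustration only, not taken from the original poster's code. The copies through cufftXtMemcpy() are still synchronous here; the only change is that the host buffer comes from cudaHostAlloc() instead of malloc().

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>
#include <cufftXt.h>

// Abort with a message if a cuFFT call fails.
#define CUFFT_CHECK(call)                                        \
    do {                                                         \
        cufftResult r_ = (call);                                 \
        if (r_ != CUFFT_SUCCESS) {                               \
            std::fprintf(stderr, "cuFFT error %d at line %d\n",  \
                         static_cast<int>(r_), __LINE__);        \
            std::exit(EXIT_FAILURE);                             \
        }                                                        \
    } while (0)

int main() {
    // Assumed setup: two GPUs with IDs 0 and 1, one 1D C2C transform.
    const int nGPUs = 2;
    int gpus[nGPUs] = {0, 1};
    const int nx = 1 << 20;

    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount < nGPUs) {
        std::printf("This sketch needs at least %d GPUs.\n", nGPUs);
        return 0;
    }

    // Pinned (page-locked) host buffer. cudaHostAllocPortable makes the
    // allocation count as pinned for every CUDA context, not just the
    // current device, which matters when several GPUs read from it.
    cufftComplex* h_data = nullptr;
    if (cudaHostAlloc(reinterpret_cast<void**>(&h_data),
                      sizeof(cufftComplex) * nx,
                      cudaHostAllocPortable) != cudaSuccess) {
        std::fprintf(stderr, "cudaHostAlloc failed\n");
        return EXIT_FAILURE;
    }
    for (int i = 0; i < nx; ++i) {
        h_data[i].x = 1.0f;
        h_data[i].y = 0.0f;
    }

    // Multi-GPU plan: the GPUs must be set before the plan is made.
    cufftHandle plan;
    CUFFT_CHECK(cufftCreate(&plan));
    CUFFT_CHECK(cufftXtSetGPUs(plan, nGPUs, gpus));
    size_t workSize[nGPUs];  // one work-area size per GPU
    CUFFT_CHECK(cufftMakePlan1d(plan, nx, CUFFT_C2C, 1, workSize));

    // Device storage spread across the GPUs of the plan.
    cudaLibXtDesc* d_data = nullptr;
    CUFFT_CHECK(cufftXtMalloc(plan, &d_data, CUFFT_XT_FORMAT_INPLACE));

    // These copies still block the host; the pinned source/destination
    // only lets the underlying transfers avoid an extra staging copy.
    CUFFT_CHECK(cufftXtMemcpy(plan, d_data, h_data, CUFFT_COPY_HOST_TO_DEVICE));
    CUFFT_CHECK(cufftXtExecDescriptorC2C(plan, d_data, d_data, CUFFT_FORWARD));
    CUFFT_CHECK(cufftXtMemcpy(plan, h_data, d_data, CUFFT_COPY_DEVICE_TO_HOST));

    std::printf("first output element: (%f, %f)\n", h_data[0].x, h_data[0].y);

    CUFFT_CHECK(cufftXtFree(d_data));
    CUFFT_CHECK(cufftDestroy(plan));
    cudaFreeHost(h_data);
    return 0;
}
```

This should build with something like `nvcc pinned_fft.cu -lcufft`. How much pinning helps depends on the system, so it is worth timing both the pageable and pinned versions before committing to it.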
