Hi All,
I have a question about CUDA in general (but I would also like to know whether CUDA 4.0 offers any help here).
I want to compute a very large 3D FFT on multiple GPUs (let's say 4). My idea is to first create 4 OpenMP threads on the CPU, divide the data and send it to the 4 GPUs, compute 2D FFTs of the slices, bring the data back to the CPU, transpose it, send it to the 4 GPUs again and compute the 1D FFTs, then bring the data back to the CPU and do the final transposition.
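Roughly, the host-side structure I have in mind looks like this (a sketch only; process_slab is just a placeholder name for the per-GPU copy + FFT + copy-back work):

```c
#include <cuda_runtime.h>
#include <omp.h>
#include <stddef.h>

/* Placeholder for the per-GPU work: copy a slab over, run the batched
 * FFTs on it, copy it back. Hypothetical helper, not a cuFFT call. */
static void process_slab(int gpu, float *host_slab, size_t slab_floats)
{
    (void)gpu; (void)host_slab; (void)slab_floats;
    /* cudaMalloc / cudaMemcpy to device / FFT / cudaMemcpy back */
}

/* One OpenMP thread per GPU; each thread binds itself to one device. */
static void fft_pass(float *host_data, size_t slab_floats, int num_gpus)
{
    #pragma omp parallel num_threads(num_gpus)
    {
        int gpu = omp_get_thread_num();
        cudaSetDevice(gpu);   /* bind this CPU thread to its own GPU */
        process_slab(gpu, host_data + (size_t)gpu * slab_floats, slab_floats);
    }
}
```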
According to this plan, I want to fork threads on the GPUs that would compute the 2D and 1D FFTs. But the problem is that one cannot call cuFFT inside a kernel function (cuFFT functions are callable only from host code).
So, any suggestions?
Thanks in advance for the reply.
I've been working on this topic for a year. I have a working code, but with a concurrent copy-and-execute implementation, which is a bit tricky.
Your plan is correct, but you don't have to call cuFFT functions from a kernel. As a matter of fact, you don't have to write any CUDA kernels at all! The cuFFT functions can only be called from the host, via cufftExecXYZ(cufft_plan, …), because such a call itself launches a parallel kernel on the device. For the forward FFT you only have to define a batched 2D real-to-complex plan and a batched 1D complex-to-complex plan with cufftPlanMany. Copy the slices to the GPUs, run the batched 2D FFT, copy back, rearrange (transpose), copy to the GPUs again, run the batched 1D FFT, then copy back to the host once more. Note that the x and z directions will end up exchanged. That's all.
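As a minimal sketch of the two plans for one GPU's share of an NX x NY x NZ grid (nz_local, pencils_local and the device buffer names are my own placeholders; allocation and the host-side transposes are assumed to happen in the caller):

```c
#include <cufft.h>
#include <cuda_runtime.h>

/* Forward FFT work for one GPU: batched 2D R2C over the local z-slices,
 * then (after a host-side transpose) batched 1D C2C along z. */
void forward_fft_on_one_gpu(int NX, int NY, int NZ,
                            int nz_local, int pencils_local,
                            cufftReal *d_slices, cufftComplex *d_slices_out,
                            cufftComplex *d_pencils)
{
    cufftHandle plan2d, plan1d;

    /* Batched 2D real-to-complex FFT: one transform per z-slice. */
    int n2d[2] = { NY, NX };
    cufftPlanMany(&plan2d, 2, n2d,
                  NULL, 1, NX * NY,            /* packed real input slices   */
                  NULL, 1, NY * (NX/2 + 1),    /* packed complex output      */
                  CUFFT_R2C, nz_local);
    cufftExecR2C(plan2d, d_slices, d_slices_out);

    /* ...meanwhile on the host: copy back, transpose so z becomes
       contiguous, copy the resulting pencils to the device again... */

    /* Batched 1D complex-to-complex FFT along z: one transform per pencil. */
    int n1d[1] = { NZ };
    cufftPlanMany(&plan1d, 1, n1d,
                  NULL, 1, NZ,
                  NULL, 1, NZ,
                  CUFFT_C2C, pencils_local);
    cufftExecC2C(plan1d, d_pencils, d_pencils, CUFFT_FORWARD);

    cufftDestroy(plan2d);
    cufftDestroy(plan1d);
}
```

In a real code you would create the plans once and reuse them every step, and check the cufftResult return values; they are omitted here to keep the sketch short.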