Advice needed by a PhD student

I agree, but it is hard to write such a program without learning CUDA; there can be a lot of difficult moments.

I am one of the lucky ones. My problems rely heavily on FFTs, so I ported my codes from Fortran to CUDA very easily and got a 50-80 times speed-up compared to the single-core CPU codes.

Do you use NVIDIA's FFT library?

Yes. I use cuFFT, with in-place real-to-complex and complex-to-real transforms. It took me only 2 weeks to port my code from Fortran. For the CPU version I used FFTW.
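For readers who have not used in-place real-to-complex transforms, here is a minimal sketch of a 2D round trip with cuFFT. It is not the code from the post above. The function names `padded_row_reals` and `roundtrip_max_error` and the test signal are made up for illustration, and the sketch assumes single precision and the padded row layout that in-place R2C transforms use (each row holds 2*(ny/2+1) reals).

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: reals per row in the padded in-place layout,
// i.e. room for ny/2 + 1 complex values.
int padded_row_reals(int ny) { return 2 * (ny / 2 + 1); }

// Hypothetical helper: forward R2C, then inverse C2R, both in the same
// device buffer. Returns the largest absolute difference from the input
// after dividing by nx * ny (cuFFT transforms are unnormalized),
// or -1.0f if any CUDA or cuFFT call fails.
float roundtrip_max_error(int nx, int ny) {
    const int stride = padded_row_reals(ny);
    std::vector<float> ref((size_t)nx * stride, 0.0f);
    for (int i = 0; i < nx; ++i)
        for (int j = 0; j < ny; ++j)
            ref[(size_t)i * stride + j] = std::sin(0.1f * i) + std::cos(0.2f * j);
    std::vector<float> out(ref.size());
    const size_t bytes = ref.size() * sizeof(float);

    float* d = nullptr;
    cufftHandle fwd = 0, inv = 0;
    bool have_fwd = false, have_inv = false;
    float err = -1.0f;

    if (cudaMalloc(&d, bytes) == cudaSuccess &&
        cudaMemcpy(d, ref.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        (have_fwd = cufftPlan2d(&fwd, nx, ny, CUFFT_R2C) == CUFFT_SUCCESS) &&
        (have_inv = cufftPlan2d(&inv, nx, ny, CUFFT_C2R) == CUFFT_SUCCESS) &&
        // Input and output pointers are the same buffer: in-place transforms.
        cufftExecR2C(fwd, (cufftReal*)d, (cufftComplex*)d) == CUFFT_SUCCESS &&
        cufftExecC2R(inv, (cufftComplex*)d, (cufftReal*)d) == CUFFT_SUCCESS &&
        cudaMemcpy(out.data(), d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
        err = 0.0f;
        const float scale = 1.0f / ((float)nx * (float)ny);
        for (int i = 0; i < nx; ++i)
            for (int j = 0; j < ny; ++j) {
                const size_t k = (size_t)i * stride + j;
                err = std::fmax(err, std::fabs(out[k] * scale - ref[k]));
            }
    }
    if (have_fwd) cufftDestroy(fwd);
    if (have_inv) cufftDestroy(inv);
    cudaFree(d);
    return err;
}
```

Built with something like `nvcc roundtrip.cu -lcufft`, a call such as `roundtrip_max_error(256, 256)` should return a value close to single-precision rounding error if the layout is right. The padding columns at the end of each row are skipped when comparing.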

Hi, pasoleatis,

How did you get such a speed-up? I compared CUFFT (CUDA 4.0, Tesla C2050) with the MKL library for 2D real-to-complex FFTs and got at most a 15 times speed-up. (I measured only the time needed to compute the FFT, without data transfer.)

For which sizes did you get the 50-80 times speed-up?

Thanks, esem
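The kind of measurement described above (FFT time only, without data transfer) can be sketched with CUDA events. This is an illustration, not the benchmark either poster actually ran. The function name `time_r2c_ms` is made up, and the sketch assumes single precision, an out-of-place 2D R2C transform, a zero-filled input, and the default stream.

```cuda
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: average device time in milliseconds for one
// out-of-place 2D real-to-complex transform of size nx x ny.
// Host-device copies are not part of the timed region.
// Returns -1.0f if reps <= 0 or if any CUDA or cuFFT call fails.
float time_r2c_ms(int nx, int ny, int reps) {
    if (reps <= 0) return -1.0f;

    cufftReal* in = nullptr;
    cufftComplex* out = nullptr;
    cufftHandle plan = 0;
    cudaEvent_t start = nullptr, stop = nullptr;
    bool have_plan = false;
    float ms = -1.0f;

    const size_t in_bytes = (size_t)nx * ny * sizeof(cufftReal);
    const size_t out_bytes = (size_t)nx * (ny / 2 + 1) * sizeof(cufftComplex);

    if (cudaMalloc(&in, in_bytes) == cudaSuccess &&
        cudaMalloc(&out, out_bytes) == cudaSuccess &&
        cudaMemset(in, 0, in_bytes) == cudaSuccess &&
        cudaEventCreate(&start) == cudaSuccess &&
        cudaEventCreate(&stop) == cudaSuccess &&
        (have_plan = cufftPlan2d(&plan, nx, ny, CUFFT_R2C) == CUFFT_SUCCESS)) {
        cufftExecR2C(plan, in, out);  // warm-up run, not timed
        cudaEventRecord(start);
        for (int r = 0; r < reps; ++r)
            cufftExecR2C(plan, in, out);
        cudaEventRecord(stop);
        // Wait for the last transform to finish before reading the timer.
        float total = 0.0f;
        if (cudaEventSynchronize(stop) == cudaSuccess &&
            cudaEventElapsedTime(&total, start, stop) == cudaSuccess)
            ms = total / reps;
    }
    if (have_plan) cufftDestroy(plan);
    if (start) cudaEventDestroy(start);
    if (stop) cudaEventDestroy(stop);
    cudaFree(out);
    cudaFree(in);
    return ms;
}
```

Calling `time_r2c_ms(256, 256, 100)` and `time_r2c_ms(512, 512, 100)` gives per-transform times that can be set against a CPU library timed the same way. Averaging over many repetitions matters for small sizes, where a single transform is short compared with launch overhead, and plan creation is kept outside the timed region.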

Hello,

I do not have access to the MKL libraries. I compared against single-core runs on clusters built from AMD processors. It depends on what you compare with; I compare against my usual runs.

Can you please give the timings you get on the GPU for some FFTs? For example, how long does a real-to-complex FFT of a 256 x 256 matrix take on both the CPU and the GPU? And 512 x 512?

It is very important for me to understand whether I have reached peak performance.

Regards, esem