NukadaFFT library

Hi all,

I have released the first public version of our FFT library for CUDA GPUs.

http://matsu-www.is.titech.ac.jp/~nukada/nufft/

This thread will be used for feedback.

Thanks,
Akira Nukada

Nice to have another FFT library. Are there any parameters for which it is faster than CUFFT? Or does it just add support for double-precision complex numbers?

Nukada-san's library is faster than CUFFT, especially when the transform length is not a power of two.

This is a link to a poster presented at GTC.

http://www.nvidia.com/content/GTC/posters/…r-CUDA-GPUs.pdf

Dear Nukada-san,

Thanks to you and your colleagues for providing this library.

Do you have any performance numbers (relative to CUFFT) for larger FFT sizes? Your GTC poster only shows results up to 512, and the CGI script on your home page (http://matsu-www.is.titech.ac.jp/~nukada/nufft/bench.cgi) does not seem to work.

Regards,
Thomas Hobiger

Dear Thomas,

I found that the benchmark service hangs when certain transform sizes are requested.

The service (daemon) is now reset every hour.

I have another version without this problem; however, it is still being evaluated with both CUDA 3.1 and 3.2…

Thanks,

Akira Nukada

@Nukada-san, a quick question about the FFT library: can we use complex data types?

I'm not sure I understand your question, but…

The library supports only complex data types in single or double precision; real data types are not supported.

The complex data array must store the real and imaginary parts in interleaved format.
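To illustrate what "interleaved" means, here is a minimal C sketch (not part of the library) that packs data into the layout re0, im0, re1, im1, ... It assumes a two-float struct that mirrors CUDA's float2 layout; the type and function names are hypothetical. Because real input is not supported, the first helper shows the usual workaround of storing real samples as complex values with a zero imaginary part.

```c
#include <stddef.h>

/* Hypothetical type: assumed to mirror CUDA's float2, i.e. two
   consecutive floats {x = real part, y = imaginary part}. */
typedef struct {
    float x; /* real part */
    float y; /* imaginary part */
} complex_f;

/* Store n real samples as complex values with zero imaginary parts,
   since the library accepts only complex input. */
static void real_to_interleaved(const float *re, complex_f *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i].x = re[i];
        out[i].y = 0.0f;
    }
}

/* Convert separate ("split") real and imaginary arrays into the
   interleaved layout re0, im0, re1, im1, ... */
static void split_to_interleaved(const float *re, const float *im,
                                 complex_f *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i].x = re[i];
        out[i].y = im[i];
    }
}
```

The same pattern applies to double precision with a two-double struct mirroring double2.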

Going through the runtime.cu example, I tried the following, and it gives me errors:

typedef float2 Complex;
Complex *in1;
cudaHostAlloc((void **)&in1, sizeof(Complex) * pix1 * pix2 * n, cudaHostAllocMapped);

// ... Do stuff on host and calculate in1 ... //

Complex *in1_d;
Complex *f1_d;
cudaMalloc((void **)&f1_d, sizeof(Complex) * pix1 * pix2 * n); // n = batch size
cudaHostGetDevicePointer((void **)&in1_d, (void *)in1, 0);

// FFT calculation
nufft_plan plan_forward1;
nufftPlan2d(&plan_forward1, pix1, pix2, n, in1_d, f1_d, NUFFT_D2D);
nufftExec(plan_forward1, in1_d, f1_d, NUFFT_FORWARD);
nufftDestroy(plan_forward1);

error: too few arguments in function call
error: argument of type “int” is incompatible with parameter of type “void *”

Try looking at the prototypes in nufft.h. Your nufftExec call has too few arguments…

Please remember that:

  • nufftPlan2d() overwrites the data in the given arrays.

    You need to fill in1 after the call.

  • You have to pass two additional device memory regions, each the same size as the input data,

    as the 6th and 7th arguments of nufftPlan2d() and as the 3rd and 4th arguments of nufftExec().

    The second buffer (the 7th and 3rd argument of each API, respectively) can be the same as f1_d.
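Since each extra buffer must match the input size, it can help to compute that size once. The following C sketch does only that arithmetic and does not call the NukadaFFT API; the helper names are hypothetical. It assumes the element size is 8 bytes for float2 or 16 bytes for double2, counts the input array plus the two additional regions, and returns 0 if the multiplication would overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes for one region holding `batch` 2D transforms of pix1 x pix2
   complex elements of elem_size bytes each. Returns 0 on overflow. */
static size_t region_bytes(size_t pix1, size_t pix2, size_t batch,
                           size_t elem_size)
{
    size_t n = pix1;
    if (pix2 != 0 && n > SIZE_MAX / pix2) return 0;
    n *= pix2;
    if (batch != 0 && n > SIZE_MAX / batch) return 0;
    n *= batch;
    if (elem_size != 0 && n > SIZE_MAX / elem_size) return 0;
    return n * elem_size;
}

/* Input array plus two additional regions of the same size.
   Returns 0 on overflow. */
static size_t total_bytes_needed(size_t pix1, size_t pix2, size_t batch,
                                 size_t elem_size)
{
    size_t one = region_bytes(pix1, pix2, batch, elem_size);
    if (one > SIZE_MAX / 3) return 0;
    return 3 * one;
}
```

For example, a batch of 50 single-precision 64 x 64 transforms needs 64 * 64 * 50 * 8 = 1,638,400 bytes per region. If f1_d doubles as the second buffer, only one region beyond in1_d and f1_d has to be allocated.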

@avidday & @nukada: thanks for the suggestions.

I get the following error when running the program:

error while loading shared libraries: libnufft.so: cannot open shared object file: No such file or directory

even though I have added the include directory and linked with -lnufft.

My build script looks like this:

nvcc -g -G -pg -D_DEBUG -o ../obj/ao76_fft8_batch50 ../src/ao76_fft8_batch50.cu \
--host-compilation C -arch sm_13 \
--ptxas-options=-v \
-I/usr/local/cuda/include \
-L/usr/local/cuda/lib64 -lcuda -lcudart \
-I/home/vivekv/CUDA_3.1/NukadaFFT-1.0/include \
-L/home/vivekv/CUDA_3.1/NukadaFFT-1.0/lib64 -lnufft \
-I/home/vivekv/NVIDIA_GPU_Computing_SDK/C/common/inc/ \
-L/home/vivekv/NVIDIA_GPU_Computing_SDK/C/lib/ -lcutil_x86_64 \
-I/usr/include/ -L/usr/lib64/ -lm -lfftw3