NukadaFFT library

Akira_Nukada · September 28, 2010, 3:27pm

Hi all,

I have released the first public version of our FFT library for CUDA GPUs.

Nukada FFT library: http://matsu-www.is.titech.ac.jp/~nukada/nufft/

This thread will be used for feedback.

Thanks,
Akira Nukada

Lev · September 29, 2010, 3:15pm

Nice to have another FFT library. Are there any parameters where it is faster than CUFFT? Or does it just add support for double-precision complex numbers?

mfatica · September 29, 2010, 3:39pm

Nukada-san's library is faster than CUFFT, especially when the length of the transform is not a power of two.

This is a link to a poster presented at GTC.

http://www.nvidia.com/content/GTC/posters/2010/U04-NukadaFFT-An-Auto-Tuning-FFT-Library-for-CUDA-GPUs.pdf

thobiger · September 30, 2010, 5:59am

Dear Nukada-san,

Thanks to you and your colleagues for providing this library.

Do you have any performance numbers (relative to CUFFT) for larger FFT sizes? Your GTC poster shows results only up to 512, and the CGI script on your homepage (Benchmark of NukadaFFT library) does not seem to work.

Regards,
Thomas Hobiger

Akira_Nukada · September 30, 2010, 9:54am

Dear Thomas,

I found that the benchmark service hangs when certain transform sizes are requested.

The service (daemon) is now reset every hour.

I have another version without the problem, but it is still being evaluated with both CUDA 3.1 and 3.2…

Thanks,

Akira Nukada

vivekv80 · September 30, 2010, 3:10pm

@Nukada-san, a quick question regarding the FFT library: can we use Complex data types?

Akira_Nukada · September 30, 2010, 3:26pm

I'm not sure I understand your question, but…

The library supports only complex data types, in single or double precision; real data types are not supported.

The complex data array must store the real and imaginary parts in interleaved format.
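
To illustrate the interleaved layout mentioned above, here is a minimal sketch in plain C. The `Complex` struct is a stand-in for CUDA's `float2` (assumed here so the snippet compiles without the CUDA headers), and `interleave` is a hypothetical helper for illustration only; it is not part of the NukadaFFT API.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for CUDA's float2 (assumption): x = real part, y = imaginary part. */
typedef struct { float x, y; } Complex;

/*
 * Hypothetical helper: pack separate real and imaginary arrays into one
 * interleaved array, so memory reads re[0], im[0], re[1], im[1], ...
 */
static void interleave(const float *re, const float *im, Complex *out, size_t n)
{
    assert(re != NULL && im != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i].x = re[i];  /* real part first */
        out[i].y = im[i];  /* imaginary part directly after it */
    }
}
```

With `re = {1, 2}` and `im = {3, 4}`, the resulting buffer viewed as raw floats holds 1, 3, 2, 4. An array of `float2` (or `double2` for double precision) already has this layout, so data stored that way needs no repacking.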

vivekv80 · September 30, 2010, 3:34pm

I am going through the runtime.cu example, and trying the following gives me errors:

typedef float2 Complex;
Complex *in1;
cudaHostAlloc((void **)&in1, sizeof(Complex) * pix1 * pix2 * n, cudaHostAllocMapped);

// ... Do stuff on Host and calculate in1 ... //

Complex *in1_d;
Complex *f1_d;
cudaMalloc((void**) &f1_d, sizeof(Complex) * pix1 * pix2 * n); //n = batchsize
cudaHostGetDevicePointer((void **)&in1_d, (void *)in1, 0);

//FFT calculation
nufft_plan plan_forward1;
nufftPlan2d(&plan_forward1, pix1, pix2, n, in1_d, f1_d, NUFFT_D2D);
nufftExec(plan_forward1, in1_d, f1_d, NUFFT_FORWARD);
nufftDestroy(plan_forward1);

error: too few arguments in function call
error: argument of type "int" is incompatible with parameter of type "void *"

avidday · September 30, 2010, 5:04pm

Try looking at the prototypes in nufft.h. Your nufftExec call has too few arguments…

Akira_Nukada · September 30, 2010, 5:26pm

Please remember that nufftPlan2d() destroys the data in the given arrays, so you need to set the data in in1 after the call.

You also have to specify two additional device memory regions, each the same size as the input data, as the 6th and 7th arguments of nufftPlan2d() and as the 3rd and 4th arguments of nufftExec().

The second buffer (7th and 3rd argument of the respective API) can be the same as f1_d.

vivekv80 · September 30, 2010, 7:32pm

@avidday & @nukada: thanks for the suggestions.

I now get the following error:

error while loading shared libraries: libnufft.so: cannot open shared object file: No such file or directory

even though I have added the include directory and linked with -lnufft.

My build script looks like this:

nvcc -g -G -pg -D_DEBUG -o ../obj/ao76_fft8_batch50 ../src/ao76_fft8_batch50.cu \
--host-compilation C -arch sm_13 \
--ptxas-options=-v \
-I/usr/local/cuda/include \
-L/usr/local/cuda/lib64 -lcuda -lcudart \
-I/home/vivekv/CUDA_3.1/NukadaFFT-1.0/include \
-L/home/vivekv/CUDA_3.1/NukadaFFT-1.0/lib64 -lnufft \
-I/home/vivekv/NVIDIA_GPU_Computing_SDK/C/common/inc/ \
-L/home/vivekv/NVIDIA_GPU_Computing_SDK/C/lib/ -lcutil_x86_64 \
-I/usr/include/ -L/usr/lib64/ -lm -lfftw3