Hi all,
I have released the first public version of our FFT library for CUDA GPUs.
[url="http://matsu-www.is.titech.ac.jp/~nukada/nufft/"]Nukada FFT library[/url]
This thread will be used for feedback.
Thanks,
Akira Nukada
Nice to have another FFT library. Are there any parameters for which it is faster than CUFFT? Or does it just add support for double-precision complex numbers?
Nukada-san's library is faster than CUFFT, especially when the length of the transform is not a power of two.
This is a link to a poster presented at GTC.
[url=“http://www.nvidia.com/content/GTC/posters/2010/U04-NukadaFFT-An-Auto-Tuning-FFT-Library-for-CUDA-GPUs.pdf”]http://www.nvidia.com/content/GTC/posters/...r-CUDA-GPUs.pdf[/url]
Dear Nukada-san,
Thanks to you and your colleagues for providing this library.
Do you have any performance numbers (relative to CUFFT) for larger FFT sizes? Your GTC poster shows results only up to 512, and the CGI script on your homepage (Benchmark of NukadaFFT library) does not seem to work.
Regards,
Thomas Hobiger
Dear Thomas,
I found that the benchmark service hangs when certain transform sizes are tried.
The service (daemon) will now be reset every hour.
I have another version without the problem; however, it is still under evaluation
with both CUDA 3.1 and 3.2…
Thanks,
Akira Nukada
@Nukada-san, a quick question regarding the FFT library: can we use Complex data types?
I'm not sure I understand your question, but…
The library supports only complex data types, in single or double precision; real data types are not supported.
The complex data array must store the real and imaginary parts in interleaved format.
I am going through the runtime.cu example, and the following code gives me errors:
typedef float2 Complex;
Complex *in1;
cudaHostAlloc((void **)&in1, sizeof(Complex) * pix1 * pix2 * n, cudaHostAllocMapped);
// ... Do stuff on Host and calculate in1 ... //
Complex *in1_d;
Complex *f1_d;
cudaMalloc((void**) &f1_d, sizeof(Complex) * pix1 * pix2 * n); //n = batchsize
cudaHostGetDevicePointer((void **)&in1_d, (void *)in1, 0);
//FFT calculation
nufft_plan plan_forward1;
nufftPlan2d(&plan_forward1, pix1, pix2, n, in1_d, f1_d, NUFFT_D2D);
nufftExec(plan_forward1, in1_d, f1_d, NUFFT_FORWARD);
nufftDestroy(plan_forward1);
error: too few arguments in function call
error: argument of type “int” is incompatible with parameter of type “void *”
Try looking at the prototypes in nufft.h. Your nufftExec call has too few arguments…
Please remember that nufftPlan2d() destroys the data in the given arrays,
so you need to set the data in in1 after the call.
You also have to specify two additional device memory regions, each the same size as the input data,
as the 6th and 7th arguments of nufftPlan2d() and as the 3rd and 4th arguments of nufftExec().
The second buffer (7th and 3rd for each API) can be the same as f1_d.
@avidday & @nukada: thanks for the suggestions.
I now get the following error, even though I have added the include directory and linked with -lnufft:
error while loading shared libraries: libnufft.so: cannot open shared object file: No such file or directory
My build script looks like this:
nvcc -g -G -pg -D_DEBUG -o ../obj/ao76_fft8_batch50 ../src/ao76_fft8_batch50.cu \
--host-compilation C -arch sm_13 \
--ptxas-options=-v \
-I/usr/local/cuda/include \
-L/usr/local/cuda/lib64 -lcuda -lcudart \
-I/home/vivekv/CUDA_3.1/NukadaFFT-1.0/include \
-L/home/vivekv/CUDA_3.1/NukadaFFT-1.0/lib64 -lnufft \
-I/home/vivekv/NVIDIA_GPU_Computing_SDK/C/common/inc/ \
-L/home/vivekv/NVIDIA_GPU_Computing_SDK/C/lib/ -lcutil_x86_64 \
-I/usr/include/ -L/usr/lib64/ -lm -lfftw3