NukadaFFT library

Although I have already asked this question here once, I did not get any real reply.

How about large FFTs?

Are FFTs with more than 10K points supported? Since we don’t need many batches (roughly 40 at most), it would be good to know whether we can consider this library for our high-spectral-resolution application.

Regards,
Thomas Hobiger

I have completed the installation of Visual Studio 2008.

If no code modifications are required for Win32, the Windows version should be ready very soon.

The current version does not support transform sizes larger than 12K (in single precision).

This is limited by the 48 KB shared memory size of Fermi.

Even for sizes smaller than 12K, there are still limits on the number of registers per thread and the maximum number of threads per block.

I may prepare another API for large 1-D FFTs; however, it requires autotuning of the transpose.

This is especially difficult when the transform size is not a multiple of 64.
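
One possible reading of the 12K figure is plain arithmetic: 48 KB of shared memory holds 12288 four-byte single-precision values. The sketch below only does that arithmetic and the multiple-of-64 check mentioned above. The helper names are hypothetical, and the idea that one 4-byte value per point is resident in shared memory at a time is an assumption, not something the post states. With 8-byte complex values, the same budget would hold 6144 points.

```c
#include <assert.h>
#include <stddef.h>

/* Shared memory size of Fermi, as given in the post above. */
#define FERMI_SHARED_BYTES ((size_t)48 * 1024)

/* Hypothetical helper: how many values of a given size fit in a
 * shared-memory budget. ASSUMPTION: the whole budget is usable for data. */
size_t max_values_in_shared(size_t shared_bytes, size_t bytes_per_value)
{
    assert(bytes_per_value > 0);
    return shared_bytes / bytes_per_value;
}

/* Hypothetical helper: nonzero if the transform size is a multiple of 64,
 * the case the post describes as easier to handle. */
int is_multiple_of_64(size_t n)
{
    return n % 64 == 0;
}
```

For example, `max_values_in_shared(FERMI_SHARED_BYTES, 4)` gives 12288, and `is_multiple_of_64(10000)` is false, so a 10000-point transform would fall into the harder case.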

Will it support 64-bit host code?

thanks

eyal

Thanks a lot for your reply.

Do you have any idea how such large FFTs would perform compared with NVIDIA’s CUFFT? Do you expect them to be much faster?

As for the API you mentioned, do you have concrete plans for implementing larger FFTs, or is that not high on your priority list?

Regards,

Thomas

I don’t have any novel ideas for larger FFTs. Although the performance depends on the transform size, it is difficult to be much faster than CUFFT because the computation is memory bound.

For this reason, I currently recommend the CUFFT library for larger FFTs.
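
A back-of-the-envelope way to see why large FFTs tend to be memory bound is to compare arithmetic with memory traffic. The sketch below uses the conventional 5 N log2 N operation count for a complex FFT and assumes the best case for memory, where each 8-byte single-precision complex point is read once and written once. The function names are hypothetical and the estimate is only illustrative, not a measurement of either library.

```c
#include <assert.h>
#include <stddef.h>

/* Integer base-2 logarithm, exact for power-of-two n. */
unsigned ilog2(size_t n)
{
    unsigned k = 0;
    while (n > 1) {
        n >>= 1;
        ++k;
    }
    return k;
}

/* Conventional operation count for a complex FFT: 5 * N * log2(N). */
double fft_flops(size_t n)
{
    return 5.0 * (double)n * (double)ilog2(n);
}

/* ASSUMPTION: best-case traffic, one read and one write of each
 * 8-byte single-precision complex point. */
double fft_min_bytes(size_t n)
{
    return 2.0 * 8.0 * (double)n;
}

/* Floating-point operations available per byte of memory traffic. */
double fft_flops_per_byte(size_t n)
{
    assert(n > 1);
    return fft_flops(n) / fft_min_bytes(n);
}
```

For a 16384-point transform this gives 4.375 operations per byte. A transform too large for shared memory needs more than one pass over global memory, which lowers the ratio further, so on hardware that can do more arithmetic per byte of bandwidth than this, the memory system sets the speed.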

It’s ready now.

I hope it works.

Hi,

I’m happy with binaries for Windows…

great job!

Are you still considering recompiling the library for Mac OS X (or trying to find a Mac), now that we have Quadro Fermi 4000?

That would be great for me…

Thanks

Sorry to keep you waiting.

I now have a MacBook Air and have compiled the library.

This is my first time using Mac OS X…

The library is a universal binary that consists of i386 and x86_64 versions, like libcuda.dylib and so on. When compiling it, I specified “-install_name @rpath/libnufft.dylib”.

You may need the -Wl,-rpath,path_to_the_library option when linking your application.

I also prepared a library for CentOS 4.8.

I hope it also works on Red Hat Enterprise Linux 4.8, which someone requested.

This version is not tested.

Hi, I can’t find the Mac OS version on the site…

On the top page, please select one of the two download links.

Both of them lead to the same download page, where you can select the files to download, including the Mac OS version.

thanks!

Hello,

I’m trying to implement this library as part of a school project. I tried running the sample code that comes with the library off of this site:

http://matsu-www.is.titech.ac.jp/~nukada/nufft/

Specifically, I tried runtime.cu, but I get an error when it calls nufftPlan1d, saying it can’t allocate any amount of memory (even as little as 16 bytes).

I’ve run this on a Quadro NVS M and a Tesla M1060.

Again, this is the sample runtime code run unmodified from the version posted on the site above. Is there something else besides CUDA itself, which I do have running, that I need to install for NUKADA?