Although I have already asked this question here once, I did not get a real reply.
How about large FFTs?
Are FFTs with more than 10K points supported? Since we don’t need many batches (roughly 40 at most), it would be good to know whether we can consider this library for our high-spectral-resolution application.
Regards,
Thomas Hobiger
I have completed the installation of Visual Studio 2008.
If no code modification is required for Win32, it will be ready very soon.
The current version does not support transform sizes larger than 12K (in single precision).
This is limited by the 48 KB shared memory size of Fermi.
Even for sizes smaller than 12K, there are still limits on the number of registers per thread and the maximum number of threads per block.
I may prepare another API for large 1-D FFTs; however, it requires autotuning for the transpose.
This is especially difficult when the transform size is not a multiple of 64.
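To make the 12K figure concrete, here is a rough back-of-the-envelope sketch in C++. It assumes that one 4-byte float per point is resident in shared memory at a time, which is my own reading of how 48 KB maps to 12K points and is not confirmed by the library author. The function names are hypothetical and are not part of the library.

```cpp
#include <cassert>
#include <cstddef>

// Largest number of points that fit in one block's shared memory.
// ASSUMPTION: bytes_per_point is 4 for single precision (one float per
// point held in shared memory at a time); the library may do otherwise.
inline std::size_t max_points_in_shared(std::size_t shared_bytes,
                                        std::size_t bytes_per_point) {
    assert(bytes_per_point > 0);
    return shared_bytes / bytes_per_point;
}

// True when the transform size is a multiple of 64, the case the post
// describes as easier to handle for the transpose.
inline bool is_multiple_of_64(std::size_t n) {
    return n % 64 == 0;
}
```

For example, max_points_in_shared(48 * 1024, 4) gives 12288, i.e. 12K. Under the same assumption, 8 bytes per point would give 6144 points for the same 48 KB budget.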
Will it support 64-bit host code?
thanks
eyal
Thanks a lot for your reply.
Do you have any idea how such large FFTs would perform compared with NVIDIA’s CUFFT? Do you expect them to be much faster?
As for the API you mentioned, do you have concrete plans for implementing the larger FFTs, or is that not high on your priority list?
Regards,
Thomas
I don’t have any novel ideas for larger FFTs. Although the performance depends on the transform size,
it is difficult to be much faster than CUFFT because the computation is memory-bound.
For this reason, I currently recommend the CUFFT library for larger FFTs.
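The memory-bound argument can be illustrated with a simple cost model. This is only a sketch: it uses the conventional 5 N log2 N operation count for a complex FFT, and it assumes that a large transform needs a small number of full passes through global memory, each reading and writing N single-precision complex values of 8 bytes. The struct, the function name, and the parameter values are hypothetical and are not taken from the library or from CUFFT.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Estimated time spent on arithmetic and on global-memory traffic.
struct FftCost {
    double compute_seconds;
    double memory_seconds;
};

// Rough cost model for one complex single-precision FFT of n points.
// ASSUMPTIONS: 5*n*log2(n) floating-point operations; `passes` round trips
// through global memory, each reading and writing n complex floats (8 bytes).
inline FftCost estimate_fft_cost(std::size_t n, int passes,
                                 double peak_flops,
                                 double bandwidth_bytes_per_s) {
    assert(n > 1 && passes > 0);
    assert(peak_flops > 0.0 && bandwidth_bytes_per_s > 0.0);
    const double points = static_cast<double>(n);
    const double flops = 5.0 * points * std::log2(points);
    const double bytes = passes * 2.0 * points * 8.0;  // read + write per pass
    return {flops / peak_flops, bytes / bandwidth_bytes_per_s};
}
```

With round placeholder figures of 1e12 FLOP/s and 1.5e11 bytes/s (roughly Fermi-class magnitudes, not measurements), estimate_fft_cost(65536, 2, 1.0e12, 1.5e11) gives about 5.2 microseconds of arithmetic against about 14 microseconds of memory traffic. Under these assumptions the memory traffic dominates, which leaves little room to beat a library that already moves the data efficiently.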
oscarb
November 23, 2010, 2:29pm
It’s ready now.
I hope it works.
Hi,
I’m happy with binaries for Windows…
great job!
Are you still considering recompiling the library (or trying to find a Mac) for Mac OS now that we have the Quadro Fermi 4000?
That would be great for me…
Thanks
Sorry for the wait.
I now have a MacBook Air and have compiled the library.
This is my first time using Mac OS X…
The library is a universal binary that consists of i386 and x86_64 versions,
like libcuda.dylib and so on. When compiling it, I specified “-install_name @rpath/libnufft.dylib”.
You may need the -Wl,-rpath,path_to_the_library option when linking your application.
I also prepared a library for CentOS 4.8.
I hope it also works on Red Hat Enterprise Linux 4.8, which someone requested.
This version is not tested.
oscarb
December 12, 2010, 4:27am
Hi, I can’t find the Mac OS version on the site…
On the top page, please select one of the two download links.
Both of them lead to the same download page, where you can select the files to download,
including the Mac OS version.
Hello,
I’m trying to use this library as part of a school project. I tried running the sample code that comes with the library from this site:
http://matsu-www.is.titech.ac.jp/~nukada/nufft/
Specifically, I ran runtime.cu, but when it calls nufftPlan1d I get an error saying it can’t allocate ANY amount of memory (even as little as 16 bytes).
I’ve run this on a Quadro NVS M and a Tesla M1060.
Again, this is the sample runtime code run unmodified from the version posted on the site above. Is there something else besides CUDA itself, which I do have running, that I need to install for NUKADA?