I’m fairly new to CUDA development, so forgive me if this is an obvious question…
I’ve been looking at benchmarks for computing FFTs on the CPU (using FFTW) and on the GPU (with CUDA). It seems that for small FFTs there’s really no advantage to using CUDA over just computing on the CPU, presumably because of the overhead of copying data to and from the GPU.
However, today I’ve been taking a look at the MusicBrainz Picard program, which uses the MusicDNS service. If you have a music collection, this program goes through it and reads part of each file, computes an FFT, does some linear algebra on the result, then computes a unique value for that song that it can look up online to find the information for that song (Artist, Title, Track #, etc.).
I’ve got a moderately fast computer (Core 2 Duo 2.4 GHz, 4 GB RAM, 8800 GT), and it’s taking forever to tag my collection, presumably due to the FFT and linear algebra parts of the algorithm. The FFT computation uses (I think) 8192 elements, which on its own really doesn’t get much of a speed-up with CUDA. Is there a way to compute a bunch of these smaller FFTs in parallel using CUDA? If so, maybe this is a neat idea for the next coding challenge!
Description of the Algorithm:
Source Code for libofa, which does the actual computing work for the MusicBrainz software: