I have a problem where I am trying to do an FFT using the cuFFT library. I am using Tesla K20 and K20X cards with CUDA 5.0, and I came across this sentence in the CUDA 5.0 documentation: "Transform sizes up to 64 million elements in single precision". My matrix size is 512x512x256, which comes to roughly 67 million elements. Is that a problem or not? If it is, shouldn't the library return an error? Another factor may be that I use most of the GPU memory: on the K20c card, only 300-400 MB out of 5 GB is left after allocating all the matrices (including the one mentioned above) and creating the plans. Is that a problem? On the K20X card I have more than 700 MB free, but I see the same behaviour there too. The software works fine for 512x256x256 and 256x256x256. Any pointers would be helpful.
If you have the cards, why don't you give it a try? I tried 420x420x420 on an M2075 Tesla card, which practically fills the RAM, while on a Titan I used 12000x12000 and there were no problems.
When you do a cuFFT transform, the memory used is what your matrix occupies plus about 2.6 times the size of your problem for temporary arrays (allocated when you create the plans).
Yes, I am trying, but I am getting wrong results, hence the question. What I observed by querying the GPU for remaining memory is this: after plan creation it allocates about half the size of my matrix, i.e. in my case, for 512x512x256, it allocates around 128-129 MB (considering padding). So I am not sure where the 2.6 times figure you mention comes from.
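One way to make both measurements concrete: cuFFT reports failures through its return codes, and cudaMemGetInfo shows how much memory plan creation actually consumed. A minimal sketch (assuming a single-precision complex-to-complex in-place transform; your real code may use a different transform type):

```cuda
#include <cstdio>
#include <cufft.h>
#include <cuda_runtime.h>

int main(void) {
    size_t free_before, free_after, total;
    cudaMemGetInfo(&free_before, &total);

    // cufftPlan3d takes dimensions as (nx, ny, nz), slowest-varying first.
    cufftHandle plan;
    cufftResult r = cufftPlan3d(&plan, 512, 512, 256, CUFFT_C2C);
    if (r != CUFFT_SUCCESS) {
        fprintf(stderr, "cufftPlan3d failed: error %d\n", (int)r);
        return 1;
    }

    cudaMemGetInfo(&free_after, &total);
    printf("plan consumed ~%zu MiB\n",
           (size_t)((free_before - free_after) >> 20));

    // In-place execution passes the same buffer as input and output
    // (allocation of `data` omitted here for brevity):
    // cufftExecC2C(plan, data, data, CUFFT_FORWARD);

    cufftDestroy(plan);
    return 0;
}
```

If the library hits an internal limit or runs out of workspace, these return codes (including the one from cufftExecC2C) are where it would show up, so it is worth checking every call rather than only the plan creation.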
Does 420x420x420 really fill the M2075 (6 GB)? Is the whole amount taken by the cuFFT library?
My program uses in-place transforms and needs 2 plans. I am checking the memory usage before and after plan creation, and it is about 1 GB per plan for a 1.2 GB matrix. I also have a few additional arrays. I have run the program many times with the RAM almost full and did not get any problems.