Memory allocation jumps in cufftPlan3d: sudden increase in GPU memory allocation

Hi,

we have written a program that uses cuFFT for 3D transforms. What we noticed is that cufftPlan3d allocates A LOT of memory once the size of the requested transform exceeds a certain threshold. This caused our program to crash on Win7 (which by default uses significantly more GPU memory than XP). We have not observed this in CUDA 2.1 but do see it in CUDA 3.0.
Basically we first allocate the input and output arrays, then run a loop:

for (int xx = 60; xx <= 512; xx++) {
    cufftPlan3d(&plan, z, xx, xx, CUFFT_R2C);
    cufftExecR2C(…);
    cufftDestroy(…);
    cudaThreadSynchronize();
}

What we observe (if we take z = 120) is that the GPU memory consumption is constant for 60 <= xx < 384, and when xx == 385 the call to cufftPlan3d suddenly allocates another 270 MB (!) of memory, causing an out-of-memory crash or other weird behavior on our 768 MB GTX8800.

Could anyone tell me what is going on?

kind regards
Remco

Could somebody from NVIDIA please reply?

This issue still remains… is there really no-one who has the same issue (or even better: a solution??). ANY feedback is appreciated.

Hi RemcoS,

Do you see this issue with CUFFT 3.1 or 3.2RC as well?

Note that it’s normal for CUFFT to need ~4X the transform size in scratch space, if I recall correctly.

Also, how are you querying the available/used memory? The CUDA driver does some internal sub-allocations for smaller sizes (allocating a big block and then dividing it up), so perhaps that’s why your smaller sizes appear to require approximately constant memory.

Thanks,

Cliff
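
As a rough check of the ~4X figure against the numbers reported earlier in the thread, the buffer sizes can be worked out by hand. The sketch below assumes single-precision data (4-byte cufftReal, 8-byte cufftComplex) and the usual R2C layout in which the last dimension of the output holds nz/2 + 1 complex values. The helper names and the default factor of 4 are illustrative assumptions taken from the "if I recall correctly" remark above, not documented CUFFT behavior.

```cpp
#include <cstdint>

// Bytes in the real input of an nx x ny x nz single-precision R2C transform.
std::uint64_t r2cInputBytes(std::uint64_t nx, std::uint64_t ny, std::uint64_t nz) {
    return nx * ny * nz * 4;  // assumes sizeof(cufftReal) == 4
}

// Bytes in the complex output: only nz/2 + 1 values are stored along the last dimension.
std::uint64_t r2cOutputBytes(std::uint64_t nx, std::uint64_t ny, std::uint64_t nz) {
    return nx * ny * (nz / 2 + 1) * 8;  // assumes sizeof(cufftComplex) == 8
}

// Hypothetical scratch estimate: `factor` times the input size (4 is an assumption).
std::uint64_t scratchEstimateBytes(std::uint64_t nx, std::uint64_t ny, std::uint64_t nz,
                                   std::uint64_t factor = 4) {
    return factor * r2cInputBytes(nx, ny, nz);
}
```

For z = 120 and xx = 384, r2cInputBytes(120, 384, 384) gives 70,778,880 bytes (67.5 MiB), and four times that is 270 MiB, which is in the same range as the jump reported in the first post. This is only a consistency check; it does not explain why the allocation shows up at that particular size.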

Hi Cliff,

thank you for your response! I measure the GPU usage using GPU-Z, simply observing the memory allocation there. I also see this in 3.1, and 3.2RC actually causes crashes and funny behavior on my system (driver issue?), so I decided to not use that for the time being. The issue is that especially under Windows 7, which ‘randomly’ consumes memory for itself, it becomes tricky to predict the memory usage of a certain transform, so unexpected out-of-memory errors are occurring, in fact crashing the app.

I gather from your response that this is not yet a known problem?

The basic code is the following (ignore the macros):

int FFTTest(int DimXYStart, int DimXYEnd, int DimZ, int NumTests, float* output, int GPUId)
{
    int idx = 0;

    for (int xy = DimXYStart; xy <= DimXYEnd; xy++)
    {
        for (int i = 0; i < NumTests; ++i)
        {
            TM_CUFFT_CALL(cufftPlan3d(&m_planR2C[GPUId], DimZ, xy, xy, CUFFT_R2C));
            // Check for any CUDA errors
            checkCUDAError("PerformFFTMeasure1_cufftPlan3d");

            TM_CUFFT_CALL(cufftExecR2C(m_planR2C[GPUId], (cufftReal *)m_pdMeasureVol[GPUId], (cufftComplex *)m_pdFFTMeasureVol[GPUId]));
            // Check for any CUDA errors
            checkCUDAError("PerformFFTMeasure1_cufftExecR2C");

            TM_CUFFT_CALL(cufftDestroy(m_planR2C[GPUId]));
            // Check for any CUDA errors
            checkCUDAError("PerformFFTMeasure1_cufftDestroy");

            cudaThreadSynchronize();
        }

        cudaThreadSynchronize();
    }

    return 1;
}
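
Since GPU-Z only shows the device-wide total, one way to pin the jump on a specific call is to read the free memory immediately before and after plan creation from inside the program. The following is a minimal sketch under stated assumptions: freeMemoryDrop and fitsInFreeMemory are hypothetical helpers, and the free-memory query is passed in as a callable so the sketch compiles without the CUDA headers. In a CUDA program that callable could wrap cudaMemGetInfo(&freeBytes, &totalBytes) and return freeBytes.

```cpp
#include <cstddef>
#include <functional>

// Runs `action` and returns how many bytes of free memory disappeared while it ran.
// `queryFree` is a hypothetical hook that returns the free device memory in bytes.
long long freeMemoryDrop(const std::function<std::size_t()>& queryFree,
                         const std::function<void()>& action) {
    const std::size_t before = queryFree();
    action();  // e.g. the cufftPlan3d call for one value of xy
    const std::size_t after = queryFree();
    return static_cast<long long>(before) - static_cast<long long>(after);
}

// Pre-flight check: true if `neededBytes` fits and at least `marginBytes` stays free
// afterwards (the margin is a guess at what the OS may claim for itself).
bool fitsInFreeMemory(std::size_t freeBytes, std::size_t neededBytes,
                      std::size_t marginBytes) {
    return neededBytes <= freeBytes && freeBytes - neededBytes >= marginBytes;
}
```

Logging freeMemoryDrop for each xy in the loop above would show exactly where the jump happens. Free memory is a device-wide figure, so allocations by the OS or other processes can still change it between the check and the plan creation; the pre-flight check reduces the risk of an unexpected out-of-memory error but does not remove it.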

There were known issues with CUFFT 3.1 where out-of-memory conditions could cause an application crash. But CUFFT 3.2 should be much better at detecting out-of-memory conditions and returning error codes the way it ought to. So I’m a bit surprised that you’re seeing bad behavior in out-of-memory conditions in 3.2 that you did not see in 3.1.

Does the TM_CUFFT_CALL() macro you’re using call exit() if an error code is returned from cufftExec? If so, is this the “crash” you’re describing? If not, can you give me more information about what the symptoms of this application crash are? Any error messages returned at all?

Thanks,

Cliff

Thanks for the reply Cliff, sorry it took so long from my side (out-of-office). The TM_CUFFT_CALL macro does indeed call exit, but only after writing its status to a log file and showing a message box. I do sometimes observe this, and it is not what I call a crash, since it is a 'detected and caught' application problem. When I say 'crash' I mean a complete and sudden disappearance of the entire app without leaving a trace… It's also not so much the actual crash I am worried about. It's the fact that these sudden jumps in memory consumption occur at all, leading to this unstable behaviour. I am currently assigned to a different project, so I cannot dive too deep into it at this point in time. I will wait until 3.2 is officially released and then try again. Hopefully, by that time the issue will have been resolved by the 3.2 memory improvements. If not, I will at least be in a better position to communicate with you further. If you like, we could also communicate directly (outside of the forum), which might be handier.

kind regards

Remco
