I have an issue with a small test of “cudaMalloc” and “cudaFree” on the TX2.
Each time I allocate memory on the GPU, “cudaMalloc” returns “success”. Then I “cudaFree” it, and that also returns “success”. But when I use “tegrastats” to check the GPU memory usage, I find that the usage does not go down. To be concrete: suppose I allocate 1 GB with “cudaMalloc” and the RAM usage rises by 1 GB; after “cudaFree” succeeds, the RAM usage is still 1 GB, which seems unreasonable to me.
Even worse, if I repeat these two steps (cudaMalloc and then cudaFree of 1 GB) 7 times, the program is KILLED.
By the way, when I allocate and release memory on the CPU with “malloc” and “free”, everything works correctly.
So my question is: why does cudaFree not really release the memory on the GPU?
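For reference, here is a minimal sketch of the kind of test I am running (the 1 GB size and seven iterations are illustrative, not my exact code); it uses cudaMemGetInfo to read the free device memory from inside the program instead of relying on tegrastats:

#include <cstdio>
#include <cuda_runtime.h>

// Minimal sketch: repeatedly allocate and free ~1 GB and print the free
// device memory reported by the runtime after each step.
int main()
{
    const size_t bytes = 1024ull * 1024ull * 1024ull;   // ~1 GB per iteration
    for (int i = 0; i < 7; ++i)
    {
        void *p = NULL;
        cudaError_t err = cudaMalloc(&p, bytes);
        printf("iter %d: cudaMalloc -> %s\n", i, cudaGetErrorString(err));

        size_t freeMem = 0, totalMem = 0;
        cudaMemGetInfo(&freeMem, &totalMem);
        printf("  after cudaMalloc: free = %zu MB\n", freeMem / (1024 * 1024));

        err = cudaFree(p);
        printf("iter %d: cudaFree   -> %s\n", i, cudaGetErrorString(err));

        cudaMemGetInfo(&freeMem, &totalMem);
        printf("  after cudaFree:   free = %zu MB\n", freeMem / (1024 * 1024));
    }
    return 0;
}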
__global__ void Datanumber(int *num)
{
    int tid = threadIdx.x;
    int bid = blockIdx.x;
    // Grid-stride loop: the stride blockDim.x * gridDim.x equals
    // blocks * threads for a <<<blocks, threads>>> launch.
    for (int i = bid * blockDim.x + tid; i < DATASIZE; i += blockDim.x * gridDim.x)
    {
        num[i] = i;
    }
}
I am sure that cudaFree succeeds, because if I launch a kernel after it, I find that “gpudata” can no longer be used. But the GPU usage does not decrease, and the freed area remains unavailable. After repeating this several times, it runs out of memory.
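For reference, this is a sketch of the host side I have in mind (illustrative only; it assumes the Datanumber kernel and DATASIZE from above, and the block/thread counts are placeholders):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const int threads = 256;   // placeholder launch configuration
    const int blocks  = 64;

    int *gpudata = NULL;
    cudaError_t err = cudaMalloc((void **)&gpudata, sizeof(int) * DATASIZE);
    printf("cudaMalloc: %s\n", cudaGetErrorString(err));

    Datanumber<<<blocks, threads>>>(gpudata);
    cudaDeviceSynchronize();

    err = cudaFree(gpudata);
    printf("cudaFree: %s\n", cudaGetErrorString(err));

    // Launching Datanumber with gpudata again after this point fails,
    // so the free itself appears to succeed; only the usage reported
    // by tegrastats does not drop.
    return 0;
}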
(The platform I use is Nsight Eclipse Edition 9.2, and all numbers were obtained in its debug mode. My GPU is the NVIDIA Jetson TX2.)
Thank you!
I assume you declared gpudata as int* gpudata; ? Otherwise the kernel call would complain about incompatible pointer types in the arguments.
Just a minor nitpick: you need sizeof(int) * DATASIZE for the allocation.
There could be platforms where sizeof(int) is 8 bytes, but for CUDA I am pretty sure sizeof(int) is 4, same as sizeof(float).
So the code should not cause undefined behavior when invoked repeatedly.
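I.e., something along these lines (using the gpudata and DATASIZE names from the post):

int *gpudata = NULL;
// Allocate room for DATASIZE ints, not DATASIZE bytes.
cudaError_t err = cudaMalloc((void **)&gpudata, sizeof(int) * DATASIZE);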
Do you get the process killed when repeatedly starting the binary from a console, or do you have to put a for() loop around the cudaMalloc/cudaFree() part to make it crash?
Thank you for your answer! Sorry for the trouble; the code above is not my actual source code. You can use the following code.
I must say, when I simply RUN the following code, there is NO error. The problem only shows up in DEBUG mode: it happened when I used STEP OVER.
When I step to the line “cudaMalloc(gpudata7)”, the GPU usage has already reached 7197 MB, and I can’t step over it, because the display turns off due to insufficient memory.
They will probably need to know the exact driver and CUDA toolkit versions you are using, as well as the above repro code plus a description of how to reproduce it exactly (i.e., which compiler arguments you used).
“File attachments are currently not supported on this form - please send as attachments via email to NVSDKIssues@nvidia.com referencing the Bug ID listed on the My Bugs section of My Account.”
What strikes me as odd is this statement:
For Jetson Platform issues post your question on the NVIDIA Developer Forums.
Does this mean Jetson support issues are generally not handled on developer.nvidia.com?
The FAQ for Jetson states this:
How can I get support for my Jetson Developer Kit or module?