Should cudaMallocHost() need retry?

Cui · January 7, 2016, 6:11pm

Hi,

I’m using CUDA 7.5 on Ubuntu 14.04, and I find that cudaMallocHost() sometimes fails (with error code 30, “unknown error”) even though the system has far more than enough free memory: only a few hundred MB are allocated out of 100 GB available.

Since the problem does not happen every time, I worked around it by wrapping cudaMallocHost() in a while loop that retries after a failure. With those retries, the problem went away.
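For illustration, the retry workaround could look roughly like the sketch below. It is not the code from the original post: `alloc_fn`, `alloc_with_retry`, and `flaky_alloc` are hypothetical names, and `flaky_alloc` is only a stand-in that simulates two failures so the sketch runs without a GPU. In a real program the callback would presumably be a thin wrapper that calls cudaMallocHost() and returns its error code (0 meaning success). The loop is bounded, unlike an open-ended while loop, so a persistent failure is reported instead of spinning forever.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical callback type: returns 0 on success, nonzero error code on failure. */
typedef int (*alloc_fn)(void **ptr, size_t size);

/* Calls alloc up to max_attempts times. Returns the last error code
 * (0 on success) and stores the number of attempts made in *attempts_used. */
int alloc_with_retry(alloc_fn alloc, void **ptr, size_t size,
                     int max_attempts, int *attempts_used)
{
    int err = -1;   /* returned unchanged if max_attempts <= 0 */
    int attempt = 0;

    while (attempt < max_attempts) {
        attempt++;
        err = alloc(ptr, size);
        if (err == 0) {
            break;  /* allocation succeeded */
        }
        fprintf(stderr, "allocation attempt %d failed with code %d\n",
                attempt, err);
    }
    if (attempts_used != NULL) {
        *attempts_used = attempt;
    }
    return err;
}

/* Stand-in allocator for demonstration only: fails twice with code 30,
 * then falls back to plain malloc(). It does NOT allocate pinned memory. */
int flaky_alloc(void **ptr, size_t size)
{
    static int calls = 0;

    calls++;
    if (calls <= 2) {
        return 30;
    }
    *ptr = malloc(size);
    return (*ptr != NULL) ? 0 : 2;
}
```

Calling `alloc_with_retry(flaky_alloc, &p, 64, 5, &used)` succeeds on the third attempt with this stand-in. Logging each failure keeps the intermittent errors visible rather than silently hiding them.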

However, I’m still a little bit worried about that. Should this be happening at all? Is it appropriate to retry cudaMallocHost()?

Thanks,
Cui

Robert_Crovella · January 7, 2016, 6:31pm

I don’t think it should need a retry. I have done some extended testing of cudaHostAlloc on Red Hat systems (Fedora/CentOS/RHEL) and haven’t witnessed that behavior.

I always use cudaHostAlloc instead of cudaMallocHost, although I have no reason to think that should matter for the described behavior.

Cui · January 7, 2016, 6:47pm

Maybe that’s another problem specific to Ubuntu. (Here I would like to cite my other thread https://devtalk.nvidia.com/default/topic/883675/cuda-programming-and-performance/pinned-memory-limit/)

I really wish I could switch to CentOS. However, it’s really hard to get my dependencies on CentOS. Many of the dependencies that I use are not included in yum repositories.

Thanks,
Cui

Gregory_Diamos · January 9, 2016, 7:08pm

As far as I know it should not need a retry unless you are actually close to running out of memory, and some memory gets freed up in between the retries.

tera · January 10, 2016, 1:33pm

It seems unlikely with the extreme values you mentioned, but could this be a memory fragmentation problem?

I haven’t looked into this for a long while though, so I am not even sure cudaMallocHost() / cudaHostAlloc() still require contiguous memory (it seems to me the GPU’s MMU could be able to handle fragmented host allocations, but as this is all undocumented I am not quite sure).

Have you checked your max locked memory limit setting with ulimit -a?
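The same limit can also be read from inside the program, which is a minimal sketch of logging it next to a failed allocation. The code uses the POSIX getrlimit() call with RLIMIT_MEMLOCK; the helper names `get_memlock_limit` and `print_limit` are made up for this example. Note that getrlimit() reports bytes, while the shell's ulimit output shows max locked memory in kbytes. Whether pinned allocations made through the CUDA driver are actually governed by this limit is not established in this thread, so treat the value as a diagnostic hint only.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Reads the soft and hard "max locked memory" limits of the current process.
 * Returns 0 on success, -1 if getrlimit() fails. */
int get_memlock_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0) {
        return -1;
    }
    *soft = lim.rlim_cur;
    *hard = lim.rlim_max;
    return 0;
}

/* Prints one limit, treating RLIM_INFINITY as "unlimited". */
void print_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY) {
        printf("%s: unlimited\n", label);
    } else {
        printf("%s: %llu bytes\n", label, (unsigned long long)value);
    }
}
```

A caller would fetch both values with `get_memlock_limit(&soft, &hard)` and pass each to `print_limit`, for example right after an allocation returns an error.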
