(GPT aided me in the question below)
I’m encountering a segmentation fault when reading data into a large array Refl (200 GB) that I allocate with cudaMallocHost. This issue started after I introduced an additional device array (d_Fop_in_nwfft) in my code. I use multiple GPUs and cap each GPU’s memory usage with a parameter perc_gpu, currently set to 95%.
I understand that cudaMallocHost allocates page-locked (pinned) memory on the host, but I’ve read that it can also indirectly put pressure on device memory due to internal driver allocations for mapping and DMA buffers. So I suspect that this extra pressure might be pushing my GPU memory usage over the edge now that I’ve introduced the new d_Fop_in_nwfft buffer.
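For context, this is roughly how the perc_gpu cap is applied per device (a simplified sketch of the idea, not my actual code; the real version also accounts for the batch size):

```cpp
// Simplified sketch (not the actual code): derive a per-device memory budget
// as a fraction (perc_gpu) of the currently free device memory.
#include <cstdio>
#include <cuda_runtime.h>

int main(void) {
    const double perc_gpu = 0.95;  // cap: use at most 95% of free device memory
    int n_dev = 0;
    cudaGetDeviceCount(&n_dev);
    for (int d = 0; d < n_dev; ++d) {
        size_t free_b = 0, total_b = 0;
        cudaSetDevice(d);
        cudaMemGetInfo(&free_b, &total_b);
        size_t budget = (size_t)(perc_gpu * (double)free_b);
        printf("GPU %d: %zu MiB free, budget %zu MiB\n",
               d, free_b >> 20, budget >> 20);
        // Device arrays (including d_Fop_in_nwfft) are sized so that their
        // total stays under 'budget'.
    }
    return 0;
}
```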
❓ My main question is:
Is there a reliable way to track or estimate how much memory cudaMallocHost uses from the GPU (or pinned-memory pool), so I can adjust perc_gpu accordingly and avoid exceeding the limit?
Secondary questions:
Are there profiling tools or API calls that expose this memory usage clearly?
Should I conservatively lower perc_gpu to, say, 85–90%, when using large pinned allocations?
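In case it helps, this is how I would try to measure it myself (a minimal sketch; I'm assuming that if pinned host allocations consumed device memory, the difference would show up in cudaMemGetInfo, which may well be a wrong assumption):

```cpp
// Minimal sketch: compare reported free device memory before and after a
// large cudaMallocHost allocation. The 8 GiB size is just for a quick test;
// the real Refl array is ~200 GB.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK_CUDA(call)                                                     \
    do {                                                                     \
        cudaError_t err_ = (call);                                           \
        if (err_ != cudaSuccess) {                                           \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                      \
                    cudaGetErrorString(err_), __FILE__, __LINE__);           \
            exit(EXIT_FAILURE);                                              \
        }                                                                    \
    } while (0)

int main(void) {
    size_t free_before = 0, free_after = 0, total = 0;
    CHECK_CUDA(cudaSetDevice(0));
    CHECK_CUDA(cudaMemGetInfo(&free_before, &total));

    void *pinned = NULL;
    size_t bytes = 8ull * 1024 * 1024 * 1024;  // 8 GiB pinned host buffer
    CHECK_CUDA(cudaMallocHost(&pinned, bytes));

    CHECK_CUDA(cudaMemGetInfo(&free_after, &total));
    printf("free before: %zu MiB, after: %zu MiB, delta: %lld MiB\n",
           free_before >> 20, free_after >> 20,
           ((long long)free_before - (long long)free_after) / (1024 * 1024));

    CHECK_CUDA(cudaFreeHost(pinned));
    return 0;
}
```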
Thanks in advance for any suggestions or clarification. My suspicion of cudaMallocHost is only a guess. In short, what happened was: after adding some extra device arrays (keeping the 95% cap and reducing the batch size), the code started to segfault when reading a big dataset into a cudaMallocHost-allocated array.
Citation needed. News to me. Have you performed some basic experiments that demonstrate this effect?
Segfault is triggered by code running on the host, so I do not see a ready connection with device arrays. Likely causes:
(1) A failing host-side memory allocation that isn’t caught (improper error handling)
(2) An insufficiently sized host-side memory allocation leading to out-of-bounds access
(3) Out-of-bounds access on a host-side memory allocation, either as part of an incorrectly sized bulk transfer, a faulty array index computation, or an invalid pointer (including a null pointer)
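For example, a minimal sketch of guarding against (1) and (3) could look like this (Refl is the array name from the question; the element counts and the other names are placeholders):

```cpp
// Sketch: check the pinned host allocation (1) and make sure the subsequent
// read cannot write past the end of the buffer (2)/(3). 'n_elems' and
// 'read_count' are placeholder names, not from the original code.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main(void) {
    size_t n_elems = 1000000;  // placeholder: number of elements in Refl
    float *Refl = NULL;

    cudaError_t err = cudaMallocHost((void **)&Refl, n_elems * sizeof(float));
    if (err != cudaSuccess) {
        // (1) A failed pinned allocation returns an error; using Refl anyway
        // would dereference a null/invalid pointer and segfault on the host.
        fprintf(stderr, "cudaMallocHost failed: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    // (2)/(3) Guard the read: never deliver more elements than were allocated.
    size_t read_count = 1000000;  // placeholder: elements the reader delivers
    if (read_count > n_elems) {
        fprintf(stderr, "read of %zu elements exceeds buffer of %zu\n",
                read_count, n_elems);
        cudaFreeHost(Refl);
        return EXIT_FAILURE;
    }
    // ... e.g. fread(Refl, sizeof(float), read_count, fp); ...

    cudaFreeHost(Refl);
    return 0;
}
```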
@njuffa Thanks for pointing me in the right direction; indeed it had nothing to do with cudaMallocHost or the reading. The segfault was unrelated; GPT had pointed me in that direction.
“Citation needed. News to me. Have you performed some basic experiments that demonstrate this effect?” When I pushed GPT on that, it found NVIDIA documentation saying the exact opposite. It even claimed to have tested it (as below), providing a full answer that contradicted its first guess.
In hindsight, trusting GPT’s direction and help when writing the question was a bad call; simply describing the problem without the extra speculation would have been the right approach. There is no evidence that a huge array (~200 GB) allocated with cudaMallocHost affects the GPU’s memory. The segmentation fault was caused by another detail in the latest code modifications and had nothing to do with the newly added device array. @striker159 I ended up not using Valgrind because of the size of the datasets, but checking for host-side leaks was the correct way to solve the problem.