I hope someone can help me with this issue. I've implemented a simple device application that modifies a float array through a device pointer. The array is allocated with cudaHostAlloc (mapped memory), and I then access it through the host pointer to check the results (I do synchronize between the CPU and GPU). The problem is that it works fine with small sizes, but once more than about 30 MB is allocated it no longer works correctly. I don't understand what I'm missing here.
Does it have to do with the block and grid dimensions, or the thread count?
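In case it helps, here is a stripped-down sketch of what I am doing (not my exact code; the kernel, the 256-thread block size, and the array length are just placeholders):

```
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: scales every element in place through the mapped pointer.
__global__ void scaleKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

int main()
{
    const int n = 8 * 1024 * 1024;             // ~32 MB of floats, roughly the size where it stops working for me
    const size_t bytes = n * sizeof(float);

    cudaSetDeviceFlags(cudaDeviceMapHost);     // must be called before the CUDA context is created

    float *hPtr = nullptr;
    cudaHostAlloc((void **)&hPtr, bytes, cudaHostAllocMapped);

    float *dPtr = nullptr;
    cudaHostGetDevicePointer((void **)&dPtr, hPtr, 0);

    for (int i = 0; i < n; ++i)
        hPtr[i] = 1.0f;

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    scaleKernel<<<blocks, threads>>>(dPtr, n);
    cudaDeviceSynchronize();                   // make sure the GPU writes are visible on the host

    printf("first = %f, last = %f\n", hPtr[0], hPtr[n - 1]);
    cudaFreeHost(hPtr);
    return 0;
}
```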
Thank you for your reply, but I still don't quite get it. Doesn't the GPU access main memory directly in the case of pinned memory? And since we usually have more main memory than device memory, shouldn't we be able to make larger allocations with pinned memory than with pageable memory?
Pinned memory is effectively just a contiguous host memory reservation made by the GPU driver, which the operating system is prevented from paging in and out of virtual memory or moving about in the physical address space (for fragmentation management, garbage collection, etc.). The largest free contiguous block available in the address space is usually considerably smaller than the sum of free memory because of fragmentation, so I would expect the total amount of allocatable pageable memory to always be larger than the amount of pinned memory. The CUDA zero-copy functionality adds what is effectively DMA to pinned memory, so that not only is the memory pinned by the driver, but the GPU can also write directly to it over the PCI-e bus without the need for explicit host-side copy functions.
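If you want to see how much pinned memory your system will actually give you, you can probe it by asking cudaHostAlloc for progressively larger blocks and checking the return code. A rough sketch (the 64 MB step and the 16 GB cap are arbitrary):

```
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Probe pinned allocations in 64 MB steps until the driver refuses one.
    const size_t step = 64ULL << 20;
    for (size_t bytes = step; bytes <= (16ULL << 30); bytes += step)
    {
        void *p = nullptr;
        cudaError_t err = cudaHostAlloc(&p, bytes, cudaHostAllocDefault);
        if (err != cudaSuccess)
        {
            printf("pinned allocation failed at %zu MB: %s\n",
                   bytes >> 20, cudaGetErrorString(err));
            break;
        }
        printf("pinned allocation of %zu MB succeeded\n", bytes >> 20);
        cudaFreeHost(p);
    }
    return 0;
}
```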
I think you’re confusing pinned memory with zero-copy. Zero-copy memory is mainly beneficial for integrated graphics cards with no dedicated memory, since they work in the same memory pool (system memory) as the CPU. In that case, using zero-copy memory avoids an extra memcpy.
Pinned memory is simply page-locked system memory. If your storage is not allocated in pinned memory and you upload it to the GPU, the driver will first copy it into a pinned staging buffer so that it can start a DMA transfer to the GPU.
If you allocate your storage directly in pinned memory, this extra copy is not needed; that’s why it’s faster. But as the manual states, page-locked memory is a scarce resource, so although you may have many gigabytes of system memory, the amount available for page-locking can be significantly less.
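To make the two paths concrete, here is a rough sketch of the same upload done from pageable and from pinned memory (size and names are arbitrary, and I've left out timing; the pinned copy is the one that skips the staging step):

```
#include <cstdlib>
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 64ULL << 20;          // 64 MB, arbitrary
    float *dDst = nullptr;
    cudaMalloc((void **)&dDst, bytes);

    // Pageable path: the driver first stages the data into an internal
    // pinned buffer, then DMAs it to the GPU.
    float *pageable = (float *)malloc(bytes);
    cudaMemcpy(dDst, pageable, bytes, cudaMemcpyHostToDevice);

    // Pinned path: the buffer is already page-locked, so the driver can
    // start the DMA transfer directly (and the copy can be made async).
    float *pinned = nullptr;
    cudaHostAlloc((void **)&pinned, bytes, cudaHostAllocDefault);
    cudaMemcpy(dDst, pinned, bytes, cudaMemcpyHostToDevice);

    cudaFreeHost(pinned);
    free(pageable);
    cudaFree(dDst);
    return 0;
}
```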