pinned memory vs mlock()

Is there any difference between pinned memory and mlocked memory?
My machine has several gigabytes of memory, so there is no chance of pages being swapped out.
But if I use a large amount of pinned memory, performance drops.
Could you explain why?

Can you supply your machine details (sockets, memory etc.)?
I’ve seen lower performance with MLOCK than with pinned memory.


Decreased performance with too much pinned memory is normal behavior; check chapter 3.2.4 in

Sorry, but that statement is far too general in my opinion.

How do you define too much? Do you have some definition?

Here’s the part of the programming guide you’re referring to, right?

Page-locked host memory is a scarce resource however, so allocations in page-locked memory will start failing long before allocations in pageable memory. In addition, by reducing the amount of physical memory available to the operating system for paging, consuming too much page-locked memory reduces overall system performance.
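Since the guide says page-locked allocations "will start failing" before pageable ones, a program can check for that case instead of crashing. Below is a minimal sketch, assuming you are willing to fall back to ordinary pageable memory (and slower copies) when the driver refuses a pinned allocation. `HostBuffer`, `alloc_host_buffer` and `free_host_buffer` are hypothetical names for illustration; only the CUDA runtime calls are real.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper type: a host buffer that remembers how it was allocated.
struct HostBuffer {
    void*  ptr;
    size_t bytes;
    bool   pinned;
};

// Try page-locked memory first; if the driver refuses (the "scarce resource"
// case from the guide), fall back to ordinary pageable memory.
HostBuffer alloc_host_buffer(size_t bytes) {
    HostBuffer b{nullptr, bytes, false};
    cudaError_t err = cudaHostAlloc(&b.ptr, bytes, cudaHostAllocDefault);
    if (err == cudaSuccess) {
        b.pinned = true;
        return b;
    }
    std::fprintf(stderr, "pinned allocation of %zu bytes failed: %s\n",
                 bytes, cudaGetErrorString(err));
    cudaGetLastError();          // reset the runtime's last-error state
    b.ptr = std::malloc(bytes);  // still nullptr if the system is out of memory
    return b;
}

// Release the buffer with the function that matches its allocator.
void free_host_buffer(HostBuffer& b) {
    if (b.ptr == nullptr) return;
    if (b.pinned) cudaFreeHost(b.ptr);
    else          std::free(b.ptr);
    b.ptr = nullptr;
}
```

The `pinned` flag matters because memory from `cudaHostAlloc` must be released with `cudaFreeHost`, never with `free`.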

However, wuninsu clearly states that there is no paging, so there should be enough memory available and no performance decrease.

If the “scarce” resource were exhausted, the program would fail outright.

@wuninsu: What do you mean by a few GB of memory, and how large is your “big” pinned memory?

I myself use more than 12 GB of pinned memory out of 24 GB total and see no performance degradation.

I saw that sentence, so I want to know why.

I have 24 GB of memory. I don’t think there is any chance of swapping out.

Oh really? I have 24 GB of memory in total, and when I used 600 MB of pinned memory, the performance dropped.

If I reduce it to 200 MB, performance becomes 1.8 times better.

So I don’t know the reason :( I didn’t change anything except the pinned memory size.

Hmm, I thought pinned memory was mapped to GPU memory.

But the whole GPU memory is less than 1.2 GB.

So is it even possible to use 12 GB of pinned memory?

And the real question is: what exactly is pinned memory,

and how does it differ from mlocked memory?

Pinned memory is host memory allocated in such a way that its pages cannot be swapped out or migrated to another NUMA domain. It lives in system RAM, not on the GPU.

As a result, copies between CPU and GPU over PCIe achieve higher bandwidth when the CPU-side buffer is pinned: the GPU can DMA directly from pinned pages, whereas pageable data is first staged by the driver through an internal pinned buffer.
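The bandwidth difference described above can be measured with a sketch like the following, assuming a single timed host-to-device copy is representative (real benchmarks usually repeat and average). It copies the same amount of data once from `malloc` memory and once from `cudaMallocHost` memory; `gb_per_s`, `h2d_bandwidth` and `compare_pageable_vs_pinned` are hypothetical helper names.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Convert `bytes` moved in `ms` milliseconds into GB/s (1 GB = 1e9 bytes).
double gb_per_s(size_t bytes, float ms) {
    return (static_cast<double>(bytes) / 1e9) / (ms / 1e3);
}

// Time a single host-to-device cudaMemcpy from `host` with CUDA events.
// Returns GB/s, or -1.0 if a CUDA call fails.
double h2d_bandwidth(const void* host, size_t bytes) {
    void* dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return -1.0;
    cudaEvent_t start = nullptr, stop = nullptr;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cudaError_t err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dev);
    return (err == cudaSuccess && ms > 0.0f) ? gb_per_s(bytes, ms) : -1.0;
}

// Copy the same amount of data from pageable (malloc) and pinned
// (cudaMallocHost) memory and print both rates.
void compare_pageable_vs_pinned(size_t bytes) {
    void* pageable = std::malloc(bytes);
    void* pinned = nullptr;
    if (pageable == nullptr || cudaMallocHost(&pinned, bytes) != cudaSuccess) {
        std::fprintf(stderr, "allocation failed\n");
        std::free(pageable);
        return;
    }
    std::memset(pageable, 1, bytes);  // touch every page so page faults are not timed
    std::memset(pinned, 1, bytes);
    std::printf("pageable: %.2f GB/s\n", h2d_bandwidth(pageable, bytes));
    std::printf("pinned:   %.2f GB/s\n", h2d_bandwidth(pinned, bytes));
    std::free(pageable);
    cudaFreeHost(pinned);
}
```

Calling `compare_pageable_vs_pinned(256u << 20)` from `main()` prints both rates; the pinned figure is usually the higher one, but the gap depends on the chipset, PCIe generation and driver.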

malloc + mlock should do the same as cudaMallocHost, but my benchmarks indicate otherwise.
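One detail that may be relevant here: as far as I know, `cudaMemcpy` only takes the fast DMA path for ranges the driver itself allocated or registered (`cudaMallocHost`/`cudaHostAlloc`, or an existing buffer passed to `cudaHostRegister`). A buffer that was only `mlock()`ed stays resident in RAM, but the driver has no record of it and may still treat it as pageable. The sketch below shows both variants side by side; `round_up_to_page`, `alloc_mlocked`, `free_mlocked` and `register_with_cuda` are hypothetical helpers, and the page-size rounding is a conservative assumption because some older CUDA versions required page-aligned ranges for `cudaHostRegister`.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <sys/mman.h>  // mlock, munlock
#include <unistd.h>    // sysconf
#include <cuda_runtime.h>

// Round `bytes` up to a whole number of pages.
size_t round_up_to_page(size_t bytes, size_t page) {
    return (bytes + page - 1) / page * page;
}

static size_t page_size() { return static_cast<size_t>(sysconf(_SC_PAGESIZE)); }

// Variant 1: page-aligned allocation + mlock. The OS keeps these pages
// resident, but the CUDA driver has no record that they are locked.
void* alloc_mlocked(size_t bytes) {
    size_t len = round_up_to_page(bytes, page_size());
    void* p = nullptr;
    if (posix_memalign(&p, page_size(), len) != 0) return nullptr;
    if (mlock(p, len) != 0) {  // fails, e.g., when RLIMIT_MEMLOCK is too low
        std::perror("mlock");
        std::free(p);
        return nullptr;
    }
    return p;
}

void free_mlocked(void* p, size_t bytes) {
    if (p == nullptr) return;
    munlock(p, round_up_to_page(bytes, page_size()));
    std::free(p);
}

// Variant 2: page-lock an existing allocation *and* register it with the
// driver, so cudaMemcpy can use it like cudaMallocHost memory.
// `p` should come from a page-aligned allocation of at least
// round_up_to_page(bytes) bytes, e.g. posix_memalign as above.
bool register_with_cuda(void* p, size_t bytes) {
    return cudaHostRegister(p, round_up_to_page(bytes, page_size()),
                            cudaHostRegisterDefault) == cudaSuccess;
}
```

A registered buffer must be released with `cudaHostUnregister(p)` before it is freed.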

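Mapped memory is pinned host memory that is additionally mapped into the GPU’s address space, so kernels can access it directly. Only pinned memory can be mapped, but pinned memory is not necessarily mapped.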
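This also explains why 12 GB of pinned memory can coexist with a GPU that has about 1.2 GB: the pinned pages stay in host RAM, and even mapped memory is read by the GPU over PCIe rather than copied into device memory. Here is a minimal zero-copy sketch, assuming a device that supports mapped host memory; `fill_index`, `blocks_for` and `zero_copy_demo` are hypothetical names.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Each thread writes its global index into the (host-resident) output array.
__global__ void fill_index(int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Number of blocks of `threads` threads needed to cover n elements.
int blocks_for(int n, int threads) { return (n + threads - 1) / threads; }

// Allocate pinned host memory that is also mapped into the device address
// space, let a kernel write to it directly, and check the result on the host
// without any cudaMemcpy. Returns false if a CUDA call fails.
bool zero_copy_demo(int n) {
    // Enables mapping; may return an error if a CUDA context already exists,
    // which is deliberately ignored here.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    int* host = nullptr;
    if (cudaHostAlloc(reinterpret_cast<void**>(&host), n * sizeof(int),
                      cudaHostAllocMapped) != cudaSuccess)
        return false;
    int* dev = nullptr;  // device-side alias of the same physical pages
    bool ok = cudaHostGetDevicePointer(reinterpret_cast<void**>(&dev), host, 0) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        fill_index<<<blocks_for(n, threads), threads>>>(dev, n);
        // Check the launch, then wait for the kernel before reading on the host.
        ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    }
    for (int i = 0; ok && i < n; ++i) ok = (host[i] == i);
    cudaFreeHost(host);
    return ok;
}
```

Every kernel access to `dev` crosses PCIe, so zero-copy suits data that is read or written once; repeatedly accessed data is normally copied to device memory instead.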

malloc + mlock performs the same as cudaMallocHost for me across several dataset sizes (32, 64, 128, 256, 512 and 768 MB).
That is with a P45 chipset and a GTS 450.
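To compare results across machines, a sweep over the same sizes could look like this sketch. It assumes RLIMIT_MEMLOCK is raised high enough for `mlock` to succeed on the larger sizes (for example with `ulimit -l unlimited`); sizes that fail to allocate or lock are skipped. `mb_to_bytes`, `time_h2d_ms` and `sweep_mlock_vs_pinned` are hypothetical names, and a single copy per size is a simplification.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <sys/mman.h>  // mlock, munlock
#include <cuda_runtime.h>

// Dataset sizes from the post above, in MB.
const size_t kSizesMB[] = {32, 64, 128, 256, 512, 768};

size_t mb_to_bytes(size_t mb) { return mb << 20; }

// Time one host-to-device copy with CUDA events; returns ms, or -1 on error.
float time_h2d_ms(void* dev, const void* host, size_t bytes) {
    cudaEvent_t start = nullptr, stop = nullptr;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cudaError_t err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = -1.0f;
    if (err == cudaSuccess) cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// For each size: the same copy from malloc+mlock memory and from
// cudaMallocHost memory into one device buffer.
void sweep_mlock_vs_pinned() {
    for (size_t mb : kSizesMB) {
        size_t bytes = mb_to_bytes(mb);
        void* dev = nullptr;
        void* locked = std::malloc(bytes);
        void* pinned = nullptr;
        bool ok = locked != nullptr &&
                  mlock(locked, bytes) == 0 &&
                  cudaMalloc(&dev, bytes) == cudaSuccess &&
                  cudaMallocHost(&pinned, bytes) == cudaSuccess;
        if (ok) {
            std::memset(locked, 1, bytes);
            std::memset(pinned, 1, bytes);
            float t_lock = time_h2d_ms(dev, locked, bytes);
            float t_pin = time_h2d_ms(dev, pinned, bytes);
            std::printf("%4zu MB  malloc+mlock %8.2f ms  cudaMallocHost %8.2f ms\n",
                        mb, t_lock, t_pin);
        } else {
            std::printf("%4zu MB  allocation or mlock failed, skipped\n", mb);
        }
        if (locked != nullptr) { munlock(locked, bytes); std::free(locked); }
        if (pinned != nullptr) cudaFreeHost(pinned);
        if (dev != nullptr) cudaFree(dev);
    }
}
```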

Thanks for this information. What do you do with the datasets? I just tested with a simple vector copy.

I just initialized them with a constant value.
But in principle, the result should be the same regardless of the operation.
Maybe it has to do with the alignment of the pages in memory? But I really don’t understand your results.
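To test the alignment hypothesis, one could first check whether the host buffers actually start on a page boundary. A small sketch, assuming a POSIX system; `is_page_aligned` and `report_alignment` are hypothetical helpers, and whether plain `malloc` returns page-aligned blocks depends on the C library and the requested size.

```cuda
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <unistd.h>  // sysconf

// True if `p` starts exactly on a boundary of `page` bytes.
bool is_page_aligned(const void* p, size_t page) {
    return reinterpret_cast<std::uintptr_t>(p) % page == 0;
}

// Report whether plain malloc and posix_memalign return page-aligned
// blocks for a given size on this system.
void report_alignment(size_t bytes) {
    size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    void* a = std::malloc(bytes);
    void* b = nullptr;
    if (posix_memalign(&b, page, bytes) != 0) b = nullptr;
    if (a != nullptr)
        std::printf("malloc:         page-aligned = %d\n", is_page_aligned(a, page));
    if (b != nullptr)
        std::printf("posix_memalign: page-aligned = %d\n", is_page_aligned(b, page));
    std::free(a);
    std::free(b);
}
```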

OK, that’s on me then. Of course, I meant the bandwidth over PCIe, so memory performance on the CPU side should be fine.