Pinned Memory Allocation: why should it be driver-specific?

For allocating pinned memory, why should we have to go through the NVIDIA driver? Can't we have the OS allocate the pinned memory in the application's address space?

I assume modern OSes have system calls for this (though I am not sure about the details).
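For what it's worth, POSIX does offer a page-locking call, mlock(), which asks the OS to keep a range resident. A minimal sketch (the lockable amount is capped by RLIMIT_MEMLOCK, and note this only pins the pages; the GPU driver knows nothing about it):

#include <sys/mman.h>

/* Ask the OS to page-lock a buffer so it cannot be swapped out. */
int pin_buffer(void *buf, size_t bytes)
{
    return mlock(buf, bytes);   /* 0 on success, -1 and errno on failure */
}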

If I understood the reasoning correctly, the problem isn’t allocating the pinned memory. The problem is that the driver needs to “know” that the pointer you’re passing to it is in pinned memory. And the simplest way to do that is to have the driver manage pinned memory.

I’m not even sure if it is possible to find out if a pointer points to pinned memory after it’s been allocated.
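For reference, this is what the driver-managed route looks like with the real API: cudaMallocHost hands back a page-locked buffer, so the driver can trust the pointer when it sees it again. A minimal sketch:

#include <cuda_runtime.h>

int main(void)
{
    size_t bytes = 1 << 20;
    float *h_buf, *d_buf;

    cudaMallocHost((void **)&h_buf, bytes);   /* driver-allocated, page-locked */
    cudaMalloc((void **)&d_buf, bytes);

    /* Because h_buf came from cudaMallocHost, the driver knows it is
       pinned and can DMA from it directly (and asynchronously). */
    cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, 0);
    cudaDeviceSynchronize();

    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}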

@Conf…,

Your explanation is convincing… The driver is scared of pointers. And it needs to be…

Can't drivers query the OS about the validity of pointers?

I am reminded of the Linux kernel routines copy_to_user and copy_from_user. But then, the kernel can handle the faults those generate…
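For context, a sketch of how that looks in a hypothetical driver's write handler (my_write is made up for illustration): copy_from_user reports a bad user pointer as a recoverable error instead of crashing:

#include <linux/fs.h>
#include <linux/uaccess.h>

static ssize_t my_write(struct file *f, const char __user *ubuf,
                        size_t len, loff_t *off)
{
    char kbuf[64];
    size_t n = len < sizeof(kbuf) ? len : sizeof(kbuf);

    /* copy_from_user returns the number of bytes it could NOT copy,
       so a faulting user pointer is handled gracefully. */
    if (copy_from_user(kbuf, ubuf, n))
        return -EFAULT;
    return n;
}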

But for a DMA from the device… there is no such safety net; a bad address can kill the machine…

But I still think the OS should provide a way to do this… shouldn't it?

And I do have a reason for wanting pinned memory to come down from the application… When I write CUDA plugins, I don't want to keep copying from the app's buffer to a pinned buffer and then to the device… If I can make the app give me pinned memory, I can save an unnecessary copy in between… Hence…

Pinned memory isn't just page-locked; it also needs to be mapped into the GPU's address space. Doing the allocation through the API is also more portable and reliable.
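That mapping is visible in the real API, for what it's worth: cudaHostAlloc with the cudaHostAllocMapped flag both page-locks a buffer and maps it into the device's address space, and cudaHostGetDevicePointer returns the GPU-side alias. A minimal sketch:

#include <cuda_runtime.h>

int main(void)
{
    float *h_buf, *d_alias;
    size_t bytes = 1 << 20;

    cudaSetDeviceFlags(cudaDeviceMapHost);   /* must precede context creation */

    /* Page-locked AND mapped into the GPU's address space. */
    cudaHostAlloc((void **)&h_buf, bytes, cudaHostAllocMapped);
    cudaHostGetDevicePointer((void **)&d_alias, h_buf, 0);

    /* Kernels can now dereference d_alias to read/write h_buf over the
       bus ("zero-copy"), with no explicit cudaMemcpy in between. */

    cudaFreeHost(h_buf);
    return 0;
}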

True, it needs to be mapped into the GPU's address space. No doubt about it.

I never contested that fact. One could still provide a "Map" function that maps user-allocated pinned buffers (allocated through some OS API) into the GPU's address space.

That way, my application can use pinned buffers from the start, or I can easily port existing applications that already use pinned buffers.
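For the record, newer CUDA releases expose exactly this: cudaHostRegister page-locks a buffer the application allocated itself, and cudaHostUnregister releases it. A sketch (older versions require the pointer and size to be page-aligned):

#include <cuda_runtime.h>

/* Pin an application-allocated buffer, DMA from it, then unpin it. */
int copy_from_app_buffer(float *app_buf, size_t bytes, float *d_buf)
{
    if (cudaHostRegister(app_buf, bytes, cudaHostRegisterDefault) != cudaSuccess)
        return -1;

    /* The driver now treats app_buf as pinned, so async DMA is legal. */
    cudaMemcpyAsync(d_buf, app_buf, bytes, cudaMemcpyHostToDevice, 0);
    cudaStreamSynchronize(0);

    cudaHostUnregister(app_buf);   /* release the pinning when done */
    return 0;
}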

It would also make sense to let the user attach a "pinned" attribute to host-based arrays. The ELF and PE formats plus the OS could support it, so that an application can keep its most critical data set pinned…

The OS need not guarantee this service, but it could at least make an intelligent choice when swapping RAM out to disk. Oh, but then… we have wandered into different territory. This is no longer about CUDA…

Let’s say you have a buggy app. It does something like

cudaMapPinnedBuffer(buffer...);      // hypothetical "map" call
cudaMemcpy(GPUmem, buffer...);       // copy from buffer
...                                  // do stuff on the GPU
cudaUnmapPinnedBuffer(buffer);       // awesome, we're done with it
...
cudaMemcpy(buffer, GPUmem...);       // copy to the buffer that we've already unmapped

Whoops, you might have crashed your app. Maybe you crashed another app. Maybe you crashed your machine. Maybe you hit the jackpot and caused a triple fault. Depends on whatever physical memory happened to be there when the DMA happened. Very bad.

So yeah, I don’t think exposing such functionality is really a good idea…

This basically means that the CUDA driver cannot query the OS to find out whether an address range is pinned or not, or physically contiguous or not.

And yeah, there must also be a way to keep the buffer resident in memory by increasing its OS reference count (through some OS interface).

I am surprised modern OSes don't offer this capability. (I don't follow OSes that closely.) But yeah, quite surprising…

There isn’t a way to do that kind of refcounting on Linux, strangely enough.
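To be precise, there is no userspace interface for it; inside the kernel, though, this is how drivers pin user buffers: get_user_pages() (pin_user_pages() in newer kernels) takes a reference on each page so it cannot be reclaimed or migrated while a device DMAs into it. A sketch (the exact signature varies across kernel versions):

#include <linux/mm.h>

/* Pin nr_pages of user memory starting at uaddr for long-term DMA.
   Each page's refcount stays elevated until unpin_user_pages(). */
long pin_range(unsigned long uaddr, unsigned long nr_pages, struct page **pages)
{
    return pin_user_pages(uaddr, nr_pages, FOLL_WRITE | FOLL_LONGTERM, pages);
}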

Thanks for your time!