How to avoid shared memory swapping to system RAM

I’ve written some example code to allocate memory on the device. I want to test how much VRAM is enough for certain games, so what I want to do is simulate graphics cards with less VRAM. But the problem is that the driver swaps the allocated memory out to system RAM. How can I avoid that?

Maybe there’s a better strategy. What is the best way to simulate less VRAM on a given graphics card?

I didn’t understand what you mean by swapping allocated memory to system RAM. Maybe what you observed is the allocation of the “staging area” in system RAM by cudaMalloc, since the runtime needs a non-pageable space to copy data between host and device. Do a search for “staging area cuda” or “pinned memory” for an explanation.
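
For illustration, here is a minimal sketch of a pinned (page-locked) host allocation with cudaMallocHost; the 4 MiB size is just an arbitrary example:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
	 float *host_buf = NULL, *dev_buf = NULL;
	 size_t n_bytes = 1024 * 1024 * sizeof(float); // 4 MiB, arbitrary

	 // page-locked (pinned) host allocation: the OS cannot swap it out,
	 // and transfers can go by DMA without an extra staging copy
	 if (cudaMallocHost((void **)&host_buf, n_bytes) != cudaSuccess) return 1;
	 if (cudaMalloc((void **)&dev_buf, n_bytes) != cudaSuccess) return 1;

	 cudaMemcpy(dev_buf, host_buf, n_bytes, cudaMemcpyHostToDevice);

	 cudaFree(dev_buf);
	 cudaFreeHost(host_buf); // pinned memory is freed with cudaFreeHost
	 return 0;
}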

Simulating a smaller amount of device memory is easy. Say you want to allocate a certain amount of memory on the video card, but the amount depends on how much you have free; that is, you can’t use absolute values here:

size_t free = 0, total = 0, alloc_size = 0;
float *dev_data;

cudaMemGetInfo(&free, &total); // free and total are reported in bytes

free /= 2;                                // pretend the card has only half its free memory
alloc_size = free / 10;                   // use 10% of this pseudo-free space
alloc_size -= alloc_size % sizeof(float); // round down to a whole number of floats

cudaMalloc((void **) &dev_data, alloc_size);
cudaMemset(dev_data, 0, alloc_size);

// input_data is assumed to be a host buffer of at least alloc_size bytes
cudaMemcpy(dev_data, input_data, alloc_size, cudaMemcpyHostToDevice);

// Do something with the data on the device, call your kernels...

cudaFree(dev_data);

What this code does: it checks how much free memory you have, then takes just half of it to simulate a card with less capacity. It then allocates 10% of that pseudo-free space, not 10% of what is really available, copies whatever data is in “input_data” (host) into the device buffer, and calls your kernel(s) accordingly. All sizing is relative to the free space, which is what simulates a card with less capacity.

Notice that it doesn’t do any error checking, which you must add, and it could be simplified further with Thrust’s host_vector/device_vector, getting rid of the manual memory management.
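
For instance, a rough Thrust version of the snippet above might look like this (the input values here are placeholders for your real data):

#include <cuda_runtime.h>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

int main(void)
{
	 size_t free_mem = 0, total_mem = 0;
	 cudaMemGetInfo(&free_mem, &total_mem);

	 // half the free memory to simulate a smaller card, then 10% of that, as a float count
	 size_t n = (free_mem / 2 / 10) / sizeof(float);

	 thrust::host_vector<float> input_data(n, 1.0f);     // stand-in for your real input
	 thrust::device_vector<float> dev_data = input_data; // allocates on the device and copies

	 // ... call your kernels on thrust::raw_pointer_cast(dev_data.data()) ...

	 return 0; // dev_data's destructor calls cudaFree automatically
}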

@saulopp: Yes, I would like to use pinned memory, but not on the host. It should be pinned on the device and not be swapped to host system RAM.

My scenario:

  1. I start my example app to allocate e.g. 5 GB of VRAM.
  2. Then I start a game to check whether 11 - 5 = 6 GB is enough VRAM for the game to run smoothly.
  3. The problem is that the driver or OS swaps the memory allocated by my test app out to shared graphics memory, which is simply system RAM. So the game can still use 11 GB of dedicated VRAM, and this kind of memory management makes my simulation fail.

How are you doing the memory allocation? What does it look like?
Are you using any graphical API that could be moving data around?

Except for unified memory, i.e. if you are allocating with cudaMallocManaged, which migrates data depending on where you reference it from, I can’t think of anything that would arbitrarily swap data between host and device unless that is part of the intended functionality. I’m not sure something like “pinned device memory” exists.
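
For what it’s worth, here is a minimal sketch of that managed-memory migration behavior; the array size is arbitrary:

#include <stdio.h>
#include <cuda_runtime.h>

__global__ void touch(float *data, size_t n)
{
	 size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
	 if (i < n) data[i] += 1.0f; // touching the data on the device migrates it there
}

int main(void)
{
	 size_t n = 1 << 20;
	 float *data = NULL;

	 // managed allocation: accessible from host and device; pages migrate
	 // to whichever processor touches them
	 cudaMallocManaged((void **)&data, n * sizeof(float));

	 for (size_t i = 0; i < n; i++) data[i] = 0.0f; // pages now resident on the host

	 int blocks = (int)((n + 255) / 256);
	 touch<<<blocks, 256>>>(data, n); // pages migrate to the GPU
	 cudaDeviceSynchronize();         // required before host access on Windows

	 printf("data[0] = %f\n", data[0]); // a host touch migrates pages back
	 cudaFree(data);
	 return 0;
}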

This is my code:

#include <stdio.h>
#include <cuda_runtime.h>

int main(int argc, char *argv[])
{
	 unsigned long mem_size = 0;
	 void *gpu_mem = NULL;
	 cudaError_t err;

	 // get amount of memory to allocate in MB, default to 256
	 if(argc < 2 || sscanf(argv[1], " %llu", &mem_size) != 1) {
		mem_size = 256;
	 }
	 mem_size *= 1024*1024; // convert MB to bytes

	 // allocate GPU memory
	 err = cudaMalloc(&gpu_mem, mem_size);
	 if(err != cudaSuccess) {
		printf("Error, could not allocate %llu bytes.\n", mem_size);
		return 1;
	 }

	 // wait for a key press
	 printf("Press return to exit and free memory...\n");
	 getchar();

	 // free GPU memory and exit
	 cudaFree(gpu_mem);
	 return 0;
}

No, I don’t use any 3D APIs during the test. I just start the game.

You are not allocating the amount of memory you expect: %llu expects a 64-bit unsigned long long, but on Windows an unsigned long is only 32 bits, so that sscanf writes past your variable.
Try the following code, where I changed a few lines, and see the behavior:

#include <stdio.h>
#include <cuda_runtime.h>

int main(int argc, char *argv[])
{
	 size_t mem_size = 0;
	 float *gpu_mem;
	 cudaError_t err;

	 // get amount of memory to allocate in MB, default to 256
	 if(argc < 2 || sscanf(argv[1], " %llu", &mem_size) != 1) {
		mem_size = 256;
	 }
	 mem_size *= 1024*1024 * sizeof(float); // argv[1] now counts Mi (2^20) floats, converted to bytes

	 // allocate GPU memory
	 err = cudaMalloc(&gpu_mem, mem_size);
	 if(err != cudaSuccess) {
		printf("Error, could not allocate %llu bytes.\n", mem_size);
		return 1;
	 }

	 // wait for a key press
	 printf("Press return to exit and free memory...\n");
	 getchar();

	 // free GPU memory and exit
	 cudaFree(gpu_mem);
	 return 0;
}

Are you running this program in two terminals, so that it holds the allocated memory simultaneously?

The Windows WDDM memory manager (controlled by Microsoft, not NVIDIA) does virtual memory management of GPU memory, and this applies to both CUDA and gaming. The WDDM memory manager may swap even CUDA allocations out to system memory, according to its own heuristics.

You don’t have any direct control over this.

You can influence the behavior by not just allocating the memory with cudaMalloc, but also writing CUDA code that accesses it. When a CUDA kernel is running, the WDDM memory manager will ensure that the allocations it needs are physically resident in GPU memory, not “swapped out” to system memory (which is used as a backing store).

However, when a CUDA kernel is running in a WDDM environment, many other GPU activities (including display updates) are “frozen” until the kernel completes. And if you run the kernel too long, you will hit a WDDM TDR timeout.

So there is no simple method to do this. You could experiment with running a short CUDA kernel (say, 100 ms or less in duration) once every second or so. But this is just playing games with the WDDM memory manager, and at some point your attempt to evaluate game behavior is going to be influenced by it, apart from any memory considerations.
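
To make that concrete, here is a rough sketch of such an experiment, assuming Windows (Sleep from windows.h); the allocation size, kernel name, launch configuration, and one-second interval are all arbitrary choices, not recommendations:

#include <stdio.h>
#include <windows.h>      // for Sleep(); this thread is about Windows/WDDM
#include <cuda_runtime.h>

// touch every element once, so WDDM has to keep the allocation resident
__global__ void touch_mem(float *data, size_t n)
{
	 size_t stride = (size_t)gridDim.x * blockDim.x;
	 for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
		data[i] += 1.0f;
}

int main(void)
{
	 size_t n = (size_t)1024 * 1024 * 1024; // 2^30 floats = 4 GiB, adjust as needed
	 float *data = NULL;

	 if (cudaMalloc((void **)&data, n * sizeof(float)) != cudaSuccess) {
		printf("allocation failed\n");
		return 1;
	 }

	 // run until the process is killed (e.g. Ctrl+C); each iteration launches a
	 // short kernel to nudge the WDDM memory manager into keeping the pages on the GPU
	 for (;;) {
		touch_mem<<<1024, 256>>>(data, n);
		cudaDeviceSynchronize();
		Sleep(1000); // roughly once per second
	 }
}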

Is this fully solved by a supported card running in TCC mode?

@Robert_Crovella: Many thanks!

You can’t run a typical game in TCC mode.

But yes, if you are in TCC mode, then the GPU memory management is not handled by WDDM or under Microsoft’s control (and it is also not handled in a virtual-memory fashion, unless you use UM oversubscription).
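
As a quick check, the tccDriver field of cudaDeviceProp reports which driver model a GPU is using; a minimal sketch:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
	 int count = 0;
	 cudaGetDeviceCount(&count);

	 for (int dev = 0; dev < count; dev++) {
		cudaDeviceProp prop;
		cudaGetDeviceProperties(&prop, dev);
		// tccDriver is 1 when the device runs the TCC driver (Windows only)
		printf("GPU %d (%s): %s\n", dev, prop.name,
		       prop.tccDriver ? "TCC" : "WDDM (or non-Windows)");
	 }
	 return 0;
}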

I should have mentioned just the resource management :)
Thanks