Why does cudaMalloc take available host memory?

My code is

#include <conio.h>
#include <cuda_runtime.h>

#define DATA_SIZE (8192*8192)

int data[DATA_SIZE];

int* gpudata;

int main()
{
	cudaMalloc((void**) &gpudata, sizeof(int) * DATA_SIZE);
	cudaFree(gpudata);
	return 0;
}


I found that cudaMalloc((void**) &gpudata, sizeof(int) * DATA_SIZE); takes about 15 MB of host memory.

Even if I free it, the memory is not returned. Why? :blink:

You’re probably running in emulation mode - the code runs on the CPU and not on the GPU.

As for the freeing - 15 MB is probably too small to notice in Task Manager.


I checked my mode, and it is not emulation mode.

The lost 15 MB is not related to the size of the data array; it seems to be some fixed overhead the first time you use CUDA.

After that initial 15 MB cost, even if I re-allocate GPU memory and then free it again, no extra host memory is used.

Is 15 MB a real problem?
This is probably the overhead of the runtime API (it has to load DLLs, initialize internal structures, etc.).
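A common way to make this one-time cost visible (and pay it up front, e.g. before timing anything) is to force context creation with a no-op runtime call; `cudaFree(0)` is the usual idiom. A minimal sketch illustrating the behavior described above - the host memory is taken at context creation, and repeated allocate/free cycles do not grow it further:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Force CUDA context creation up front. The ~15 MB of host memory
    // is consumed here, by the runtime initialization, not by the
    // cudaMalloc calls below.
    cudaFree(0);

    int* gpudata = 0;

    // Repeated allocate/free cycles after initialization should not
    // consume additional host memory.
    for (int i = 0; i < 3; ++i) {
        cudaMalloc((void**)&gpudata, sizeof(int) * 1024 * 1024);
        cudaFree(gpudata);
    }

    printf("done\n");
    return 0;
}
```

The context (and its host-side bookkeeping) stays alive for the lifetime of the process, which is why freeing the device allocation does not return the 15 MB.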