Unified memory and overprovisioning

Hello, I am trying to understand something about unified memory. I was under the impression that it is possible to allocate and work on data sets larger than device memory, but that doesn't seem to be the case.

For example, if I try N = 8192 * 5 in the code below, which works out to an array of ~6.7 GB and is within the VRAM capacity of my GPU (GTX 1080), everything works fine. But if I try, for instance, N = 8192 * 6, which is ~9.7 GB and so over the 8 GB of VRAM my card has, the allocation function starts returning out-of-memory errors.

	size_t N = 8192*5;
	float *A;

	cudaMallocManaged(&A, size_t(N * N * sizeof(float)));

	for (size_t i = 0; i < N * N; i++)
		A[i] = 1.0f;


Am I misunderstanding something? I should mention that I use CUDA 9.2 because I want to work with cuBLAS in device code, and apparently support for that was dropped in version 10.0.
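For anyone who wants to double-check the sizes in the question, here is a small host-only sketch of the arithmetic. The helper names `matrix_bytes` and `exceeds` and the constant `kVramBytes` are made up for illustration, and the 8 GiB figure is the card's nominal capacity, not what the driver actually reports as free.

```cpp
#include <cstdint>

// Bytes needed for an N x N matrix of float (hypothetical helper, not a CUDA API).
constexpr std::uint64_t matrix_bytes(std::uint64_t n) {
    return n * n * sizeof(float);
}

// True when a request is larger than the given capacity in bytes.
constexpr bool exceeds(std::uint64_t request, std::uint64_t capacity) {
    return request > capacity;
}

// Assumption: nominal 8 GiB of VRAM on a GTX 1080.
constexpr std::uint64_t kVramBytes = 8ull * 1024 * 1024 * 1024;

// N = 8192 * 5 needs 6,710,886,400 bytes, which fits in 8 GiB.
static_assert(matrix_bytes(8192 * 5) == 6710886400ull, "size for N = 8192 * 5");
static_assert(!exceeds(matrix_bytes(8192 * 5), kVramBytes), "fits in device memory");

// N = 8192 * 6 needs 9,663,676,416 bytes, which does not fit.
static_assert(matrix_bytes(8192 * 6) == 9663676416ull, "size for N = 8192 * 6");
static_assert(exceeds(matrix_bytes(8192 * 6), kVramBytes), "needs oversubscription");
```

So the second case only works if the platform can oversubscribe device memory.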

Did you verify that oversubscription is supported on your GPU?


Device memory oversubscription is possible for GPUs that have a non-zero value for the device attribute cudaDevAttrConcurrentManagedAccess. Managed memory on such GPUs may be evicted from device memory to host memory at any time by the Unified Memory driver in order to make room for other allocations.
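A minimal sketch of how that attribute could be queried at runtime. It assumes the GPU of interest is device 0 and only prints the value; it is a quick check, not a complete test of oversubscription.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;      // assumption: the GPU of interest is device 0
    int concurrent = 0;  // receives the attribute value

    // Ask the runtime whether this device supports concurrent managed access.
    cudaError_t err = cudaDeviceGetAttribute(
        &concurrent, cudaDevAttrConcurrentManagedAccess, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaDeviceGetAttribute failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    std::printf("cudaDevAttrConcurrentManagedAccess = %d\n", concurrent);
    if (concurrent == 0) {
        // Per the documentation quoted above, oversubscription needs a non-zero value.
        std::printf("Managed allocations larger than device memory may fail here.\n");
    }
    return 0;
}
```

If this prints 0, a `cudaMallocManaged` request larger than device memory should be expected to fail with an out-of-memory error.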

It would seem that the GTX 1080 does support oversubscription, based on reply 6 in this thread:

Are you using Linux or Windows? Oversubscription is not available on Windows:


Yes, I'm on Windows, so that must be it. Thanks.
