Hello, I am trying to understand something about unified memory. I was under the impression that it is possible to allocate and work on data sets larger than device memory, but that doesn't seem to be the case.
For example, with N = 8192 * 5 in the code below, the array is ~6.7 GB, which fits within the 8 GB of VRAM on my GPU (GTX 1080), and everything works fine. With N = 8192 * 6, the array is ~9.7 GB, which exceeds the 8 GB of VRAM, and the allocation function starts returning out-of-memory errors.
float *A = nullptr;
size_t N = 8192 * 5;  // N * N floats is ~6.7 GB
cudaMallocManaged(&A, N * N * sizeof(float));
// Touch every element from the host.
for (size_t i = 0; i < N * N; i++)
    A[i] = 1.0f;
Am I misunderstanding something? I should mention that I am using CUDA 9.2 because I want to call cuBLAS from device code, and apparently support for that was dropped in version 10.0.
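Device memory oversubscription is possible only on GPUs that report a non-zero value for the device attribute cudaDevAttrConcurrentManagedAccess. Managed memory on such GPUs may be evicted from device memory to host memory at any time by the Unified Memory driver in order to make room for other allocations. If the attribute is zero on your system, managed allocations cannot exceed the available device memory, which would explain the out-of-memory error you are seeing.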
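To check which case applies on your machine, you can query the attribute at runtime. The following is only a minimal sketch: it assumes the GPU of interest is device 0, and it uses nothing beyond the standard runtime calls cudaDeviceGetAttribute and cudaMemGetInfo.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int device = 0;  // assumption: the GPU of interest is device 0

    // Non-zero means the device supports concurrent managed access,
    // which is the prerequisite for oversubscribing device memory.
    int concurrentManagedAccess = 0;
    cudaError_t err = cudaDeviceGetAttribute(
        &concurrentManagedAccess, cudaDevAttrConcurrentManagedAccess, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaDeviceGetAttribute failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // Report free and total device memory for comparison with the
    // size of the managed allocation being attempted.
    size_t freeBytes = 0, totalBytes = 0;
    err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemGetInfo failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    std::printf("cudaDevAttrConcurrentManagedAccess = %d\n",
                concurrentManagedAccess);
    std::printf("device memory: %zu bytes free of %zu bytes total\n",
                freeBytes, totalBytes);
    return 0;
}
```

As far as I know, the attribute reports 0 on GPUs older than Pascal and also on Windows regardless of the GPU, so a GTX 1080 would be expected to report a non-zero value only under Linux.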