Problems with Unified Memory Under Pascal

19CS19 · January 23, 2017, 11:21am

Inspired by the advantages of unified memory on Pascal, I switched from Maxwell to Pascal.

But I have a problem: none of my systems and programs use unified memory on Pascal when I allocate memory with cudaMallocManaged. They fall back to zero-copy memory instead. Did I miss something?

Systems: Win_64bit with an i7-i930 and one GTX1080; Win7_64bit with an i7-4790 and a GTX1080; Win10_64bit with an i5-6600K and a GTX1060. All use VS2013. Nsight also doesn't show any unified memory allocations, page faults, or data migrations.

It is the same program code I used under Maxwell without any problems. Setting CUDA_MANAGED_FORCE_DEVICE_ALLOC=1 in CMake doesn't help either.

Additional info: cudaDevAttrConcurrentManagedAccess is 0.

A hint about what I am doing wrong would be nice. Thanks.
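A minimal sketch (not from the thread) for checking these attributes on every device the runtime can see. A nonzero `cudaDevAttrConcurrentManagedAccess` means the GPU and CPU may access managed memory concurrently, i.e. the Pascal-style on-demand paging model; when it is 0, managed memory behaves roughly like the pre-Pascal model. The helper `usesOnDemandMigration` is a hypothetical name used only for illustration.

```cuda
// Sketch: print the managed-memory attributes of every visible device.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: a nonzero cudaDevAttrConcurrentManagedAccess value means
// the device and the CPU may access managed memory concurrently (on-demand paging).
bool usesOnDemandMigration(int concurrentManagedAccess) {
    return concurrentManagedAccess != 0;
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Visible devices: %d\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        int managed = 0, concurrent = 0, pageable = 0;
        cudaDeviceGetAttribute(&managed, cudaDevAttrManagedMemory, dev);
        cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, dev);
        cudaDeviceGetAttribute(&pageable, cudaDevAttrPageableMemoryAccess, dev);

        std::printf("Device %d: %s (compute capability %d.%d)\n",
                    dev, prop.name, prop.major, prop.minor);
        std::printf("  ManagedMemory=%d ConcurrentManagedAccess=%d PageableMemoryAccess=%d\n",
                    managed, concurrent, pageable);
        std::printf("  On-demand page migration: %s\n",
                    usesOnDemandMigration(concurrent) ? "yes" : "no");
    }
    return 0;
}
```

If every device reports ManagedMemory=1 but ConcurrentManagedAccess=0, cudaMallocManaged itself works, but the Pascal page-faulting behavior is not available on that system.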

Robert_Crovella · January 23, 2017, 5:20pm

If you have multiple GPUs in the system, and those GPUs are not attached to the same PCIe root complex, then managed allocations become zero-copy (ZC) allocations instead. This particular behavior is documented in the programming guide.

Do you have multiple GPUs?

If you want to work around this (forcing CUDA to see only one GPU), use the CUDA_VISIBLE_DEVICES environment variable, which is documented in the programming guide.
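A small sketch for checking what the program actually sees. Device enumeration is done at run time by the CUDA driver, not by nvcc at compile time, so both CUDA_VISIBLE_DEVICES and CUDA_MANAGED_FORCE_DEVICE_ALLOC need to be in the environment of the running process (for example `set CUDA_VISIBLE_DEVICES=0` in a Windows console before starting the executable, or in the debugger's environment settings); a value that only exists as a CMake variable would presumably never reach the program. The sketch prints both variables, the device count, and each device's PCI location, which makes an unexpected extra device easy to spot. `countListedDevices` is a hypothetical helper.

```cuda
// Sketch: show the environment the CUDA runtime sees and list visible devices.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: number of comma-separated entries in a device list,
// or -1 if the variable is not set at all.
int countListedDevices(const char* list) {
    if (list == nullptr) return -1;
    if (*list == '\0') return 0;
    int entries = 1;
    for (const char* p = list; *p != '\0'; ++p) {
        if (*p == ',') ++entries;
    }
    return entries;
}

int main() {
    // Both variables are read from the process environment at run time.
    const char* visible = std::getenv("CUDA_VISIBLE_DEVICES");
    const char* forceAlloc = std::getenv("CUDA_MANAGED_FORCE_DEVICE_ALLOC");
    std::printf("CUDA_VISIBLE_DEVICES=%s (%d entries)\n",
                visible ? visible : "<unset>", countListedDevices(visible));
    std::printf("CUDA_MANAGED_FORCE_DEVICE_ALLOC=%s\n",
                forceAlloc ? forceAlloc : "<unset>");

    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Devices visible to the runtime: %d\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // PCI domain:bus:device identifies the physical location of each GPU.
        std::printf("  %d: %s  PCI %04x:%02x:%02x\n", dev, prop.name,
                    (unsigned)prop.pciDomainID, (unsigned)prop.pciBusID,
                    (unsigned)prop.pciDeviceID);
    }
    return 0;
}
```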

19CS19 · January 24, 2017, 8:44am

That is my problem. I don't have multiple GPUs, but my system behaves as if I did. I read that part of the documentation twice.

But thanks, you inspired me to try setting CUDA_VISIBLE_DEVICES and to check how many devices CUDA thinks it sees. That could help me find out what goes wrong.

Possibly somebody knows, or has an idea, how the nvcc compiler finds the GPUs and what could go wrong.

My last guess is that something is wrong with my C++ cudaMallocManaged implementation, which follows the "Unified Memory with C++" section of

https://devblogs.nvidia.com/parallelforall/unified-memory-in-cuda-6/

I used this to simplify my code and get a kind of smart pointer with constructors, destructors, and copy functions. I will try a clean project without it.
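For a clean comparison project, a stripped-down version of the pattern described in that blog post might look like the sketch below: a base class whose `operator new`/`operator delete` call `cudaMallocManaged`/`cudaFree`, so every derived object lives in managed memory. `Counters` and `increment` are hypothetical names, and the post's fuller version (which also shows a copy constructor for deep copies) is not reproduced here.

```cuda
// Sketch: C++ objects allocated in managed memory via a common base class.
#include <cstdio>
#include <cstddef>
#include <new>
#include <cuda_runtime.h>

class Managed {
public:
    void* operator new(std::size_t len) {
        void* ptr = nullptr;
        if (cudaMallocManaged(&ptr, len) != cudaSuccess) {
            throw std::bad_alloc();
        }
        cudaDeviceSynchronize();
        return ptr;
    }
    void operator delete(void* ptr) {
        // Wait for kernels that might still use the object before freeing it.
        cudaDeviceSynchronize();
        cudaFree(ptr);
    }
};

// Hypothetical payload type; inherits managed allocation from Managed.
struct Counters : public Managed {
    int values[4];
};

// Each thread adds its own index to one counter.
__global__ void increment(Counters* c) {
    int i = threadIdx.x;
    if (i < 4) c->values[i] += i;
}

int main() {
    Counters* c = nullptr;
    try {
        c = new Counters();  // value-initialized: all counters start at 0
    } catch (const std::bad_alloc&) {
        std::printf("cudaMallocManaged failed\n");
        return 1;
    }
    increment<<<1, 4>>>(c);
    // Synchronize before the host reads managed data
    // (required when ConcurrentManagedAccess is 0).
    cudaDeviceSynchronize();
    for (int i = 0; i < 4; ++i) {
        std::printf("values[%d] = %d\n", i, c->values[i]);
    }
    delete c;
    return 0;
}
```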

Okay, I have now also debugged the CUDA Samples for Unified Memory after forcing the compiler to compile for arch=61. I see the same behavior there: no page faults are triggered in Nsight. Calling the kernels repeatedly, which should migrate the pages to the GPU if unified memory worked, also shows no improvement in kernel time. The time stays constant, so it is zero-copy memory.

Or is a C++11 compiler necessary?
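A sketch of a more direct version of that timing check, assuming zero-copy shows up as consistently slower kernels rather than as a first-launch spike: it times the same kernel with CUDA events on a `cudaMallocManaged` buffer and on a plain `cudaMalloc` buffer. If the managed pages end up in device memory, the later managed launches should be close to the device-memory baseline; if every managed launch stays much slower, the kernel is probably reading over PCIe. This is a heuristic rather than a definitive test, and `scale`, `timeRuns`, and `meanMs` are hypothetical names.

```cuda
// Sketch: compare kernel times on managed memory against plain device memory.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Mean of `count` timings in milliseconds (0 if there are none).
float meanMs(const float* ms, int count) {
    float sum = 0.0f;
    for (int i = 0; i < count; ++i) sum += ms[i];
    return count > 0 ? sum / count : 0.0f;
}

// Launches `scale` `runs` times on `data` and records each kernel time in `ms`.
void timeRuns(float* data, int n, int runs, float* ms) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    for (int r = 0; r < runs; ++r) {
        cudaEventRecord(start);
        scale<<<(n + 255) / 256, 256>>>(data, n, 1.0001f);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        cudaEventElapsedTime(&ms[r], start, stop);
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}

int main() {
    const int n = 1 << 24;  // 16M floats = 64 MB per buffer
    const int runs = 5;
    float* managed = nullptr;
    float* device = nullptr;
    if (cudaMallocManaged(&managed, n * sizeof(float)) != cudaSuccess ||
        cudaMalloc(&device, n * sizeof(float)) != cudaSuccess) {
        std::printf("allocation failed\n");
        return 1;
    }
    // Touch the managed buffer on the host so it starts out host-resident.
    for (int i = 0; i < n; ++i) managed[i] = 1.0f;
    cudaMemset(device, 0, n * sizeof(float));

    float managedMs[runs];
    float deviceMs[runs];
    timeRuns(managed, n, runs, managedMs);
    timeRuns(device, n, runs, deviceMs);

    for (int r = 0; r < runs; ++r) {
        std::printf("run %d: managed %.3f ms, device %.3f ms\n",
                    r, managedMs[r], deviceMs[r]);
    }
    // Skip run 0, which may include one-time migration or warm-up costs.
    std::printf("mean of later runs: managed %.3f ms, device %.3f ms\n",
                meanMs(managedMs + 1, runs - 1), meanMs(deviceMs + 1, runs - 1));

    cudaFree(managed);
    cudaFree(device);
    return 0;
}
```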
