Pascal resorting to zero-copy memory

I have several Pascal devices (GTX 1080s) which are unable to use managed memory, and revert to zero-copy which is so slow that the devices are unusable.

Environment:

  • Maxwell devices work as expected with the same binary in the same environment.
  • Pascal devices are all in single-GPU systems
  • Windows 8.1 x64 genuine with all updates
  • Latest NVIDIA drivers, as well as several previous versions
  • CUDA Toolkit 8.0, latest update
  • VS 2013 and 2015

NSight System Info reports:

  • Concurrent Managed Access FALSE
  • Managed Memory TRUE
  • Pageable Memory Access FALSE
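For anyone wanting to cross-check what Nsight reports, the same three flags can be queried directly from the CUDA runtime. A minimal sketch (error checking omitted; assumes device 0):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int dev = 0;
    int concurrent = 0, managed = 0, pageable = 0;

    // These attributes mirror the Nsight "System Info" fields quoted above.
    cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, dev);
    cudaDeviceGetAttribute(&managed,    cudaDevAttrManagedMemory,           dev);
    cudaDeviceGetAttribute(&pageable,   cudaDevAttrPageableMemoryAccess,    dev);

    printf("Concurrent Managed Access: %s\n", concurrent ? "TRUE" : "FALSE");
    printf("Managed Memory:            %s\n", managed    ? "TRUE" : "FALSE");
    printf("Pageable Memory Access:    %s\n", pageable   ? "TRUE" : "FALSE");
    return 0;
}
```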

I’ve been trying for months to figure out what is going on. We have many systems and all with Pascal devices are afflicted. Older systems with Maxwell devices are working fine. When we do Maxwell-Pascal swaps, the problem follows the Pascal GPU.

I don’t have any specific insights into the problem based on the information provided so far, but for the benefit of other forum participants you might want to mention:

(1) The specific Windows driver version used
(2) The specific Maxwell-family devices used

Driver defects could impact (1), while device capabilities (e.g., consumer vs. Tesla/Quadro GPUs) could impact (2). Please confirm: the “Maxwell/Pascal swaps” are controlled experiments in which nothing changes other than the GPU plugged into the system under test.

Fair points.

(1) I don’t have a comprehensive list of drivers that we have tried, but I can confirm that 382.05 is affected. We haven’t found any version that wasn’t, but that doesn’t mean there hasn’t been one.

(2) We have tested (without issue) Maxwell Titan X, 980Ti, and 970.

(3) To confirm, “Maxwell/Pascal swaps” is literally powering down the system, physically changing the cards, and starting the system. We’ve confirmed it by swapping them back, and then again, and then again, and the problem always appears when using a Pascal card and never when using Maxwell.

All of our Pascal cards are the same model from the same manufacturer (Gigabyte G1 Gaming GTX 1080), although we’ve tried different BIOS versions. Any chance this is a problem specific to this manufacturer/model?

I can’t imagine a scenario where the specific brand of Pascal-family device would cause problems; as far as driver behavior goes, they should be completely interchangeable. I don’t have an explanation for your observation and would hope that someone with deeper insights into the driver and/or Pascal can point out what might be going wrong here. I agree that the observed behavior does not make sense based on the controlled experiments described, as the capabilities of Pascal-class devices should be a superset of Maxwell-class capabilities.

How do you detect the use of zero-copy memory?

Could it be that you are actually seeing demand-paging instead of upfront-copying of all managed data to the GPU?
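One rough way to tell the two apart, if timing-based detection is acceptable: with demand paging, only the first kernel touch of a managed page is expensive, while with zero-copy every access crosses PCIe, so repeated launches stay equally slow. A hypothetical sketch (names and sizes illustrative, error checking omitted):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(float *p, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] += 1.0f;   // forces a read and a write of every element
}

int main() {
    const int n = 1 << 22;     // ~16 MB of floats
    float *p;
    cudaMallocManaged(&p, n * sizeof(float));

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);

    for (int run = 0; run < 3; ++run) {
        cudaEventRecord(t0);
        touch<<<(n + 255) / 256, 256>>>(p, n);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms = 0;
        cudaEventElapsedTime(&ms, t0, t1);
        // Demand paging: run 0 slow, later runs fast.
        // Zero-copy: all runs roughly equally slow.
        printf("run %d: %.2f ms\n", run, ms);
    }
    cudaFree(p);
    return 0;
}
```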

Good point. I have considered that as well.

  1. NVIDIA Visual Profiler reports “==290352== Warning: Unified Memory Profiling is not supported on the current configuration because a pair of devices without peer-to-peer support is detected on this multi-GPU setup.”

     I’ll note again that this is a single-GPU system, despite what the profiler says. However, I think I read somewhere else that the profiler reporting this was a known bug and didn’t actually reflect the status of managed memory, so I’ll accept that this may not be a reliable indicator.

  2. The items that are in managed memory (i.e., not explicitly copied to/from the device) are pretty simple C++ objects. Around 2-16 of them need to be copied to the device depending on the situation, and their total size is well under 1 KB.

  3. Performance on a Maxwell Titan X, as well as with copying the objects manually on Pascal, is in the range of 7 ms-30 ms. When using managed memory, it goes to around 150 ms-500 ms.

  4. The first thing that I tried was usage hints. Granted, I could have screwed that up somehow, but it made exactly zero difference in speed.

  5. If I understand correctly, page faulting requires “Concurrent Managed Access” to be TRUE. I’m guessing that “Pageable Memory Access” also indicates whether the device can page fault. Both of these return FALSE on our devices, although I believe they should return TRUE on Pascal devices.
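Given that explicit copies are fast even on the affected Pascal cards, one hypothetical interim workaround is to gate the allocation strategy on the reported attribute itself rather than on compute capability. A sketch (function name and structure are illustrative, not from the poster's code):

```cuda
#include <cuda_runtime.h>

// Allocate either managed memory or a plain device buffer, depending on
// whether the device reports that it can actually page-fault on access.
void *alloc_params(size_t bytes, bool *use_managed) {
    int concurrent = 0;
    cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, 0);

    void *p = nullptr;
    if (concurrent) {
        cudaMallocManaged(&p, bytes);   // demand paging available
        *use_managed = true;
    } else {
        cudaMalloc(&p, bytes);          // fall back to explicit cudaMemcpy
        *use_managed = false;
    }
    return p;
}
```

With under 1 KB of parameter objects, the explicit-copy path should land in the fast 7 ms-30 ms range described above rather than the 150 ms-500 ms managed-memory penalty.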

One thing that I am trying to do to follow up is test this on a different brand/model of GTX 1080. I’m told that we have a few of them sitting idle somewhere, but we are having a hard time locating them. I don’t want to spend $1000 of my personal money to buy one just to test for 5 minutes, so I’ll have to wait until the guys with physical access to our equipment can provide one to test.

Hello,
I have had the same problem since last November. I reported it as a bug to NVIDIA on Jan 25, with several updates since, and there has been no reaction from NVIDIA. I have also tried to post it here twice; possibly nobody could help me because my posts were confusing, as I am not a native speaker.

In my case, my software (three programs) became much slower after changing from Maxwell to Pascal GPUs. I have a complex memory structure (array lookups yield addresses of other arrays), so the behavior described in this forum post hits me very hard. In Nsight, a performance analysis shows the same behavior as zero-copy memory on Pascal cards; on Maxwell cards everything works well. But I really need the new memory paging feature of Pascal GPUs.

I can confirm this behavior in different combinations: two systems with Win 7, VS 2013 or VS 2015, and CUDA 8 with a GTX 1080 (previously a GTX 980 Ti), and one system with Win 10, VS 2013 or VS 2015, and CUDA 8 with a GTX 1060 (previously a GTX 960). I have tested all driver and CUDA versions since November 2016. All swaps between Maxwell and Pascal were done by changing only the graphics card and nothing else.

NSight System Info also reports:

Concurrent Managed Access FALSE
Managed Memory TRUE
Pageable Memory Access FALSE

Also, no memory paging is reported during an Nsight performance analysis.

On all systems, the affected programs run perfectly under Linux (Ubuntu), and there Nsight correctly reports the use of memory paging for CPU and GPU.

In my case, NVIDIA Visual Profiler also reports “==290352== Warning: Unified Memory Profiling is not supported on the current configuration because a pair of devices without peer-to-peer support is detected on this multi-GPU setup.” But I also have only single-GPU systems.

Performance examples:
Noise reduction with a non-linear mean filter over thousands of spectra: GTX 980 Ti on Windows, 15 minutes; GTX 1080 on Windows, 90 minutes; GTX 1080 on Linux, nearly 8 minutes.
Interactive multispectral visualization of confocal Raman microscopy: GTX 1080 on Windows, 0.25 frames per second; GTX 1080 on Linux, above 100 frames per second (due to the complex data structure).

Interestingly, my cards are also all from Gigabyte: a GTX 1080 Xtreme Gaming and a GTX 1060 Xtreme Gaming. So it really could be a Gigabyte problem.

I hope my description helps a bit. Please let me know if you find anything out.
For now, I am doing my work on Linux.

Hi,

It is a bug in NVIDIA’s Windows driver which occurs with the Pascal architecture.

I have known this for a few days, but could not write it here because I was on vacation without an internet connection.

For details, see the comments of: https://devblogs.nvidia.com/parallelforall/unified-memory-cuda-beginners/
where Mark Harris from NVIDIA confirms the bug. It should be corrected in CUDA 9. He also says it should be communicated to Microsoft to help the cause, but I haven’t found a suitable Microsoft bug report page so far.

New user here, running the CUDA 9 RC with a single GTX 1070 on Windows 10, and the issue persists, at least for me. I don’t know if there is a better way to report it than here, but it seems to me CUDA 9 did not provide a fix.

Hi,

it looks like it is partially solved.

I can confirm that with CUDA 9.0 RC on Win 7 64-bit, VS 2013, with a GTX 1080, it no longer resorts to system memory.

But on Win 10 64-bit I have two systems where the bug still appears (GTX 1050 Ti with VS 2013, and GTX 1060 with VS 2015).

I still need to test (force) the paging feature, but I don’t have the time for it. So I can only speak to the resorting to system memory, or at least to the usual unified memory behavior.

The best way to report it would be a bug report to NVIDIA. Ideally, everyone who is affected would bother them with one, and also post links to threads like this in the bug report so that someone from NVIDIA can respond if they like. For months I really didn’t find anything about my bug report in any updated documentation or on any NVIDIA website or forum, so I would include links to forum threads in the bug report.

I will post here if I find out something new, or if Mark Harris posts something in response to my updates in the comments of: https://devblogs.nvidia.com/parallelforall/unified-memory-cuda-beginners/