nvprof and unified memory

I have two Tesla P100-SXM2-16GB GPUs with NVLink. When I try to profile CUDA code that uses Unified Memory, the profiler fails with “unified memory profiling failed”.

I found some suggestions to run the profiler with root privileges, but since this is a shared cluster this is not an option for me. Are there any other ways?

Which profiler? nvprof or nvvp?

What sort of machine are these GPUs in? Are there only two P100s in the entire machine, or are there more?

Are you given access to these GPUs via a job scheduler?

The profiler is nvprof.

We have two nodes with two P100s each. Access is granted through Slurm, our job scheduler.

Are you certain that your individual cluster nodes only have 2 P100-SXM2 devices? Because that would be fairly unusual. Is it by chance a single node with 4 P100-SXM2 devices?

What is the operating system? Which CUDA version? What GPU driver?

Are you certain that the version of nvprof you are using matches the version that the CUDA code was compiled with?

I am certain and they also have different host names. As far as I know we do not use virtualization on this machine.

The OS is Ubuntu 16.04 with CUDA 9.0 and driver 384.111.

I am certain because CUDA is loaded via a module. If I do not load the module, nvprof is not in the PATH.

I’d be interested to know what server nodes those are.

Anyway, as a test, if you have a managed memory (unified memory) code that only requires a single GPU, I’d be interested to know what the results are of running:

CUDA_VISIBLE_DEVICES="0" nvprof ./your_app

Here are the results you requested:

[…] which nvprof
/nfs/software/ubuntu/16.04/cuda/9.0.176/bin/nvprof
[...] CUDA_VISIBLE_DEVICES="0" nvprof ./test/sort_benchmark/test_sort_benchmark
==5717== NVPROF is profiling process 5717, command: ./test/sort_benchmark/test_sort_benchmark
======== Error: unified memory profiling failed.

If I pass --unified-memory-profiling off to nvprof, everything except unified memory profiling works.
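For anyone who wants to reuse that workaround, here is a minimal sketch of a shell wrapper around it. The function name profile_without_um is made up for illustration; the only nvprof option used is --unified-memory-profiling off, as mentioned above. Keep in mind that with this flag nvprof does not collect unified memory events (page faults, migrations), so only kernels, memcpys and API calls show up.

```shell
# Hypothetical helper: run nvprof with unified memory profiling disabled.
# All arguments (the application and its own options) are passed through.
profile_without_um() {
    nvprof --unified-memory-profiling off "$@"
}

# Example call, using the binary from this thread:
# profile_without_um ./test/sort_benchmark/test_sort_benchmark
```

This only sidesteps the error; it does not explain why unified memory profiling fails on this node.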

You don’t need root access to fix this. Since you are using CUDA 9.0, this should work. The problem lies in the IBus configuration. Delete the folder /home/[user]/.config/ibus/bus and the problem should go away.
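If you would rather not delete anything outright, a more cautious variant is to move the folder aside first, so it can be restored if the suggestion does not help. This is only a sketch: the function name backup_ibus_bus_dir and the .bak.<timestamp> suffix are assumptions for illustration, and the path is the one named in the reply above, expressed through $HOME.

```shell
# Hypothetical helper: move the IBus bus directory aside instead of deleting it.
backup_ibus_bus_dir() {
    dir="${HOME}/.config/ibus/bus"
    if [ -d "$dir" ]; then
        # The timestamp suffix is an arbitrary choice to avoid clobbering older backups.
        backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
        mv "$dir" "$backup" && echo "moved $dir to $backup"
    else
        echo "nothing to do: $dir does not exist"
    fi
}
```

Call backup_ibus_bus_dir once, then rerun nvprof. If nothing changes, the backup can be moved back to its original name.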