This description in the profiler user’s guide that accompanies the CUDA 6.5 RC docs may be relevant:
“On multi-GPU configurations without P2P support between any pair of devices that
support Unified Memory, managed memory allocations are placed in zero-copy
memory. In this case Unified Memory profiling is not supported. In certain cases,
the environment variable CUDA_MANAGED_FORCE_DEVICE_ALLOC can be set to force
managed allocations to be in device memory and to enable migration on these hardware
configurations. In this case Unified Memory profiling is supported. Normally, using the
environment variable CUDA_VISIBLE_DEVICES is recommended to restrict CUDA to
only use those GPUs that have P2P support. Please refer to the environment variables
section in the CUDA C Programming Guide for further details.”
Although this appears in section 3.2.6, which pertains to nvprof, I suspect nvvp has a similar limitation. You might try launching nvvp with the CUDA_VISIBLE_DEVICES environment variable set to a single GPU to see whether that helps.
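As a rough illustration of that experiment, here is a minimal Python sketch that starts the profiler with only one GPU visible to CUDA. It assumes nvvp is on your PATH; the helper names single_gpu_env and launch_profiler are made up for this example and are not part of any CUDA tool, and which index corresponds to which card on your machine is something you would have to check.

```python
import os
import subprocess


def single_gpu_env(device_index, base_env=None):
    """Return a copy of the environment that exposes only one GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA_VISIBLE_DEVICES takes a comma-separated list of device indices;
    # giving a single index hides every other GPU from the CUDA runtime.
    env["CUDA_VISIBLE_DEVICES"] = str(device_index)
    return env


def launch_profiler(device_index, command=("nvvp",)):
    """Start the profiler (assumed to be on PATH) restricted to one GPU."""
    return subprocess.Popen(list(command), env=single_gpu_env(device_index))
```

Calling launch_profiler(0) and then launch_profiler(1) would let you compare the two cards one at a time. From a shell, the equivalent is simply setting CUDA_VISIBLE_DEVICES before starting nvvp.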
I think txbob raised a very good point.
I would suggest setting the CUDA device to the K20c before running the computation.
I know that a compute capability 3.2 device doesn’t support Unified Memory profiling.