Warning: Unified Memory Profiling is not supported on this configuration

I’m trying to profile the following simple code:

#include <stdio.h>

#define BLOCKSIZE 32

/**********/
/* iDivUp */
/**********/
int iDivUp(int a, int b) { return ((a % b) != 0) ? (a / b + 1) : (a / b); }

/********************/
/* CUDA ERROR CHECK */
/********************/
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess) 
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}

/*******************/
/* KERNEL FUNCTION */
/*******************/
__global__ void kernel(int *vec1, int *vec2, int *vec3, int N) {

	int tid = threadIdx.x + blockIdx.x * blockDim.x;
	
	if (tid < N) vec3[tid] = vec1[tid] + vec2[tid];
	
}
 
/********/
/* MAIN */
/********/
int main() {
	
	const int N = 10;
	
	int *vec1, *vec2, *vec3; 
	
	gpuErrchk(cudaMallocManaged(&vec1, N*sizeof(int)));
	gpuErrchk(cudaMallocManaged(&vec2, N*sizeof(int)));
	gpuErrchk(cudaMallocManaged(&vec3, N*sizeof(int)));

	for (int i=0; i<N; i++) {
		vec1[i] = i;
		vec2[i] = 2*i;
	}
	
	kernel<<<iDivUp(N, BLOCKSIZE), BLOCKSIZE>>>(vec1, vec2, vec3, N);	
	gpuErrchk(cudaPeekAtLastError());
	gpuErrchk(cudaDeviceSynchronize());	

	for (int i=0; i<N; i++) {
		printf("vec1 = %i; vec2 = %i; vec3 = %i \n", vec1[i], vec2[i], vec3[i]);
	}
	
	return 0;
}

However, the NVIDIA Visual Profiler gives me the following warning in the console panel:

Warning: Unified Memory Profiling is not supported on this configuration

As a result, the timeline does not show any relevant information about the kernel launch.

My configuration: CUDA 6.5; Kepler K20c; Windows 7.

Do you have another NVIDIA GPU besides the K20c in that system?

Is the Windows installation 32-bit or 64-bit?

The workstation has 4 Kepler K20c GPUs.

Windows is 64-bit and I’m compiling in Release mode for a 64-bit architecture.

Besides the 4 Kepler K20c GPUs, are there any other GPUs? What is driving the display?

Yes, the display is driven by a Matrox G200eR.

This description in the profiler user’s guide that accompanies the CUDA 6.5 RC docs may be relevant:

“On multi-GPU configurations without P2P support between any pair of devices that
support Unified Memory, managed memory allocations are placed in zero-copy
memory. In this case Unified Memory profiling is not supported. In certain cases,
the environment variable CUDA_MANAGED_FORCE_DEVICE_ALLOC can be set to force
managed allocations to be in device memory and to enable migration on these hardware
configurations. In this case Unified Memory profiling is supported. Normally, using the
environment variable CUDA_VISIBLE_DEVICES is recommended to restrict CUDA to
only use those GPUs that have P2P support. Please refer to the environment variables
section in the CUDA C Programming Guide for further details.”

Although this is in section 3.2.6, which pertains to nvprof, I suspect nvvp may have a similar limitation. You might try launching nvvp with the CUDA_VISIBLE_DEVICES environment variable set to a single GPU and see if that helps it work.
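A minimal sketch of that suggestion (the device index 0 is an assumption — check nvidia-smi for the index of the K20c you want; the syntax below is Linux/bash, on Windows cmd it would be `set CUDA_VISIBLE_DEVICES=0` followed by launching nvvp from the same prompt):

```shell
# Expose only one GPU to CUDA, so managed allocations are not
# forced into zero-copy memory on a multi-GPU machine without
# P2P support between all device pairs.
export CUDA_VISIBLE_DEVICES=0

# Launch the Visual Profiler from this same shell so it (and the
# application it profiles) inherits the variable. Uncomment on a
# machine with the CUDA toolkit installed:
# nvvp

echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```

The variable must be set in the environment that nvvp inherits, since the CUDA runtime inside the profiled process reads it at context creation.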

I think txbob raised a very good point.
I would also suggest setting the CUDA device to the K20c before running the computation.
As far as I know, compute capability 3.2 devices don’t support Unified Memory profiling.
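A sketch of that device-selection suggestion, assuming the K20c you want is device 0 (enumerate the devices to find the right index). Note that the programming guide passage quoted above implies CUDA_VISIBLE_DEVICES is still the recommended route; I’m not certain cudaSetDevice alone changes where managed allocations are placed on a multi-GPU box without P2P:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    int dev = 0;  // assumption: the K20c to profile is device 0

    // Report which device we are about to select.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return 1;
    printf("Using device %d: %s (cc %d.%d)\n",
           dev, prop.name, prop.major, prop.minor);

    // Select the device before any allocation or kernel launch
    // creates a context on it.
    if (cudaSetDevice(dev) != cudaSuccess) return 1;

    // ... cudaMallocManaged / kernel launch as in the question ...
    return 0;
}
```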