Unable to compile file with unified memory

I installed CUDA 8 yesterday on a Windows 7 PC with a Quadro K2000 GPU. This card has compute capability 3.0, which supports unified memory according to NVIDIA's website. My cudaMalloc/cudaMemcpy code runs fine on the GPU, but my cudaMallocManaged code does not: the program crashes as soon as the CPU tries to initialize the managed data. The code I am trying to run is from Mark Harris's Parallel Forall blog post titled "An Even Easier Introduction to CUDA".

Could the installation have picked up the wrong GPU architecture during the installation process?

If so, is there a way to force it to use the 3.0 capability?


Add proper CUDA error checking to the project. Unfortunately the original blog code did not include that.
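As a rough sketch of what that error checking could look like: the CUDA_CHECK macro below is a hypothetical helper (the name is mine, not from the blog post), and the kernel is only modeled on the blog's add example. The idea is to check the return value of every runtime call, so that a failing cudaMallocManaged is reported before the CPU writes through a pointer that was never set.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the blog code): print the error and exit
// if a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Element-wise add with a grid-stride loop, modeled on the blog example.
__global__ void add(int n, const float *x, float *y)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}

int main()
{
    const int N = 1 << 20;
    float *x = nullptr;
    float *y = nullptr;

    // If managed memory is unavailable in this build, the failure shows up
    // here instead of as a crash in the initialization loop below.
    CUDA_CHECK(cudaMallocManaged(&x, N * sizeof(float)));
    CUDA_CHECK(cudaMallocManaged(&y, N * sizeof(float)));

    // Initialize the managed data from the CPU.
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    const int blockSize = 256;
    const int numBlocks = (N + blockSize - 1) / blockSize;
    add<<<numBlocks, blockSize>>>(N, x, y);
    CUDA_CHECK(cudaGetLastError());       // reports kernel launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors during execution;
                                          // must finish before the CPU reads y

    printf("y[0] = %f\n", y[0]);

    CUDA_CHECK(cudaFree(x));
    CUDA_CHECK(cudaFree(y));
    return 0;
}
```

If the first CUDA_CHECK fires, the message it prints should narrow down whether the problem is the build configuration or something else.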

Confirm that you have set the compile operation to use compute capability 3.0, and be sure that you are building an x64 project, not a win32/x86 project.

The install process does not set the architecture that the compiler will use for code generation. You have to set that for each project, if you don’t want to use the default value. The default value is not correct for cc3.0 code.
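To check both points from inside a program, something like the following small query could help. It is only a sketch, assuming device 0 is the K2000; the file name in the comment is a placeholder. It prints the compute capability the runtime reports, whether the device reports managed memory support, and whether the executable was built as 32-bit or 64-bit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Example command-line build targeting compute capability 3.0
// (query.cu is a placeholder file name):
//   nvcc -arch=sm_30 -o query query.cu
// In Visual Studio the equivalent setting is under the project properties:
//   CUDA C/C++ > Device > Code Generation, e.g. compute_30,sm_30

int main()
{
    const int device = 0;  // assumption: the K2000 is device 0
    cudaDeviceProp prop;

    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("Device %d: %s\n", device, prop.name);
    printf("Compute capability: %d.%d\n", prop.major, prop.minor);
    printf("Managed memory supported: %s\n",
           prop.managedMemory ? "yes" : "no");

    // Pointer size distinguishes an x64 build from a win32/x86 build.
    printf("Host build: %d-bit\n", (int)(sizeof(void *) * 8));
    return 0;
}
```

A 32-bit host build or a "no" for managed memory would point at the project settings rather than at the installation.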

You might want to study some of the CUDA sample projects to see how they set compute capability for the compiler, and study the compiler output from compiling one of those projects vs. the compiler output from compiling your project.

Thank you!!!