[Solved] New install, Cuda prgm compiles and runs in VS 2013, Kernel does nothing

In the YouTube video “CUDACast #2” by Mark Ebersole, he presents some CPU code, which he then modifies to demonstrate how to do it with the GPU. I have followed his demonstration to a T in VS 2013, and the program does compile and run on my system.

CUDACast #2: [url]http://devblogs.nvidia.com/parallelforall/cudacasts-episode-2-your-first-cuda-c-program/[/url]

However, at runtime the c array doesn’t get updated, and the printout for each c[i] is “c[0] = 0, c[1] = 0, c[2] = 0, c[3] = 0”, etc. I’ve checked that the code is accurate, but it appears the CUDA kernel is not running at all: it never performs the addition or updates the c array.
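For reference, here is a minimal sketch of the kind of vector-add program the CUDACast walks through (the variable and kernel names are my reconstruction, not necessarily Mark Ebersole’s exact code):

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define SIZE 512  // reduced from 1024 for a Compute Capability 1.3 card

__global__ void VectorAdd(int *a, int *b, int *c, int n)
{
    int i = threadIdx.x;   // one block, SIZE threads
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    int *a, *b, *c;          // host arrays
    int *d_a, *d_b, *d_c;    // device arrays

    a = (int *)malloc(SIZE * sizeof(int));
    b = (int *)malloc(SIZE * sizeof(int));
    c = (int *)malloc(SIZE * sizeof(int));

    cudaMalloc(&d_a, SIZE * sizeof(int));
    cudaMalloc(&d_b, SIZE * sizeof(int));
    cudaMalloc(&d_c, SIZE * sizeof(int));

    for (int i = 0; i < SIZE; ++i) { a[i] = i; b[i] = i; c[i] = 0; }

    cudaMemcpy(d_a, a, SIZE * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, SIZE * sizeof(int), cudaMemcpyHostToDevice);

    VectorAdd<<<1, SIZE>>>(d_a, d_b, d_c, SIZE);

    cudaMemcpy(c, d_c, SIZE * sizeof(int), cudaMemcpyDeviceToHost);

    // If the kernel never ran, every c[i] stays at its initial 0
    for (int i = 0; i < 10; ++i)
        printf("c[%d] = %d\n", i, c[i]);

    free(a); free(b); free(c);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```

Note that if the launch silently fails, the final cudaMemcpy still succeeds and just copies back the untouched zero-filled device buffer, which is exactly the symptom above.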

The background of my set up is as follows:

I downloaded and installed CUDA 6.5 on my 64-bit Windows 7 machine, then downloaded and installed VS 2013 Community. My graphics card is the GTX 260 Core 216 with display driver 340.62. This is an older card with the GT200 GPU, but it is CUDA-enabled with Compute Capability 1.3.

The largest block size for Compute Capability 1.3 is 512 instead of 1024; hence, in the code I changed “#define SIZE 1024” to “#define SIZE 512”. I even tried a much smaller SIZE, such as 20.
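If in doubt, the per-device limit can be queried at runtime with cudaGetDeviceProperties(). A small sketch (not part of the original program):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // query device 0

    // On a GTX 260 (Compute Capability 1.3) this should report 1.3 and 512
    printf("Compute capability:    %d.%d\n", prop.major, prop.minor);
    printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
    return 0;
}
```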

Finally, CUDA 6.5 is the last version to support my GPU model and the first to work with VS 2013, so version 6.5 should be the ticket for my hardware and software.

Everything appears fine, except my GPU just doesn’t seem to know it’s actually supposed to do something. What am I missing?

Add proper CUDA error checking to the code. If you’re not sure what proper CUDA error checking is, google “proper cuda error checking” and take the first hit.
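For anyone else following along, the commonly used error-checking macro (the usual first hit for that search) looks roughly like this; I’m sketching it from memory, so treat the details as approximate:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Prints the CUDA error string plus file/line, then optionally aborts.
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort = true)
{
    if (code != cudaSuccess) {
        fprintf(stderr, "GPUassert: %s %s %d\n",
                cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

// Wrap every runtime API call:
//   gpuErrchk(cudaMalloc(&d_a, SIZE * sizeof(int)));
//
// A kernel launch returns no status directly, so check it in two steps:
//   VectorAdd<<<1, SIZE>>>(d_a, d_b, d_c, SIZE);
//   gpuErrchk(cudaPeekAtLastError());      // launch (configuration) errors
//   gpuErrchk(cudaDeviceSynchronize());    // errors during kernel execution
```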

I added error checking code and got the following report:

(referring to the line of the kernel launch call) GPUassert: invalid device function

My kernel launch call is
VectorAdd<<<1, SIZE>>>(d_a, d_b, d_c, SIZE);

I’ve checked that the call is otherwise valid (valid arguments). The previous runtime API calls in the code (i.e. cudaMalloc(), cudaMemcpy()) appear to work fine.

I built and ran a CUDA sample program (matrixMul) with success. One of its included headers is helper_functions.h. This header isn’t available in the dependency list of my current project; however, my current project does include these headers:

<cuda_runtime.h>
<assert.h>
“device_launch_parameters.h”
<stdio.h>

I found the problem. Looking in the Project Properties of the CUDA sample program, Project → Properties → Configuration Properties → CUDA C/C++ → Device, in the Code Generation field it includes

“compute_11,sm_11”

as well as “compute_20,sm_20;compute_30,sm_30;compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;”

In the same field for the new project I started, the default was “compute_20,sm_20” and nothing else, so the build contained no binary or PTX that could run on a Compute Capability 1.3 device — which is exactly what “invalid device function” means. Adding “compute_13,sm_13” fixes the problem, and the kernel runs successfully.
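For anyone building outside Visual Studio, the equivalent nvcc command-line option is -gencode (file names here are placeholders):

```shell
# Compile for a Compute Capability 1.3 device (e.g. GTX 260) with CUDA 6.5.
# compute_13 is the virtual (PTX) architecture, sm_13 the real architecture.
nvcc -gencode arch=compute_13,code=sm_13 vectoradd.cu -o vectoradd

# Several architectures can go into one fat binary, mirroring the
# semicolon-separated list in the VS Code Generation field:
nvcc -gencode arch=compute_13,code=sm_13 \
     -gencode arch=compute_20,code=sm_20 vectoradd.cu -o vectoradd
```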