CUDA 6.5: No kernel Launch

Hi All,
I just installed CUDA 6.5 on my workstation (built at the end of 2011): Intel Core i7 3.07GHz (8 core), 64-bit, with an Nvidia Quadro FX3800 GPU. I also installed Nsight for Visual Studio 2010 Professional edition. Nsight is 4.1 (the latest as of a week ago) and I installed the latest driver (340.66). Here are my problems:

1- I am running code to do simple vector addition of two 10-element vectors. Both are float (32 bit), if that matters. The answer I get is a 10-element vector of all zeros, and I do not understand why. I added “cudaGetErrorString(cudaGetLastError())” and the error string I get is “no error”. I even deleted the active code in the CUDA kernel function, so it is basically a no-op, and I still get the same thing. It doesn’t seem that the kernel is being launched at all, so I’m really confused as to what is causing this. CUDA calls placed in the program BEFORE the kernel launch return good values; for example, I am able to get the device properties structure with correct fields (GPU name, compute capability, etc.). Memory allocation on the device also returns success codes, but the kernel seems not to launch at all! I am now wondering whether CUDA is supported at all on my GPU, or whether at least CUDA 6.5 is not supported. I'm really lost here.

This brings me to the second question. As a result of the above I tried to do some digging in Nsight, so I compiled the matrix multiplication sample code provided with the Nsight install. When I run it with the “Profile CUDA Application” setting, I get “No kernel launches captured!”. I can also see the output error message saying that the result was all zeros, similar to what happens in my simple vector code above. So, same question as above: why am I getting zeros? Why is the kernel NOT launching?

Third question: I do not see my GPU (Quadro FX 3800) on the “full” list of supported GPUs for Nsight 4.1. What do people typically do about updating their Nsight install? I don't change my GPU often, so do I have to stay on an old version of Nsight?

Sorry for the long post, but I tried to give you all the details (I hope). Please help. I appreciate your time, thanks.

vectorAdd is a CUDA sample code. What happens if you compile and run that sample code?

Can you post your vector add code here? (with the error checking)

What happens if you run your code with cuda-memcheck?
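
As a rough illustration of the kind of error checking being asked for, here is a minimal sketch of a 10-element vector add that checks every runtime call, then checks once right after the launch and once after synchronizing. The CHECK macro and the vecAdd kernel are hypothetical names made up for this example, not the original poster's code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA runtime): print the error
// string and exit if a runtime call does not return cudaSuccess.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Illustrative kernel: one thread per element, guarded against overrun.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 10;
    const size_t bytes = n * sizeof(float);
    float ha[n], hb[n], hc[n];
    for (int i = 0; i < n; ++i) {
        ha[i] = (float)i;
        hb[i] = 2.0f * (float)i;
        hc[i] = 0.0f;
    }

    float *da = NULL, *db = NULL, *dc = NULL;
    CHECK(cudaMalloc((void **)&da, bytes));
    CHECK(cudaMalloc((void **)&db, bytes));
    CHECK(cudaMalloc((void **)&dc, bytes));
    CHECK(cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice));

    vecAdd<<<1, 32>>>(da, db, dc, n);
    CHECK(cudaGetLastError());       // errors reported at launch time
    CHECK(cudaDeviceSynchronize());  // errors reported while the kernel runs

    CHECK(cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost));
    for (int i = 0; i < n; ++i)
        printf("%g + %g = %g\n", ha[i], hb[i], hc[i]);

    CHECK(cudaFree(da));
    CHECK(cudaFree(db));
    CHECK(cudaFree(dc));
    return 0;
}
```

Checking in both places matters because a kernel launch is asynchronous: some failures only show up once the host waits for the device.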

Thank you for the quick response and for the suggestion. Trying out vectorAdd has indirectly led me to solving my problem. While poking around this sample code in Visual Studio (2010), I looked at Solution Properties->Configuration Properties->CUDA C/C++->Device->Code Generation and noticed the entry “compute_20,sm_20”; the list there included 20, 30 and 35. I assumed this is somehow related to the device compute capability, which I know is 1.3 on my Quadro FX3800. So I added “compute_11,sm_11” to the top of the list. That not only made the vectorAdd sample code run correctly, but it also ran correctly with my Nsight 4.1 installation in VS. So, despite the fact that the Nvidia documentation does not show my GPU on the “supported” list of GPUs for Nsight 4.1, it actually ran fine and showed “301 kernels captured”. I hope this info will help someone who’s as lost as I was… I’m just starting on GPU computing and have only done reading and minor coding so far.
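
For anyone who wants to check for this kind of mismatch directly, below is a small sketch that prints the device's compute capability and asks the runtime which code version it would use for a kernel. The noop kernel is a hypothetical placeholder, and the program assumes device 0 is the GPU of interest. A kernel built only for compute_20 and higher contains no code that a compute capability 1.3 device can run, so in that situation the cudaFuncGetAttributes call would be expected to fail rather than report versions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel, used only so there is something to query.
__global__ void noop() {}

int main()
{
    // Assumption: device 0 is the GPU of interest.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("%s: compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);

    // Ask the runtime which compiled version of the kernel it would load.
    // This fails if the executable holds no code usable on this device.
    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, noop);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("kernel binary version %d, PTX version %d\n",
           attr.binaryVersion, attr.ptxVersion);
    return 0;
}
```

Outside Visual Studio, the same code generation setting corresponds to nvcc's -gencode option, for example -gencode arch=compute_11,code=sm_11 with a toolkit that still supports that target, as CUDA 6.5 does.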