CUDA Programs Returning Zero after Update to v6.5

Hi,
I recently migrated my development environment. Before migration, things were working fine. But since migration, I am unable to compile executable and DLLs with expected results. Even running simple vector addition example is returning zero. My old and new development environments are listed below.

Old env: Win 7, x64, CUDA 5.5 with VS2008 with Nsight 3.3 plugin, compilation setting for 32-bit
New env: Win 7, x64, CUDA 6.5 with VS2012 with Nsight 4.2 plugin, compilation setting for 32-bit
My GPU is an NVIDIA GeForce 330M with the latest device drivers that came with CUDA v6.5.

The CUDA code that I am trying to compile into an executable is as follows.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
// CUDA kernel - add to vectors element by element
// pass result back in the first vector
__global__ void vecAdd(float *a, float *b, int n)
{
    int tid = blockIdx.x*blockDim.x+threadIdx.x;
    if (tid < n)
        a[tid] = a[tid] + b[tid];
}
// simple test with only 4 elements
int main( int argc, char* argv[] )
{
    // host data
    float *h_a;
    float *h_b;
    // device data
    float *d_a;
    float *d_b;
 
    size_t bytes = 4*sizeof(float);
    // allocate memory on host
    h_a = (float*)malloc(bytes);
    h_b = (float*)malloc(bytes);
    // allocate memory on gpu-device
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
 
    // initialize host data to constants
    for(int i = 0; i < 4; i++) {
        h_a[i] = 0.1f;
        h_b[i] = 0.2f;
    }
 
    // copy host data to gpu-device
    cudaMemcpy( d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy( d_b, h_b, bytes, cudaMemcpyHostToDevice);
    
    printf("Before CUDA...\n");
    for(int i=0; i<4; i++)
        printf("a[%d]=%f\n",i,h_a[i]);
 
    // Execute the kernel
    vecAdd<<<1, 4>>>(d_a, d_b, 4);
 
    // Copy array back to host
    cudaMemcpy( h_a, d_a, bytes, cudaMemcpyDeviceToHost );
    
    printf("After CUDA...\n");
    for(int i=0; i<4; i++)
        printf("a[%d]=%f\n",i,h_a[i]);
 
    // Release device memory
    cudaFree(d_a);
    cudaFree(d_b);
 
    // Release host memory
    free(h_a);
    free(h_b);
 
    return 0;
}

The compilation is done on command prompt using the following command.

nvcc -O3 -o mycudatest.exe mycudatest.cu

I used to see the results in the previous version of CUDA (5.5). Now, all I get is a bunch of zeros.

Any advice or suggestions are greatly appreciated.

Warm regards,
Sam V

The default compute capability target in CUDA 6.5 is cc2.0; in CUDA 5.5 I think it was still cc1.0. For your GPU you need to add something like -arch=sm_12 to your nvcc call.
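For reference, the compile line with the architecture flag added might look like this (sm_12 assumes a compute capability 1.2 GPU such as the GeForce 330M; adjust the value for your card):

```shell
# Target compute capability 1.2; without this flag, CUDA 6.5 compiles
# for cc2.0 by default and the kernel never runs on a cc1.2 GPU.
nvcc -O3 -arch=sm_12 -o mycudatest.exe mycudatest.cu
```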

You may want to learn how to add proper cuda error checking to your programs, also.

Thanks hadschi118. Adding the architecture flag solved the issue. I did not realize that compute_11-13 and sm_11-13 are deprecated in v6.5 and that support for these will be removed in future releases. Appreciate your input. Problem solved!

txbob, I am not sure what you mean. Please clarify and expand on your response so that people like me can understand. I am a newbie and have not yet reached the stage of doing full-fledged debugging/error checking.

google this:

proper cuda error checking

read the first hit

With proper error checking, your program upon execution would return a message “invalid device function” which, with a little bit of experience, is immediately recognizable as an issue usually relating to a mismatch of the architecture the code was compiled for with the architecture of the GPU you are running on.
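As a rough sketch of what that first search hit describes (names here are illustrative, not an official API), the usual pattern is a macro that wraps every CUDA runtime call, plus an explicit check after each kernel launch:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every runtime call; print the error string and bail out on failure.
#define gpuErrchk(call)                                                   \
    do {                                                                  \
        cudaError_t err = (call);                                         \
        if (err != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",                  \
                    cudaGetErrorString(err), __FILE__, __LINE__);         \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Usage in the vector-add program from this thread:
//   gpuErrchk(cudaMalloc(&d_a, bytes));
//   gpuErrchk(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice));
//   vecAdd<<<1, 4>>>(d_a, d_b, 4);
//   gpuErrchk(cudaPeekAtLastError());    // catches launch errors such as
//                                        // "invalid device function"
//   gpuErrchk(cudaDeviceSynchronize());  // catches asynchronous execution
//                                        // errors from the kernel itself
```

With this in place, the architecture mismatch in this thread would have reported "invalid device function" at the launch instead of silently returning zeros.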

But apart from that, it is a useful debugging technique and will allow you to zero in on many more problems than what can be ascertained from “my program didn’t return the correct results”.

You can also run your programs with cuda-memcheck for a quick test.
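For example, using the executable name from earlier in the thread:

```shell
# cuda-memcheck reports launch failures and device memory errors
# even when the program itself prints nothing useful.
cuda-memcheck mycudatest.exe
```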

Thanks txbob, I appreciate your suggestions. I am picking up as I go along. Until now I have been using rudimentary debugging techniques, but I think for CUDA it's a notch deeper, and it will take me some time to warm up to the skills required. I would also be glad if you could point me to a resource (website/article/guide) that explains how one should debug CUDA applications (maybe using something like Nsight). I am having trouble finding a concise guide for beginners.

There are lots of resources. google is your friend. There are the nvidia docs at docs.nvidia.com

There are plenty of resources available via GTC express (gputechconf.com; begin discovering what is there, it is searchable). Finally, there are even youtube walkthroughs.

Thanks txbob. Appreciate your suggestions.