unresolved external symbol _main referenced in function ___tmainCRTStartup

I am struggling to get my first CUDA program to build with Visual Studio 2008 under Windows 7. It compiles OK but fails at link time with this error:

1>LIBCMTD.lib(crt0.obj) : error LNK2019: unresolved external symbol _main referenced in function ___tmainCRTStartup

The code is taken from the “My first CUDA program” tutorial:

// CUDAVB2008Test.cpp : Defines the entry point for the console application.

#include "stdafx.h"
#include <stdio.h>
#include <cuda.h>

// Kernel that executes on the CUDA device
__global__ void square_array(float *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) a[idx] = a[idx] * a[idx];
}

// main routine that executes on the host
int main(void)
{
    float *a_h, *a_d;  // Pointer to host & device arrays
    const int N = 10;  // Number of elements in arrays
    size_t size = N * sizeof(float);
    a_h = (float *)malloc(size);        // Allocate array on host
    cudaMalloc((void **) &a_d, size);   // Allocate array on device
    // Initialize host array and copy it to CUDA device
    for (int i = 0; i < N; i++) a_h[i] = (float)i;
    cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
    // Do calculation on device:
    int block_size = 4;
    int n_blocks = N/block_size + (N%block_size == 0 ? 0 : 1);
    square_array <<< n_blocks, block_size >>> (a_d, N);
    // Retrieve result from device and store it in host array
    cudaMemcpy(a_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
    // Print results
    for (int i = 0; i < N; i++) printf("%d %f\n", i, a_h[i]);
    // Cleanup
    free(a_h); cudaFree(a_d);
}

Can anyone steer me on where to go from here?

You should change the file extension from .cpp to .cu so that the file is compiled by nvcc.

The file extension is already .cu, contrary to what the comment says.

You really do have to rename the input file so that it has a .cu extension. nvcc uses the file extension to determine how to process the input code. It will not correctly parse and compile the device code in your input file unless the file has the extension .cu.

The input file name does have the extension .cu.

I just didn’t change the extension in the comment, which has no effect.

If I try to compile the following code:

// cuda_example3.cu : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <stdio.h>
#include <cuda.h>

// Kernel that executes on the CUDA device
__global__ void square_array( float *a, int N )
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if ( idx < N )
        a[idx] = a[idx] * a[idx];
}

// main routine that executes on the host
int main( void )
{
    float *a_h, *a_d; // Pointer to host & device arrays
    const int N = 10; // Number of elements in arrays
    size_t size = N * sizeof( float );

    a_h = (float *)malloc( size );     // Allocate array on host
    cudaMalloc( (void **)&a_d, size ); // Allocate array on device

    // Initialize host array and copy it to CUDA device
    for ( int i = 0; i < N; i++ )
        a_h[i] = (float)i;
    cudaMemcpy( a_d, a_h, size, cudaMemcpyHostToDevice );

    // Do calculation on device:
    int block_size = 4;
    int n_blocks   = N / block_size + ( N % block_size == 0 ? 0 : 1 );
    square_array <<< n_blocks, block_size >>> ( a_d, N );

    // Retrieve result from device and store it in host array
    cudaMemcpy( a_h, a_d, sizeof( float ) * N, cudaMemcpyDeviceToHost );

    // Print results
    for ( int i = 0; i < N; i++ )
        printf( "%d %f\n", i, a_h[i] );

    // Cleanup
    free( a_h );
    cudaFree( a_d );
}

I get the same error, even after updating to CUDA Toolkit 3.2.

What else can I try?
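
As a side note on the listing above (this does not address the linker error): the n_blocks line is a ceiling division, rounding N / block_size up so that every element gets a thread. Below is a minimal host-only sketch of that arithmetic in plain C++. The helper names blocks_for and blocks_for_listing are hypothetical, made up for illustration, and are not part of the CUDA runtime.

```cpp
#include <cassert>

// Hypothetical helper (not a CUDA API): smallest number of blocks such that
// n_blocks * block_size >= n, computed as a ceiling division.
inline int blocks_for(int n, int block_size)
{
    assert(n >= 0 && block_size > 0);
    return (n + block_size - 1) / block_size;
}

// The same quantity written the way the listing computes it:
// the truncated quotient, plus one extra block if there is a remainder.
inline int blocks_for_listing(int n, int block_size)
{
    assert(n >= 0 && block_size > 0);
    return n / block_size + (n % block_size == 0 ? 0 : 1);
}
```

With N = 10 and block_size = 4, both forms give 3 blocks, i.e. 12 threads for 10 elements. The two surplus threads are why the kernel guards its write with if (idx < N).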

I have no experience with Windows, but removing the #include "stdafx.h" line looks like a good candidate.

It seems that some important changes were made in toolkit version 3.2, weren’t they?
Is it mandatory to place host code in .cpp files and kernels in .cu files?