CUDA & C++ interoperability: having trouble compiling

Hello. I am just beginning to learn CUDA and am having some difficulty getting my projects to compile. I am using VS2010 on Win7 64-bit. I have gotten several test applications running, and I am now trying to call a simple CUDA kernel from a .cpp file.

However, when I compile my program I get the following errors:

1> main.cpp

1>c:\users\santos\documents\visual studio 2010\projects\cuda_opengl\cuda_opengl\test_kernel.cu(6): error C2065: 'blockIdx' : undeclared identifier

1>c:\users\santos\documents\visual studio 2010\projects\cuda_opengl\cuda_opengl\test_kernel.cu(6): error C2228: left of '.x' must have class/struct/union

1> type is ''unknown-type''

1>c:\users\santos\documents\visual studio 2010\projects\cuda_opengl\cuda_opengl\main.cpp(33): error C2059: syntax error : '<'

1>

1>Build FAILED.

Anyway, I’m going to keep working on it, but if you have any insight into what my problem is I would greatly appreciate it. It seems as though every time I make a post in a forum about a problem I’m having I solve it on my own shortly after. Hopefully, that will be the case with this as well. Thanks for your time.

Entire error message:

1>------ Rebuild All started: Project: CUDA_OpenGL, Configuration: Debug Win32 ------

1>Build started 3/22/2011 5:07:28 PM.

1>_PrepareForClean:

1>  Deleting file "Debug\CUDA_OpenGL.lastbuildstate".

1>CudaClean:

1>  

1>  C:\Users\Santos\Documents\Visual Studio 2010\Projects\CUDA_OpenGL\CUDA_OpenGL>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\\bin\nvcc.exe" -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\\include"  -G0  --keep-dir "Debug\\" -maxrregcount=32  --machine 32 --compile  -D_NEXUS_DEBUG -g    -Xcompiler "/EHsc /nologo /Od /Zi  /MDd " -o "Debug\test_kernel.obj" "C:\Users\Santos\Documents\Visual Studio 2010\Projects\CUDA_OpenGL\CUDA_OpenGL\test_kernel.cu" -clean 

1>  Deleting file "Debug\test_kernel.cu.deps".

1>InitializeBuildStatus:

1>  Touching "Debug\CUDA_OpenGL.unsuccessfulbuild".

1>CudaBuild:

1>  Compiling CUDA source file test_kernel.cu...

1>  

1>  C:\Users\Santos\Documents\Visual Studio 2010\Projects\CUDA_OpenGL\CUDA_OpenGL>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2008 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\\include"  -G0  --keep-dir "Debug\\" -maxrregcount=32  --machine 32 --compile  -D_NEXUS_DEBUG -g    -Xcompiler "/EHsc /nologo /Od /Zi  /MDd " -o "Debug\test_kernel.obj" "C:\Users\Santos\Documents\Visual Studio 2010\Projects\CUDA_OpenGL\CUDA_OpenGL\test_kernel.cu" 

1>  test_kernel.cu

1>  tmpxft_000006a4_00000000-0_test_kernel.cudafe1.gpu

1>  tmpxft_000006a4_00000000-5_test_kernel.cudafe2.gpu

1>  test_kernel.cu

1>  tmpxft_000006a4_00000000-0_test_kernel.cudafe1.cpp

1>  tmpxft_000006a4_00000000-11_test_kernel.ii

1>  Deleting file "tmpxft_000006a4_00000000-6_test_kernel.cpp3.o".

1>ClCompile:

1>  main.cpp

1>c:\users\santos\documents\visual studio 2010\projects\cuda_opengl\cuda_opengl\test_kernel.cu(6): error C2065: 'blockIdx' : undeclared identifier

1>c:\users\santos\documents\visual studio 2010\projects\cuda_opengl\cuda_opengl\test_kernel.cu(6): error C2228: left of '.x' must have class/struct/union

1>          type is ''unknown-type''

1>c:\users\santos\documents\visual studio 2010\projects\cuda_opengl\cuda_opengl\main.cpp(33): error C2059: syntax error : '<'

1>

1>Build FAILED.

1>

1>Time Elapsed 00:00:01.53

========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========

This is my main.cpp, which calls the kernel:

#include <cuda.h>
#include "cuda_runtime.h"
#include <stdio.h>
#include <iostream>
#include "test_kernel.cu"

#define N 10

int main(){
    cudaDeviceProp  prop;
    int count;
    int a[N], b[N], c[N];
    int *dev_a, *dev_b, *dev_c;

    // allocate the memory on the GPU
    cudaMalloc( (void**)&dev_a, N * sizeof(int) );
    cudaMalloc( (void**)&dev_b, N * sizeof(int) );
    cudaMalloc( (void**)&dev_c, N * sizeof(int) );

    // fill the arrays 'a' and 'b' on the CPU
    for (int i=0; i<N; i++) {
        a[i] = i;
        b[i] = i;
    }

    // copy the arrays 'a' and 'b' to the GPU
    cudaMemcpy( dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice );
    cudaMemcpy( dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice );

    add<<<N,1>>>( dev_a, dev_b, dev_c );

    // copy the array 'c' back from the GPU to the CPU
    cudaMemcpy( c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost );

    // display the results
    for (int i=0; i<N; i++) {
        printf( "%d + %d = %d\n", a[i], b[i], c[i] );
    }

    std::cin.get();

    // free the memory allocated on the GPU
    cudaFree( dev_a );
    cudaFree( dev_b );
    cudaFree( dev_c );

    return 0;
}

Finally, here is my kernel, test_kernel.cu:

__global__ void add(int *a, int *b, int *c)
{
    int tid = blockIdx.x;
    c[tid] = a[tid] + b[tid];
}

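If I recall correctly, you can’t #include a .cu file (which is a compilable source file) as if it were a header file. Because main.cpp includes test_kernel.cu, the kernel code ends up being compiled by the regular Visual C++ compiler, which knows nothing about blockIdx or the <<<...>>> launch syntax; that is where your three errors come from. Also, change your .cpp file to a .cu file. NVIDIA’s compiler picks out and compiles the device-specific code, then hands the remaining host code to the regular C++ compiler. So basically take away your separate .cu file, move that __global__ function into the main program, and rename the main program to main.cu instead of main.cpp. If the project is set up properly, NVCC runs first and then invokes the VC++ compiler for the host code.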
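As a rough sketch of that single-file layout (not tested on your setup): the CUDA_CHECK macro below is a helper made up for illustration and is not part of CUDA, and the bounds check in the kernel is an extra precaution that your original code does not have.

```cuda
// main.cu: kernel and host code in one file, so nvcc sees everything.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 10

// Hypothetical helper (not part of CUDA): report any runtime error and exit.
#define CUDA_CHECK(call)                                         \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

// One block per element, as in the original kernel.
__global__ void add(const int *a, const int *b, int *c)
{
    int tid = blockIdx.x;
    if (tid < N)                       // extra guard, an addition to the original
        c[tid] = a[tid] + b[tid];
}

int main()
{
    int a[N], b[N], c[N];
    int *dev_a, *dev_b, *dev_c;

    // fill the input arrays on the CPU
    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = i;
    }

    // allocate device memory and copy the inputs over
    CUDA_CHECK(cudaMalloc((void**)&dev_a, N * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void**)&dev_b, N * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void**)&dev_c, N * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice));

    add<<<N, 1>>>(dev_a, dev_b, dev_c);
    CUDA_CHECK(cudaGetLastError());    // catches launch configuration errors

    // this copy also waits for the kernel to finish
    CUDA_CHECK(cudaMemcpy(c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost));

    for (int i = 0; i < N; i++)
        printf("%d + %d = %d\n", a[i], b[i], c[i]);

    CUDA_CHECK(cudaFree(dev_a));
    CUDA_CHECK(cudaFree(dev_b));
    CUDA_CHECK(cudaFree(dev_c));
    return 0;
}
```

With this layout the project contains only main.cu, built through the CUDA build rule, so the Visual C++ compiler never sees the kernel syntax directly.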

-M

Thank you for your help. It works now. I guess I still don’t really understand how to include additional .cu files in my program. Now that I have a main.cu file, when I add a separate test_kernel.cu file containing the aforementioned __global__ function, I get the following error. This is confusing, because add is no longer defined in main.cu, only called there. I’ll keep working on this, thanks!

1>  test_kernel.cu

1>  tmpxft_0000123c_00000000-0_test_kernel.cudafe1.cpp

1>  tmpxft_0000123c_00000000-11_test_kernel.ii

1>  Deleting file "tmpxft_0000123c_00000000-6_test_kernel.cpp3.o".

1>test_kernel.obj : error LNK2005: "void __cdecl __device_stub__Z3addPiS_S_(int *,int *,int *)" (?__device_stub__Z3addPiS_S_@@YAXPAH00@Z) already defined in main.obj

1>test_kernel.obj : error LNK2005: "void __cdecl add(int *,int *,int *)" (?add@@YAXPAH00@Z) already defined in main.obj

1>C:\Users\Santos\Documents\Visual Studio 2010\Projects\CUDA_OpenGL\Debug\CUDA_OpenGL.exe : fatal error LNK1169: one or more multiply defined symbols found

1>

1>Build FAILED.

I am including it in the project using the line:

#include <test_kernel.cu>
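
The LNK2005 errors are what one would expect when test_kernel.cu is compiled as its own project item and is also pulled into main.cu with #include: the kernel then gets defined in both test_kernel.obj and main.obj. One common way around this, sketched here under assumptions, is to drop the #include of the .cu file and expose a plain host wrapper function through a header instead. The names test_kernel.h and add_on_gpu are hypothetical, chosen only for illustration. The listing is written as a single file so it can be compiled as-is with nvcc; the banner comments mark where each part would go in a split project.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// ---- would live in test_kernel.h (hypothetical name): declaration only ----
// Plain C++ signature, so any .cpp or .cu file can include it safely.
bool add_on_gpu(const int *a, const int *b, int *c, int n);

// ---- would live in test_kernel.cu: the only place the kernel is defined ----
__global__ void add(const int *a, const int *b, int *c, int n)
{
    int tid = blockIdx.x;              // one block per element, as in the thread
    if (tid < n)
        c[tid] = a[tid] + b[tid];
}

// Host wrapper: keeps the <<<...>>> launch syntax inside the .cu file.
// Returns true if every CUDA call succeeded.
bool add_on_gpu(const int *a, const int *b, int *c, int n)
{
    size_t bytes = n * sizeof(int);
    int *dev_a = 0, *dev_b = 0, *dev_c = 0;

    bool ok = cudaMalloc((void**)&dev_a, bytes) == cudaSuccess
           && cudaMalloc((void**)&dev_b, bytes) == cudaSuccess
           && cudaMalloc((void**)&dev_c, bytes) == cudaSuccess
           && cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        add<<<n, 1>>>(dev_a, dev_b, dev_c, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree on a null pointer does nothing, so this is safe after a failure
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return ok;
}

// ---- would live in main.cpp or main.cu: no kernel syntax needed here ----
int main()
{
    const int n = 10;
    int a[n], b[n], c[n];

    for (int i = 0; i < n; i++) {
        a[i] = i;
        b[i] = i;
    }

    if (!add_on_gpu(a, b, c, n)) {
        fprintf(stderr, "GPU add failed\n");
        return 1;
    }

    for (int i = 0; i < n; i++)
        printf("%d + %d = %d\n", a[i], b[i], c[i]);
    return 0;
}
```

In a split project, both test_kernel.cu and the main file would include the header, and test_kernel.cu would stay in the project as a normal CUDA source file rather than being included anywhere. Each symbol is then defined in exactly one object file, which is what the linker requires.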