Problem with linking files

Hi, I have a problem with linking files - I get error LNK2019 unresolved external symbol.

My program looks like this:

----- Bandwidth.cuh -----

Excluded from build: No

Tool: Custom build tools

#ifndef _BANDWIDTH_GPU_CUH_
#define _BANDWIDTH_GPU_CUH_

__global__ void bndCharR(char *src, char *dst);
__global__ void bndCharW(char *src, char *dst);
__global__ void bndCharRW(char *src, char *dst);
...

#endif

----- Bandwidth.cu -----

Excluded from build: Yes

Tool: Custom build tools

#include "Bandwidth.cuh"

__global__ void bndCharR(char *src, char *dst)
{
	char *shr = (char*)arrayShr;
	shr[threadIdx.x] = src[blockIdx.x * blockDim.x + threadIdx.x];
}

__global__ void bndCharW(char *src, char *dst)
{
	dst[blockIdx.x * blockDim.x + threadIdx.x] = threadIdx.x;
}

__global__ void bndCharRW(char *src, char *dst)
{
	dst[blockIdx.x * blockDim.x + threadIdx.x] = src[blockIdx.x * blockDim.x + threadIdx.x];
}

...

----- BandwidthTest.cu -----

Excluded from build: No

Tool: Cuda Runtime API (3.2)

#include "Bandwidth.cuh"

extern "C"
void bndChar(int nrElement, int nrThread, int nrBlock)
{
	srand(time(0));

	int sharedMemory;
	char *srcH, *srcD, *dstD;
	long nrBytes = nrElement * sizeof(char);

	srcH = (char*)malloc(nrBytes);
	for (long i = 0; i < nrElement; i++)
		srcH[i] = rand()%256;

	cutilSafeCall(cudaMalloc((void**) &srcD, nrBytes));
	cutilSafeCall(cudaMalloc((void**) &dstD, nrBytes));
	cutilSafeCall(cudaMemcpy(srcD, srcH, nrBytes, cudaMemcpyHostToDevice));

	sharedMemory = nrThread * sizeof(char);

	bndCharR <<< nrBlock, nrThread, sharedMemory >>> (srcD, dstD);
	bndCharW <<< nrBlock, nrThread >>> (srcD, dstD);
	bndCharRW <<< nrBlock, nrThread >>> (srcD, dstD);

	cudaThreadSynchronize();

	free(srcH);
	cudaFree(srcD);
	cudaFree(dstD);
}

----- main.cpp -----

extern "C"
void bndChar(int nrElement, int nrThread, int nrBlock);

int main()
{
	bndChar(...);
	return 0;
}

Everything compiles OK; the problem is with linking these files. I know that I could include Bandwidth.cu instead of Bandwidth.cuh in BandwidthTest.cu, but I need to keep the first form.

I have tried to solve this problem, but without success. Could you help me? What am I doing wrong?

Thx

You aren’t doing anything wrong, per se. There is no device code linker, so every kernel and device symbol must be defined in the same translation unit that uses it. What you are trying to do is therefore not possible at the moment. With the runtime API, the best approach is probably to move all your kernel code and other device symbols into header files, then include them in a single .cu file that contains all the kernel invocations and host-side support code. Compile just that one .cu file and call its wrapper/access functions from your host code. That is basically what you have now, if you move the kernel definitions into Bandwidth.cuh.
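As a rough sketch of that layout, the block below puts a kernel definition and the host wrapper that launches it in one translation unit. The comments mark which part would move into Bandwidth.cuh and which would stay in BandwidthTest.cu. The names checkCuda and bndCharCopy, the extra n parameter, and the bounds guard are assumptions made for this illustration and are not part of your code; checkCuda only stands in for cutilSafeCall.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// --- Would live in Bandwidth.cuh: the kernel DEFINITION, not just a prototype ---
__global__ void bndCharRW(const char *src, char *dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // bounds guard, added for this sketch
        dst[i] = src[i];
}

// --- Would live in BandwidthTest.cu, after #include "Bandwidth.cuh" ---

// Hypothetical helper standing in for cutilSafeCall: abort on any CUDA error.
static void checkCuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical wrapper with C linkage: copies nrElement bytes from srcH to
// dstH through the device, so host-only code never sees a __global__ symbol.
extern "C" void bndCharCopy(const char *srcH, char *dstH, int nrElement, int nrThread)
{
    char *srcD = NULL, *dstD = NULL;
    size_t nrBytes = nrElement * sizeof(char);
    int nrBlock = (nrElement + nrThread - 1) / nrThread;   // round up

    checkCuda(cudaMalloc((void **)&srcD, nrBytes), "cudaMalloc(srcD)");
    checkCuda(cudaMalloc((void **)&dstD, nrBytes), "cudaMalloc(dstD)");
    checkCuda(cudaMemcpy(srcD, srcH, nrBytes, cudaMemcpyHostToDevice), "copy to device");

    bndCharRW<<<nrBlock, nrThread>>>(srcD, dstD, nrElement);
    checkCuda(cudaGetLastError(), "bndCharRW launch");

    // cudaMemcpy waits for the kernel to finish before copying back.
    checkCuda(cudaMemcpy(dstH, dstD, nrBytes, cudaMemcpyDeviceToHost), "copy to host");

    checkCuda(cudaFree(srcD), "cudaFree(srcD)");
    checkCuda(cudaFree(dstD), "cudaFree(dstD)");
}
```

main.cpp would then only declare the wrapper with extern "C" and call it, exactly as it does for bndChar now. Because the kernel body and its launch are compiled by nvcc in the same .cu file, the linker only has to resolve the ordinary host function.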

Thanks for the help. Unfortunately, my program has several thousand lines of code, so I sometimes have problems navigating it.