Problem using CUDA 5.0 Linker

Hi

I am trying to use the new CUDA linker introduced in CUDA 5.0 to build a static library. I am not sure the linker can do this at all, because the documentation ( http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#code-changes-for-separate-compilation ) only mentions executables.

So far I have set up a small Visual Studio 2010 project with some dummy classes and compiled it as an executable, which worked. When I try to compile it as a static library, I get this error message:

LINK : fatal error LNK1181: Input file "Debug\TestLib.device-link.obj" cannot be opened.

I then wrote my own Makefile and was able to build a library with it, but when I link that library into an executable, I get the following error for nearly every method I have written:

main.obj : error LNK2019: Unresolved external symbol ""public : unsigned int __cdecl TestLoader::getCount(void)" (?getCount@TestLoader@@QEAAIXZ) " in Function ""class TestLoader * __cdecl getModel(char const *)" (?getModel@@YAPEAVTestLoader@@PEBD@Z)".

Is there a solution for this, or is the linker unable to link device functions into static libraries?

Hello

I made some progress with my problem. The slides at http://developer.download.nvidia.com/GTC/gpu-object-linking.pdf claim that separate compilation works on Windows, Linux and Mac OS, and that it should also work for static libraries.

I created a very small test application and tried to build it with the CUDA compiler.

In short: on Linux, compilation and execution work as expected. On Windows, compilation works, but I get an “invalid device function” error when running the application.

The problem seems to be that the device code is not linked properly into the created library.

Is there a solution available for this problem, or is it a bug in the current CUDA release?

Kind Regards

Exported Symbols in test.lib

//Dump of file test.lib
File Type: LIBRARY

  Summary

          10 .CRT$XCU
          18 .bss
          D0 .debug$S
         11E .drectve
          30 .nvFatBinSegment
         E40 .pdata
        146C .rdata
        E3AA .text
         984 .xdata
          64 __nv_module_id
         A28 __nv_relfatbin

//MAKEFILE
NVCC_FLAGS=-arch=compute_20 -code=sm_20
SRC=src/
SRC_CU=$(wildcard $(SRC)*.cu)
SRC_CPP=$(wildcard $(SRC)*.cpp)
LIB_NAME=test

ifeq ($(OS),Windows_NT)
	EXT_OBJ=obj
	EXT_LIB=lib
	LIB_FILE=$(LIB_NAME).$(EXT_LIB)
	RM=del
	EXECUTABLE=default.exe
else
	EXT_OBJ=o
	EXT_LIB=a
	LIB_FILE=lib$(LIB_NAME).$(EXT_LIB)
	RM=rm
	EXECUTABLE=default
endif

$(EXECUTABLE): $(SRC_CPP) $(LIB_FILE)
	nvcc $(NVCC_FLAGS) -rdc=true -L./ -l$(LIB_NAME) -o $@ $(SRC_CPP) 

%.$(EXT_OBJ): $(SRC)%.cu
	nvcc $(NVCC_FLAGS) -dc $^

%.$(EXT_LIB): $(subst $(SRC),,$(subst cu,$(EXT_OBJ),$(SRC_CU)))
	nvcc -lib $^ -o $@

clean: $(wildcard *.$(EXT_OBJ)) $(wildcard *.$(EXT_LIB)) $(EXECUTABLE)
	$(RM) $^

//main.cpp
#include "a.h"

int main(int argc, char** argv) {
	doit();
	return 0;
}

//b.h
#ifndef __B_H
#define __B_H
__device__ int testB(void);
#endif

//b.cu
#include "b.h"
__device__ int testB(void) {
	return 5;
} 

//a.h
#ifndef __A_H
#define __A_H
void doit(void);
#endif

//a.cu
#include "a.h"
#include "b.h"
#include <cuda_runtime.h>
#include <stdio.h>

// Print the given CUDA status together with the source location.
#define cuExtError(err, file, line) printf("%s in %s at line %d\n", cudaGetErrorString(err), file, line)
#define cuError(err) cuExtError(err, __FILE__, __LINE__)

__device__ int testA(void) {
	return 3;
}

__global__ void kernel(int * ptr) {
	*ptr = testA() + testB();
}

void doit(void) {
	int* ptr;
	int num = 0xDEADBEEF;
	cudaMalloc(&ptr, sizeof(int));
	kernel<<<1, 1>>>(ptr);

	// Launch errors such as "invalid device function" show up here.
	cuError(cudaGetLastError());
	// Errors raised while the kernel was running show up here.
	cuError(cudaDeviceSynchronize());
	cudaMemcpy(&num, ptr, sizeof(int), cudaMemcpyDeviceToHost);
	cudaFree(ptr);

	printf("Result: %i\n", num);
}

Given that it works under Linux but not Windows, I would suggest filing a bug through the registered developer website.
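Before filing, it may help to narrow down whether the kernel image is really missing from the Windows binary. Below is a minimal sketch of such a check. The file name probe.cu, the kernel probeKernel and the build lines in the header comment are illustrative assumptions, not something taken from a known-good Windows build. The explicit -dlink step follows the static-library recipe in the nvcc separate-compilation documentation; whether it cures the Windows problem is untested here.

```cuda
// probe.cu: hypothetical diagnostic, not part of the original project.
//
// Assumed static-library build with an explicit device-link step
// (suffixes become .obj/.lib on Windows; <cuda-lib-dir> is a placeholder):
//   nvcc -arch=sm_20 -dc a.cu b.cu
//   nvcc -arch=sm_20 -dlink a.o b.o -o link.o
//   nvcc -lib a.o b.o link.o -o libtest.a
//   g++ main.cpp -L. -ltest -L<cuda-lib-dir> -lcudart -o default
#include <cstdio>
#include <cuda_runtime.h>

__global__ void probeKernel(int* out) { *out = 42; }

int main() {
    const int device = 0;

    // Report what the code is actually running on.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Device %d: %s, compute capability %d.%d\n",
           device, prop.name, prop.major, prop.minor);

    // Querying the kernel's attributes fails if the binary holds no
    // kernel image that can run on this device.
    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, probeKernel);
    printf("cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

The same two calls can be dropped into doit() with kernel in place of probeKernel. Note that -arch=compute_20 -code=sm_20 embeds only sm_20 machine code and no PTX, so on a device with compute capability 3.x the same error would appear for reasons unrelated to the linker.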

Same here, I can’t link to my own code defined in a separate cu file in VS 2010. That uh, sort of puts a damper on development.

I have not been able to solve it, but I think I know what the problem is:

You can’t call a device or global function defined in one file, from a device or global function defined in another file.

Try this and you’ll get the linker error:

File1.cuh

__device__ void Func1();

File1.cu

#include "File1.cuh"

__device__ void Func1()
{
printf("boo hoo");
}

File2.cuh

__device__ void Func2();

File2.cu

#include "File1.cuh"
#include "File2.cuh"

__device__ void Func2()
{
Func1();//Will compile, but gives unresolved external linker error
}

Please tell me nvcc is able to compile more than one file. If not, I guess the workaround is to put all cuda code in a single file, which means it’s going to be several thousand lines long.
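For reference, the nvcc documentation describes cross-file __device__ calls as supported when every file is compiled as relocatable device code (-dc, or -rdc=true -c) and the objects are then device-linked. Here is a minimal sketch under those assumptions. The functions return int instead of calling printf, and main.cu, the app output name and the sm_20 target are made up for illustration. It is written as one listing that also compiles as a single file; to try separate compilation, split it at the markers and repeat the declarations (or include the headers) in each file.

```cuda
// Assumed build once split into separate files (object suffix is .obj on Windows):
//   nvcc -arch=sm_20 -dc File1.cu File2.cu main.cu
//   nvcc -arch=sm_20 File1.o File2.o main.o -o app
// The second nvcc call performs the device link before the host link.
#include <cstdio>
#include <cuda_runtime.h>

// ---- File1.cuh / File2.cuh: declarations only ----
__device__ int Func1();
__device__ int Func2();

// ---- File1.cu ----
__device__ int Func1() { return 5; }

// ---- File2.cu: calls a device function defined in another file ----
__device__ int Func2() { return Func1() + 3; }

// ---- main.cu (hypothetical driver) ----
__global__ void kernel(int* out) { *out = Func2(); }

int main() {
    int* d_out = 0;
    int h_out = 0;
    cudaMalloc(&d_out, sizeof(int));

    kernel<<<1, 1>>>(d_out);

    // Launch failures such as "invalid device function" are reported here.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // Failures during kernel execution are reported here.
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    printf("Result: %d\n", h_out);
    return 0;
}
```

If the objects are compiled with plain -c instead of -dc, the call to Func1() in File2.cu is what produces the unresolved external, since whole-program mode expects every device function to be defined in the same translation unit.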

Bump…?

I figured out the answer for anyone interested.

The .cuh files are the problem. They simply don’t work, so remove them from your project and only use .cu files.

If you are getting multiple symbol definition linker errors, put #pragma once at the top of your .cu files. Also, be sure to mark all functions that aren’t class member functions as static. Doing both of those solved my problem and I am now able to build my project with many .cu files.

But has anyone figured out the initial question?! I am trying to make a project that works fully under Linux also work on Windows 7 with Visual Studio 2010. No luck so far. I need separate compilation and linking because some of my code uses Dynamic Parallelism, which requires it.

I also get the error message “LINK : fatal error LNK1181: Input file "Debug\TestLib.device-link.obj" cannot be opened.”

I have not set up my own Makefile; I would really like this to work with the IDE.

Has anybody found a solution for this yet?