Undefined reference to a CUDA function in a dll (MSVC 2017 + MingW)

Good morning, all.

I’m using MSVC 2017 to create a DLL with the CUDA code, and MinGW64 (in an MSYS2 environment) for the rest of the program. This is for the sake of portability: the program will be used on Windows and Linux.

Just to get a grasp of linking CUDA and DLL code with the application, DLL_Test.cuh and DLL_Test.cu contain the following code, respectively (and nothing else):

// DLL_Test.cuh
extern "C"
	{
	__declspec(dllexport) __global__ void Test_Func(void);
	}

// DLL_Test.cu
#include "DLL_Test.cuh"
#include <stdio.h>

__global__ void Test_Func(void)
{
	printf("Printing from the GPU\n");
}

Then I compile this with the following command without any issues (apparently):

nvcc -o DLL_Test.dll --shared DLL_Test.cu

Creating library DLL_Test.lib and object DLL_Test.exp

The dll is created along with these files. In the main program, among other things, I have:

#include <iostream>

__declspec(dllimport) void Test_Func(void);

int main(int argc, char **argv)
{
    ...
    Test_Func();

    ....
}

Then I compile the program with:

g++ -L. -lDLL_Test -o App_with_CUDA App_with_CUDA.cpp

The process ends with:

undefined reference to `__imp__Z10Test_Funcv'
collect2.exe: error: ld returned 1 exit status

However, if I comment out the function so the main program doesn’t call it (line #8 in the main() code), it compiles without any problem. So I am missing something along the way.

  • Does my DLL generation with the CUDA code look correct, or is there something else to be done with the two other files (.lib and .exp) that I left out?
  • Do you spot any other issue?

Thanks a lot.

To add to the above problem, if I call the function as:

Test_Func <<<1, 1>>> ();

then the compilation fails with:

App_with_CUDA.cpp:24:16: error: expected primary-expression before '<' token
   Test_Func <<<1, 1>>> ();
                ^
App_with_CUDA.cpp:24:23: error: expected primary-expression before '>' token
   Test_Func <<<1, 1>>> ();
                       ^
App_with_CUDA.cpp:24:26: error: expected primary-expression before ')' token
   Test_Func <<<1, 1>>> ();

So I rewrote the original DLL_Test files, which contain the CUDA function declaration and definition, so now it has a wrapper function. The .cuh and .cu are respectively:

// DLL_Test.cuh
extern "C"
	{
	__global__ void cudaprint(void);
	__declspec(dllexport) void Test_Func(void);	// Is now a wrapper function
	}

// DLL_Test.cu
#include "DLL_Test.cuh"
#include <stdio.h>

__global__ void cudaprint(void)
{
	printf("Printing from GPU\n");
}

void Test_Func(void)
{
	cudaprint <<<1, 1>>> ();
}

The dll is generated without warnings, but the compilation provides the very same error as the first implementation:

App_with_CUDA.cpp:(.text+0x131): undefined reference to `__imp__Z10Test_Funcv'
collect2.exe: error: ld returned 1 exit status

“The dll is generated without warnings, but the compilation provides the very same error as the first implementation:”

By compilation I meant the main program compilation, not the dll generation.

First issue:
You can’t call a kernel (or any other kind of CUDA device code) across a shared library interface. This is covered in the nvcc manual. Calling CUDA code in a library (i.e. calling a function marked with __global__ or __device__) requires static linking to the library.

Build ordinary C/C++ wrapper functions in your DLL that don’t have __global__ or __device__ decorators, and call those across the DLL interface. Inside your DLL, those functions are able to call CUDA kernels that are linked internally to the DLL.

Second issue:
You are attempting to link a binary object (a DLL produced by nvcc) with MinGW on Windows. That configuration is completely untested by NVIDIA, and it’s quite possible it doesn’t work. However, it appears that your DLL is exporting an unmangled wrapper function name, while your application is attempting to link against a mangled name. Do you need to wrap the __declspec declaration in the application file with extern "C" (as it is wrapped in your .cuh)?
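To illustrate the suggestion above, here is a minimal sketch of what the application-side declaration might look like. The macro names DLL_TEST_IMPORT and LINK_AGAINST_DLL_TEST, the stand-in definition, and the helper run_call_site are assumptions made for illustration only; they are not part of the original code. The stand-in exists solely so the snippet builds without the DLL.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical switch: define LINK_AGAINST_DLL_TEST when building against the real DLL.
#if defined(_WIN32) && defined(LINK_AGAINST_DLL_TEST)
#define DLL_TEST_IMPORT __declspec(dllimport)
#else
#define DLL_TEST_IMPORT
#endif

// extern "C" asks the compiler to reference the unmangled name Test_Func,
// matching the extern "C" block in DLL_Test.cuh, instead of _Z10Test_Funcv.
extern "C" DLL_TEST_IMPORT void Test_Func(void);

#ifndef LINK_AGAINST_DLL_TEST
// Stand-in definition (assumption) so this sketch links on its own.
static int stand_in_calls = 0;
extern "C" void Test_Func(void)
{
    ++stand_in_calls;
    std::puts("stand-in Test_Func called");
}
#endif

// Mirrors the call site inside main() of App_with_CUDA.cpp.
void run_call_site()
{
    Test_Func();
#ifndef LINK_AGAINST_DLL_TEST
    assert(stand_in_calls > 0);  // standalone mode only: the call reached the stand-in
#endif
}
```

When building against the real DLL, the stand-in drops out and only the extern "C" declaration matters: the linker then looks for the same unmangled symbol that the DLL exports.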

Good afternoon txbob and thanks for your reply.

I have read many of your posts around similar issues and was a bit unsure of what you (and other references) meant by “can’t call a device code from a shared library”. Was it literally just device code, or any CUDA code? In another post you mentioned a wrapper function to a CUDA method, and maybe I misunderstood again: is it a wrapper function in the DLL itself, which is exposed to the main program and calls the CUDA code? Can you elaborate just a bit more on that, as I didn’t understand where the CUDA code should be?

As for the extern "C" preceding the DLL function in the main program, guess what? Adding it made the program compile without a single warning. However, it doesn’t actually print the message on the terminal, so now I have to investigate a bit more.

I perfectly understand this is untested and unsupported, and the only reason I’m doing it is that the program interface is built on GTKmm, so I could compile the stuff with minimum modifications in Linux and Windows (precisely a seismic attribute that I am testing for my company, though I’m not speaking on its behalf now).

The other thing I tried was compiling a full program with nvcc and calling it from my MinGW part with system("command name and parameters"). It works, of course, but I am benchmarking things, and this call adds some 3 seconds to the execution. In a 1-hour run that makes no difference, but in a short run it masks the performance gain.

Do you think I will hit a wall at some point, and should give up this nvcc + mingw attempt?

Just an update: the printf message does not show up on the console, so it must be printing somewhere else.

The shared library thing is pretty simple.

Your shared library publishes some functions that are callable from outside the library, right? If any of those functions are decorated with __global__ or __device__, it won’t work.

You’ve already fixed that issue when you switched to publishing/using wrapper functions.

If you’re not seeing any printout, the first thing I would try is to change this:

void Test_Func(void)
{
	cudaprint <<<1, 1>>> ();
	cudaDeviceSynchronize();	// add this line: wait for the kernel so its printf output is flushed
}

Maybe it’s better to do it the other way round and put your main program into a DLL? Moreover, a friend of mine tried to statically link MSVC and GCC code together and it seems to be possible, but the DLL approach will be simpler for a start.

Thanks for the assistance once again, txbob.
This fixed the thing!! As we say, the devil lives in the details.

This pretty much solves the original problem I had, and I can continue to work (and rewrite stuff). However, I’d like to ask you just one more thing: can you point me to a good reference on kernel launch parameters, as we used <<<1, 1>>> for the sake of simplicity?

I’d like to know how to estimate the optimum numbers and also how to properly use the threadIdx.x .y .z…

Other than that, I can only thank you.

@BulatZiganshin, I thought of other things, like what you suggested, and also of doing everything in Visual C++ (which I will certainly do next time). But whatever I choose, it messes up my life in Linux, which is my main target, while Windows is a plus. It is a learning curve, as I’m new to CUDA and not very aware of the current limitations. But thanks to you too.

These are questions that pertain to elementary CUDA programming. There are many resources available.

If you google “gtc cuda intro” you will find good resources in the first 5-10 hits.

For a longer read, the CUDA C programming guide covers all the necessary topics:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
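As a rough illustration of the launch-parameter question, here is a host-side C++ sketch (not CUDA code) of the arithmetic usually involved. In a launch such as kernel<<<blocks, threads>>>(), each thread typically computes a global index as blockIdx.x * blockDim.x + threadIdx.x, and the block count is rounded up so that all n elements are covered. The helper names blocks_for, global_index and in_range are hypothetical, and 256 threads per block in the usage note is only a common starting value, not a tuned optimum.

```cpp
#include <cassert>
#include <cstddef>

// Number of blocks needed so that blocks * threads_per_block >= n (ceiling division).
std::size_t blocks_for(std::size_t n, std::size_t threads_per_block)
{
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// Same arithmetic a kernel performs with blockIdx.x * blockDim.x + threadIdx.x.
std::size_t global_index(std::size_t block_idx, std::size_t block_dim, std::size_t thread_idx)
{
    return block_idx * block_dim + thread_idx;
}

// The last block may be only partly used, so a kernel usually guards with: if (i < n) { ... }
bool in_range(std::size_t i, std::size_t n)
{
    return i < n;
}
```

For example, covering 1000 elements with 256 threads per block takes blocks_for(1000, 256) blocks, and the threads in the last block whose global index is 1000 or more simply skip the work.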

Thanks again and have a great day!