I’m using MSVC 2017 to create a DLL with the CUDA code, and MinGW-w64 (in an MSYS2 environment) for the rest of the program. This is for the sake of portability, since the program will be used on both Windows and Linux.
Just to get a grasp of linking CUDA DLL code with the application, DLL_Test.cuh and DLL_Test.cu contain the following code, respectively (and nothing else):
undefined reference to `__imp__Z10Test_Funcv'
collect2.exe: error: ld returned 1 exit status
However, if I comment out the function so the main program doesn’t call it (line #8 in the main() code), it compiles without any problem. So I am missing something along the way.
Does my DLL generation with the CUDA code look correct, or is there something else to be done with the two other files (.lib and .exp) that I left out?
So I rewrote the original DLL_Test files, which contain the CUDA function declaration and definition, so now it has a wrapper function. The .cuh and .cu are respectively:
extern "C"
{
__global__ void cudaprint(void);
__declspec(dllexport) void Test_Func(void); // Is now a wrapper function
}
First issue:
You can’t call a kernel (or any other kind of CUDA device code) across a shared library interface. This is covered in the nvcc manual. Calling CUDA code in a library (i.e. calling a function marked with __global__ or __device__) requires static linking to the library.
Instead, build ordinary C/C++ wrapper functions in your DLL that don’t have __global__ or __device__ decorators, and call those across the DLL interface. Inside your DLL, those functions are able to call CUDA kernels that are linked internally to the DLL.
Second issue:
You are attempting to link a binary object (a DLL produced by nvcc) with MinGW on Windows. That configuration is completely untested by NVIDIA, and it’s quite possible it doesn’t work. However, it appears that your DLL is exporting an unmangled wrapper function name, while your application is attempting to link against a mangled name. Do you need to wrap the __declspec declaration in the application file with extern "C" (like it is wrapped in your .cuh)?
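A minimal sketch of what the application-side declaration could look like is given here. `Test_Func` comes from the thread; the `USE_REAL_DLL` switch, the `DLL_TEST_API` macro, the stub definition and `run_app` are assumptions added only so the sketch builds on its own without the DLL.

```cpp
#include <cstdio>

// USE_REAL_DLL is a hypothetical switch for this sketch: define it when
// linking against the import library generated for the DLL.
#if defined(_WIN32) && defined(USE_REAL_DLL)
#  define DLL_TEST_API __declspec(dllimport)
#else
#  define DLL_TEST_API
#endif

// extern "C" disables C++ name mangling for this declaration, so the
// linker looks for the plain Test_Func symbol that the DLL exports
// instead of a mangled C++ name.
extern "C" DLL_TEST_API void Test_Func(void);

#ifndef USE_REAL_DLL
// Stand-in definition (not the real CUDA wrapper) so the sketch links
// without the DLL being present.
static int stub_calls = 0;
extern "C" void Test_Func(void) {
    ++stub_calls;
    std::puts("Test_Func stub called");
}
#endif

// Hypothetical application-side helper that calls across the interface.
int run_app() {
    Test_Func();
    return 0;
}
```

The point of the sketch is only that the declaration seen by the application carries the same extern "C" linkage as the one in the .cuh; whether the MSVC-built import library then links cleanly with MinGW remains untested territory, as noted above.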
I have read many of your posts around similar issues and was a bit unsure of what you (and other references) meant by “can’t call a device code from a shared library”. Was it literally just device code, or any CUDA code? On another post you mentioned a wrapper function for a CUDA method, and maybe I misunderstood again: is it a wrapper function in the DLL itself that is exposed to the main program and calls the CUDA code? Can you elaborate just a bit more on that, as I didn’t understand where the CUDA code should be?
As for the extern "C" preceding the DLL function in the main program, guess what? Adding it made the program compile without a single warning. However, it doesn’t actually print the message on the terminal, so now I have to investigate a bit more.
I perfectly understand this is untested and unsupported, and the only reason I’m doing it is that the program interface is built on GTKmm, so I could compile the stuff with minimum modifications in Linux and Windows (precisely a seismic attribute that I am testing for my company, though I’m not speaking on its behalf now).
The other thing I tried was compiling a full program with nvcc and calling it from my MinGW part with system("command name and parameters"). It works, of course, but because I am benchmarking things, this call adds some 3 seconds to the execution. In a 1h run that doesn’t make a difference, but in a short run it masks the performance gain.
Do you think I will hit a wall at some point, and should give up this nvcc + mingw attempt?
Your shared library publishes some functions that are callable from outside the library, right? If any of those functions are decorated with __global__ or __device__, it won’t work.
You’ve already fixed that issue when you switched to publishing/using wrapper functions.
If you’re not seeing any printout, the first thing I would try is to change the wrapper to this:
void Test_Func(void)
{
cudaprint <<<1, 1>>> ();
cudaDeviceSynchronize(); // add this line
}
Maybe it’s better to reverse the beast and put your main program into a DLL? Also, a friend of mine tried to statically link MSVC and GCC code together and it seems to be possible, but the DLL approach will be simpler for a start.
Thanks for the assistance once again, txbob.
This fixed the thing!! As we say, the devil lives in the details.
This pretty much solves the original problem I had, so I can continue to work (and rewrite stuff). However, I’d like to ask you just one more thing: can you point me to a good reference on the kernel launch parameters, as we used <<<1, 1>>> for the sake of simplicity?
I’d like to know how to estimate the optimum numbers and also how to properly use threadIdx.x, .y and .z.
Other than that, I can only thank you.
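On the launch-parameter question, the usual starting point is the thread hierarchy chapter of the CUDA C++ Programming Guide. As a rough host-side sketch of the arithmetic involved (not a tuned configuration): the number of blocks is a ceiling division of the element count by the threads per block, and each thread computes its global index as blockIdx.x * blockDim.x + threadIdx.x, guarded by a bounds check. The function names below are hypothetical, and 256 threads per block is only an assumed starting value (a multiple of the 32-thread warp size), not an optimum.

```cpp
// Host-side mirror of the 1D index arithmetic. In a real kernel the
// values come from the built-ins blockIdx.x, blockDim.x and threadIdx.x.

// Blocks needed so that blocks * threads_per_block >= n (ceiling division).
constexpr unsigned blocks_for(unsigned n, unsigned threads_per_block) {
    return n / threads_per_block + (n % threads_per_block != 0 ? 1u : 0u);
}

// Global 1D index of a thread: blockIdx.x * blockDim.x + threadIdx.x.
constexpr unsigned global_index(unsigned block_idx, unsigned block_dim,
                                unsigned thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// 1000 elements with 256 threads per block need 4 blocks (1024 threads).
static_assert(blocks_for(1000, 256) == 4, "ceiling division");

// Emulates the kernel-side guard `if (i < n)`: counts how many of the
// launched threads would actually touch an element.
unsigned count_active_threads(unsigned n, unsigned threads_per_block) {
    unsigned active = 0;
    const unsigned blocks = blocks_for(n, threads_per_block);
    for (unsigned b = 0; b < blocks; ++b) {
        for (unsigned t = 0; t < threads_per_block; ++t) {
            if (global_index(b, threads_per_block, t) < n) {
                ++active;
            }
        }
    }
    return active;
}
```

With these numbers a launch would take the form kernel<<<4, 256>>>(...), and the last 24 threads of the final block do nothing because of the bounds check. The .y and .z components follow the same pattern along the second and third dimensions.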
@BulatZiganshin, I thought of other options, like what you suggested, and also of doing everything in Visual C++ (which I will certainly do next time). But whatever I choose, it complicates things on Linux, which is my main target, while Windows is a plus. It is a learning curve, as I’m new to CUDA and not very aware of the current limitations. Thanks to you too.