Create a CUDA DLL under Visual Studio 2010

Hi there,
I am trying to build a CUDA DLL in VS 2010, but I get an error at the linking stage.

The steps to create the dll are:

  1. Make a new CUDA project (CUDA version 6.5) and add the main file (see below)
  2. Add a new CUDA project, add the header and function files and set the Configuration Type as dynamic library (.dll)
  3. Add the path of the header file in Additional Include Directory properties of the CUDA project main application. Same for the compiled library path: I add it in the Additional Library directories plus in the Input section I add the cudaDllModule.lib file.

Here are my simple test files:

Header file: CUDASommaDiNumeri.h
//-----------------------------------------------------------------------------
#ifndef CUDA_SOMMA_DI_NUMERI
#define CUDA_SOMMA_DI_NUMERI

__declspec(dllexport) __device__ float CUDAsum(float n1, float n2);

#endif
//-----------------------------------------------------------------------------

Function file: CUDASommaDiNumeri.cu
//-----------------------------------------------------------------------------
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include "CUDASommaDiNumeri.h"

__device__ float CUDAsum(float n1, float n2)
{
    return n1 + n2;
}
//-----------------------------------------------------------------------------

Main file: kernel.cu
//-----------------------------------------------------------------------------

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>

#include "CUDASommaDiNumeri.h"

cudaError_t addWithCuda(int *c, const int *a, const int *b, unsigned int size);

__global__ void addKernel(int *c, const int *a, const int *b)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];

    float ccc = CUDAsum(6., 8.);
    c[i] = int(ccc);
}

int main()
{
    const int arraySize = 5;
    const int a[arraySize] = { 1, 2, 3, 4, 5 };
    const int b[arraySize] = { 10, 20, 30, 40, 50 };
    int c[arraySize] = { 0 };

    // Add vectors in parallel.
    cudaError_t cudaStatus = addWithCuda(c, a, b, arraySize);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "addWithCuda failed!");
        return 1;
    }

    printf("{1,2,3,4,5} + {10,20,30,40,50} = {%d,%d,%d,%d,%d}\n",
        c[0], c[1], c[2], c[3], c[4]);

    // cudaDeviceReset must be called before exiting in order for profiling and
    // tracing tools such as Nsight and Visual Profiler to show complete traces.
    cudaStatus = cudaDeviceReset();
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaDeviceReset failed!");
        return 1;
    }

    return 0;
}

// Helper function for using CUDA to add vectors in parallel.
cudaError_t addWithCuda(int *c, const int *a, const int *b, unsigned int size)
{
    int *dev_a = 0;
    int *dev_b = 0;
    int *dev_c = 0;
    cudaError_t cudaStatus;

    // Choose which GPU to run on, change this on a multi-GPU system.
    cudaStatus = cudaSetDevice(0);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed!  Do you have a CUDA-capable GPU installed?");
        goto Error;
    }

    // Allocate GPU buffers for three vectors (two input, one output).
    cudaStatus = cudaMalloc((void**)&dev_c, size * sizeof(int));
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed!");
        goto Error;
    }

    cudaStatus = cudaMalloc((void**)&dev_a, size * sizeof(int));
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed!");
        goto Error;
    }

    cudaStatus = cudaMalloc((void**)&dev_b, size * sizeof(int));
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed!");
        goto Error;
    }

    // Copy input vectors from host memory to GPU buffers.
    cudaStatus = cudaMemcpy(dev_a, a, size * sizeof(int), cudaMemcpyHostToDevice);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed!");
        goto Error;
    }

    cudaStatus = cudaMemcpy(dev_b, b, size * sizeof(int), cudaMemcpyHostToDevice);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed!");
        goto Error;
    }

    // Launch a kernel on the GPU with one thread for each element.
    addKernel<<<1, size>>>(dev_c, dev_a, dev_b);

    // Check for any errors launching the kernel
    cudaStatus = cudaGetLastError();
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "addKernel launch failed: %s\n", cudaGetErrorString(cudaStatus));
        goto Error;
    }

    // cudaDeviceSynchronize waits for the kernel to finish, and returns
    // any errors encountered during the launch.
    cudaStatus = cudaDeviceSynchronize();
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSynchronize returned error code %d after launching addKernel!\n", cudaStatus);
        goto Error;
    }

    // Copy output vector from GPU buffer to host memory.
    cudaStatus = cudaMemcpy(c, dev_c, size * sizeof(int), cudaMemcpyDeviceToHost);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed!");
        goto Error;
    }

Error:
    cudaFree(dev_c);
    cudaFree(dev_a);
    cudaFree(dev_b);

    return cudaStatus;
}

The error I get is:

ptxas fatal : Unresolved extern function '_Z7CUDAsumff'

Here is the full output:
1>------ Build started: Project: CUDADll, Configuration: Release x64 ------
1> Compiling CUDA source file kernel.cu…
1>
1> F:\jony\Work\Development\Test_CUDA\CUDADll\CUDADll>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\bin\nvcc.exe" -gencode=arch=compute_20,code="sm_20,compute_20" --use-local-env --cl-version 2010 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\x86_amd64" -IF:\jony\Work\Development\Test_CUDA\CUDADll\dllModule -IF:\jony\Work\Development\Test_CUDA\CUDADll\cudaDllModule -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" --keep-dir x64\Release -maxrregcount=0 --machine 64 --compile -cudart static -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MD " -o x64\Release\kernel.cu.obj "F:\jony\Work\Development\Test_CUDA\CUDADll\CUDADll\kernel.cu"
1> ptxas fatal : Unresolved extern function '_Z7CUDAsumff'
1> kernel.cu
1>C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA 6.5.targets(593,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\bin\nvcc.exe" -gencode=arch=compute_20,code="sm_20,compute_20" --use-local-env --cl-version 2010 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\x86_amd64" -IF:\jony\Work\Development\Test_CUDA\CUDADll\dllModule -IF:\jony\Work\Development\Test_CUDA\CUDADll\cudaDllModule -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" --keep-dir x64\Release -maxrregcount=0 --machine 64 --compile -cudart static -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MD " -o x64\Release\kernel.cu.obj "F:\jony\Work\Development\Test_CUDA\CUDADll\CUDADll\kernel.cu"" exited with code 255.
========== Build: 0 succeeded, 1 failed, 2 up-to-date, 0 skipped ==========

Do you have any idea of where the problem is?

Many thanks in advance

Jony

i do not use windows nor vs, so perhaps i am not the person to reply

i find it difficult to determine whether this is a compiler error or a linker error, and at what point it occurs: while building the library, or while building the main program

the dll project should generally be able to build on its own
it might be helpful to build that first, so that you can see which stage raises the error

What happens if you move the CUDAsum device function directly into the other project: does it work correctly then? I have never tried making a DLL of just device functions, only of host wrappers around kernels. The error seems to imply that even though you declare a DLL export for CUDAsum (your device function), the main project's device code cannot see it. This is probably because dllexport applies to host functions, and a device function is not a function in that traditional sense. I may be wrong there.

If I have misunderstood your project structure and they are already all there, just try removing the dll export line.
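
A minimal sketch of the host-wrapper approach mentioned above, assuming the DLL keeps the device function and the kernel that calls it in the same .cu file and exports only a host function. The names SKETCH_EXPORT, sumKernel and CUDAsumHost are hypothetical and not part of the original project. The reasoning behind it: without relocatable device code, ptxas has to resolve every device function within the translation unit being compiled, and a DLL export table only carries host symbols.
//-----------------------------------------------------------------------------
```cuda
// CUDASommaDiNumeri.cu, built as the DLL project (sketch only).
#include "cuda_runtime.h"

// Hypothetical export macro for this sketch; expands to nothing off Windows.
#if defined(_WIN32)
#  define SKETCH_EXPORT __declspec(dllexport)
#else
#  define SKETCH_EXPORT
#endif

// The device function stays private to the DLL's own device code.
__device__ float CUDAsum(float n1, float n2)
{
    return n1 + n2;
}

// The kernel lives in the same file, so ptxas can resolve CUDAsum.
__global__ void sumKernel(float *out, float n1, float n2)
{
    *out = CUDAsum(n1, n2);
}

// Host wrapper: the only symbol the main application has to link against.
extern "C" SKETCH_EXPORT cudaError_t CUDAsumHost(float *result, float n1, float n2)
{
    float *dev_out = 0;
    cudaError_t status = cudaMalloc((void**)&dev_out, sizeof(float));
    if (status != cudaSuccess) return status;

    sumKernel<<<1, 1>>>(dev_out, n1, n2);
    status = cudaGetLastError();
    if (status == cudaSuccess) {
        // cudaMemcpy blocks until the kernel has finished.
        status = cudaMemcpy(result, dev_out, sizeof(float), cudaMemcpyDeviceToHost);
    }

    cudaFree(dev_out);
    return status;
}
```
//-----------------------------------------------------------------------------
With this layout the main project would call CUDAsumHost from host code instead of calling CUDAsum inside addKernel, and only the DLL would need rebuilding when the device function changes.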

Hi, thanks for your help. However:

#2) If I build my DLL project on its own, everything is OK, so I am fairly sure the problem is at the linking stage of the full VS solution.

#3) If I move the device function into the main file it works, but then I cannot generate a separate DLL file, which is my goal (I basically don't want to recompile the full code in future, only the DLL separately).

Any other suggestions? Thanks

kindly note the names of your dll project and your main project; i cannot evaluate the statement:

“Add the path of the header file in Additional Include Directory properties of the CUDA project main application. Same for the compiled library path: I add it in the Additional Library directories plus in the Input section I add the cudaDllModule.lib file”

although i would agree to including it in the main project, i am not sure whether it should be included in the dll; ‘it’ being CUDASommaDiNumeri.h

the main project only requires a forward declaration to the dll entry function
the dll does not require a forward declaration of itself
hence, you normally split the main program header, and dll header, to accommodate their different requirements
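
a common alternative to keeping two separate headers is one shared header whose macro expands to dllexport while the dll is being built and to dllimport in the main program. the sketch below assumes a host-callable entry point; CUDASUM_API, CUDASUM_EXPORTS, CUDAsumHost and the file name are hypothetical, and CUDASUM_EXPORTS is assumed to be defined only in the dll project's preprocessor settings.
//-----------------------------------------------------------------------------
```cuda
// CUDASommaDiNumeriApi.h, hypothetical shared header (sketch only).
#ifndef CUDA_SOMMA_DI_NUMERI_API
#define CUDA_SOMMA_DI_NUMERI_API

#include "cuda_runtime.h"   // for cudaError_t

#if defined(_WIN32)
#  ifdef CUDASUM_EXPORTS                 // assumed: defined only when building the dll
#    define CUDASUM_API __declspec(dllexport)
#  else
#    define CUDASUM_API __declspec(dllimport)
#  endif
#else
#  define CUDASUM_API                    // no decoration needed on other platforms
#endif

// Host-callable entry point; no __device__ or __global__ symbols are exposed.
extern "C" CUDASUM_API cudaError_t CUDAsumHost(float *result, float n1, float n2);

#endif
```
//-----------------------------------------------------------------------------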

and look at the output:

Compiling CUDA source file kernel.cu…

1> … -IF:\jony\Work\Development\Test_CUDA\CUDADll\dllModule -IF:\jony\Work\Development\Test_CUDA\CUDADll\cudaDllModule…

1> ptxas fatal : Unresolved extern function '_Z7CUDAsumff'

are you not creating a circular reference by seemingly linking the dll against itself…?
only the main project should link against the dll, through its import library (.lib) in the linker settings

just check