Possible VS2010 integration bug?

jdinger · March 7, 2013, 7:45pm

I’m having an issue creating a static library that contains CUDA code on Windows 7 with Visual Studio 2010. I’m using CUDA 5.0, which supports relocatable device code.

I have created a simple VS2010 solution that replicates the issue I’m having. The solution contains
two projects. One is an application project, which has a single .cpp file containing a main function.

//main.cpp
#include <iostream>
#include "../cuda_separate_library/test_kernel.cuh"

int main()
{
   int a[100];
   int b[100];
   int out[100];
   for(int i = 0; i < 100; i++)
   {
      a[i] = i+5;
      b[i] = i*4-2;
   }
   addArrayWrapper(100, out, a, b);

   for(int i = 0; i < 100; i++)
   {
      std::cout << a[i] << " + " << b[i] << " = " << out[i] << std::endl;
   }
}

The other project is a static library project containing two CUDA headers and two CUDA source files. One of the source files contains a kernel that calls a device function defined in the other source file. This requires CUDA 5.0 and the -rdc (relocatable device code) flag on the CUDA compiler.

//test_kernel.cuh:
#ifndef TEST_KERNEL_HEADER
#define TEST_KERNEL_HEADER

void addArrayWrapper(unsigned int size, int* out, int* a, int* b);

#endif // TEST_KERNEL_HEADER

//test_kernel.cu:
#include "test_kernel.cuh"
#include "test_math.cuh"

#include <cuda_runtime_api.h>

__global__ void addArrayKernel(unsigned int size, int* out, int* a, int* b)
{
   int idx = blockIdx.x * blockDim.x + threadIdx.x;
   if(idx < size)
   {
      // Index into array.
      int *aIndex = a+idx;
      int *bIndex = b+idx;
      int *outIndex = out+idx;

      add(*outIndex, *aIndex, *bIndex);
   }
}

void addArrayWrapper(unsigned int size, int* out, int* a, int* b)
{
   if(size > 256) size = 256;

   // Allocate device memory.
   int * aDevice;
   int * bDevice;
   int * outDevice; 
   cudaMalloc(&aDevice, size*sizeof(int));
   cudaMalloc(&bDevice, size*sizeof(int));
   cudaMalloc(&outDevice, size*sizeof(int));

   // Copy input array to device memory.
   cudaMemcpyAsync(aDevice, a, size*sizeof(int), cudaMemcpyHostToDevice);
   cudaMemcpyAsync(bDevice, b, size*sizeof(int), cudaMemcpyHostToDevice);

   // Launch kernel.
   addArrayKernel<<<1,256>>>(size, outDevice, aDevice, bDevice);

   // Copy output array to host memory.
   cudaMemcpy(out, outDevice, size*sizeof(int), cudaMemcpyDeviceToHost);

   // Free device memory.
   cudaFree(aDevice);
   cudaFree(bDevice);
   cudaFree(outDevice);
}

//test_math.cuh:
#ifndef TEST_MATH_HEADER
#define TEST_MATH_HEADER

__device__ void add(int& out, const int& a, const int& b);

#endif // TEST_MATH_HEADER

//test_math.cu
#include "test_math.cuh"

__device__ void add(int& out, const int& a, const int& b)
{
   out = a + b;
}

All the CUDA source files compile correctly. The resulting object files are passed to Lib.exe to create the library with the following command:

Lib.exe /OUT:"L:\cuda_separate_compile_test\Debug\cuda_separate_library.lib" /NOLOGO Debug\test_kernel.cu.obj Debug\test_math.cu.obj "Debug\cuda_separate_library.device-link.obj"

This fails with the following error:

LINK : fatal error LNK1181: cannot open input file 'Debug\cuda_separate_library.device-link.obj'

cuda_separate_library.device-link.obj is supposed to be the output of the CUDA Linker, which doesn’t appear to be invoked at all in this case. However, if I temporarily switch the library project to an application and rebuild, the CUDA Linker is invoked and creates cuda_separate_library.device-link.obj properly, although the build then fails when that file is passed to the Visual Studio linker. When I switch the project back to a static library and build again, Lib.exe finds the device-link.obj file left over from the application build and the library is created. I can then build my executable (the other project), which links against the static library, and it all works.

I can, alternatively, copy the add() function into the kernel code directly, turn off the -rdc switch,
and everything is happy again. However, this defeats the whole purpose of having reusable device code
in a separate file.
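One possible middle ground, sketched here rather than tested in this solution, is to keep add() in its own header but define it inline there, so that each .cu file that includes the header compiles its own copy and no cross-file device linking is needed. The TEST_MATH_HD macro is a hypothetical name introduced only for this sketch, and marking the function as callable from the host as well is an assumption that is not part of the original code.

```cpp
// test_math.cuh (hypothetical header-only variant of the file above)
#ifndef TEST_MATH_HEADER
#define TEST_MATH_HEADER

// TEST_MATH_HD is a made-up helper macro for this sketch. Under nvcc it
// expands to the CUDA qualifiers; under a plain C++ compiler it expands to
// nothing, so the header can also be included from host-only code.
#ifdef __CUDACC__
#define TEST_MATH_HD __host__ __device__
#else
#define TEST_MATH_HD
#endif

// The definition lives in the header, so the kernel and this function end up
// in the same translation unit and the call does not cross object files.
TEST_MATH_HD inline void add(int& out, const int& a, const int& b)
{
   out = a + b;
}

#endif // TEST_MATH_HEADER
```

With this variant test_math.cu would no longer be needed and -rdc could stay off. The trade-off is that the function body is recompiled in every .cu file that includes the header, so it suits small helpers better than large device routines.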

It appears that the integration between CUDA and VS2010 isn’t working correctly for static library builds. I’ve examined the property sheet provided for VS integration (CUDA 5.0.props), and the following change seems to fix the issue. It can be made either directly in the provided props sheet or in another imported props sheet. On line 7,

<CudaLinkBeforeTargets>Link</CudaLinkBeforeTargets>

becomes

<CudaLinkBeforeTargets>Link;Lib</CudaLinkBeforeTargets>

This forces the CUDA Linker to run before the Lib target that creates the static library, as well as before the Link target that links an application. This seems to work, but I’m not particularly well versed in how VS targets work. Is this a good idea? Is there a better way to fix this?

Thanks.
