CMake doesn't compile CUDA kernels

I created a shared library that uses OptiX. Now I want to add some computation on the GPU using CUDA.

As a starting point I created this simple kernel, in a file called “sort_cuda.cu”:

#include <cuda_runtime.h>

#include <iostream>

__global__ void testKernel(optix::float3* cudaBuf) {
  cudaBuf->x += 1.0f;
  cudaBuf->y += 1.0f;
  cudaBuf->z += 1.0f;
}

bool runKernel() {
  optix::float3 ret;
  testKernel<<<1, 1>>>(cudaBuf);
  cudaMemcpy( &ret,
              cudaBuf,
              sizeof(optix::float3),
              cudaMemcpyDeviceToHost);
  std::cout << "Ret: "  << ret.x << " , "
                        << ret.y << " , "
                        << ret.z << std::endl;
}

When I compile the whole project, I get an undefined reference to “runKernel”.

I compared with an OptiX sample that uses CUDA (optixRaycasting), and when compiling that sample I see this line:

[ 83%] Building NVCC (Device) object lib/ptx/./optixRaycasting_generated_optixRaycastingKernels.cu.o

When I compile my project there is nothing similar, so it seems my file “sort_cuda.cu” is not taken into account (if it were, it would not even be able to compile “runKernel” as it is)! I don’t understand what I’m missing.

Here is the library part of my “CMakeLists.txt”:

set_source_files_properties(${CMAKE_CURRENT_SOURCE_DIR}/sort_cuda.cu
  PROPERTIES CUDA_SOURCE_PROPERTY_FORMAT OBJ
    )

# Create RayGen library
add_library(RayGen SHARED
    file1.cpp
    file2.cpp
    file3.cpp
    file4.cpp
    sort_cuda.cu
    triangle_mesh.cu
    )

# Link openmp to RayGen library
target_link_libraries(RayGen
    -fopenmp
    optix
    sutil_sdk
    ${optix_rpath}
    ${CUDA_LIBRARIES}
    )

For information, I’m on CentOS 7, with OptiX 6.0 and CUDA 10.2.

When I compile the whole project, I get an undefined reference to “runKernel”

Did you write a header declaring that function, like it’s done with the three host functions in optixRaycastingKernels.h and optixRaycastingKernels.cu of the optixRaycasting example?

Since you’re building a library, should that function be accessible from the application linking that library?
If yes, is that function exported by the library?

Note that the OptiX examples use the CMake function OPTIX_add_sample_executable to generate the executable.
That does a lot more than just add_executable() or add_library() in your case.
Specifically, it generates the custom build rules for all CUDA sources in the project and takes the previously set options into account. That means the OptiX device code is compiled only to PTX, while the native CUDA kernels should be compiled to CUBINs plus PTX as a fallback for any streaming multiprocessor version for which there is no binary code.
Search for CUDA_WRAP_SOURCES in the *.cmake files.

Your code excerpts don’t contain that part. How would that happen in your CMake project?
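For illustration only, here is a rough, untested sketch of one way that could look when driven through the FindCUDA machinery directly. CUDA_WRAP_SRCS (the spelling in the stock FindCUDA module) is what emits those “Building NVCC (Device) object” rules; the exact arguments below are an assumption on my side, not code taken from the SDK:

# Rough, untested sketch; assumes the FindCUDA module is loaded via find_package(CUDA).
# Mark the native kernel file so it becomes a host-linkable object,
# while the OptiX device code keeps the PTX format.
set_source_files_properties(${CMAKE_CURRENT_SOURCE_DIR}/sort_cuda.cu
  PROPERTIES CUDA_SOURCE_PROPERTY_FORMAT OBJ)

# Generate the per-file NVCC build rules (PTX or OBJ, depending on the property above).
CUDA_WRAP_SRCS(RayGen PTX generated_files
  sort_cuda.cu
  triangle_mesh.cu
  SHARED)

# The generated files must be part of the target so those rules actually run.
add_library(RayGen SHARED
  file1.cpp
  file2.cpp
  file3.cpp
  file4.cpp
  ${generated_files})

The important point is that something has to generate and consume those NVCC rules; in this FindCUDA-based setup a plain add_library() simply ignores the *.cu files, which is why runKernel ends up undefined.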

Generally I would recommend not using OptiX 6.0.0 anymore. There is already OptiX 6.5.0, and even that is no match for the flexibility and performance of the new API in the OptiX 7 versions.
No special CUDA interop is required there anymore because the host code already uses CUDA for all allocations.
There are no Buffers in OptiX 7; everything is a CUdeviceptr, which makes running native CUDA kernels on the same data OptiX 7 uses straightforward.

There is a different, more explicit method to run native CUDA kernels with the CUDA Driver API and PTX input.
That makes this method compatible across GPU architectures because the code is JIT compiled.
I’m using that in my OptiX 7 examples:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo3/src/DeviceMultiGPULocalCopy.cpp#L58
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo3/src/DeviceMultiGPULocalCopy.cpp#L321

Thanks for your answer and all your advice.

In the “root” CMakeLists.txt I copied the function “OPTIX_add_sample_executable” to create an “OPTIX_add_library”, and it seems to work. But I now have a weird problem that I did not have before adding CUDA.

Let’s say, in a file “Inter.(cpp|h)”, I have two classes, “Face” and “Inter”, which both have a method called “get_inter”, like this:

class Face {
public:
  Inter* get_inter(optix::float3 coord);
};

class Inter {
public:
  Inter* get_inter(optix::float3 coord);
};

Before adding CUDA I had no problem with those methods! But now the linker cannot find the method “Face::get_inter”!

I looked into the .o and .so files with the objdump command.

Here are the methods in the file “Inter.cpp.o” (only their headers):

0000000000000a5a <_ZN4Face9get_interEN5optix6float3E>:

0000000000000d5e <_ZN5Inter9get_interEN5optix6float3E>:

And in the “libRayGen.so” file:

0000000000048e10 <_ZN4Face9get_interE6float3@plt>:
   48e10:       ff 25 ea 38 24 00       jmpq   *0x2438ea(%rip)        # 28c700 <_ZN4Face9get_interE6float3>
   48e16:       68 dd 02 00 00          pushq  $0x2dd
   48e1b:       e9 10 d2 ff ff          jmpq   46030 <.plt>

000000000004c1e0 <_ZN5Inter9get_interEN5optix6float3E@plt>:
   4c1e0:       ff 25 02 1f 24 00       jmpq   *0x241f02(%rip)        # 28e0e8 <_ZN5Inter9get_interEN5optix6float3E@@Base+0x21fbc0>
   4c1e6:       68 1a 06 00 00          pushq  $0x61a
   4c1eb:       e9 40 9e ff ff          jmpq   46030 <.plt>


000000000006e224 <_ZN4Face9get_interEN5optix6float3E>:

000000000006e528 <_ZN5Inter9get_interEN5optix6float3E>:

As we can see, in the .o file both methods are correct and have a “5optix6float3” argument (I guess it corresponds to “optix::float3”).

In the .so, the second part, which seems to correspond to the method definitions, is correct. But the first part, which seems to correspond to the declarations, is not correct for Face::get_inter. Indeed, it only has a “6float3” argument, while Inter’s one still has a “5optix6float3” argument!

From what I understand, the compiler removes the namespace part of the type for the method declaration, and then the linker looks for a definition with the truncated type. But the compiler does that only, and always, for this one method, while I have many other methods/functions that take optix data types!

And when I revert the changes until there is no more CUDA at all in the project, the error disappears!

Do you have any idea of the problem’s origin?

Thanks,
Arnaud

I cannot say why one function would use the optix:: namespace and the other wouldn’t from the given code excerpts.

First, always include the CUDA headers (cuda.h, cuda_runtime.h) before the OptiX headers (optix.h, etc.) inside the host code. Check each of your source files!

The OptiX headers before version 7.0.0 try to lift the CUDA vector types into the optix:: namespace (in OptiX SDK 6.0.0\include\optixu\optixu_vector_types.h) and that isn’t working when the headers are included in the wrong order. There have been issues with this in the past.
E.g. in your initial example code how does this void testKernel(optix::float3* cudaBuf) even compile when not including optix.h as well after the cuda_runtime.h?

You could also try to not use the optix:: vector types and replace them with the CUDA native versions.
I see no need to use optix:: vectors in native CUDA kernels.

Just an idea: make sure to have “--relocatable-device-code=true” set in your NVCC compile-time options, or functions which are not called inside a module (like OptiX callable programs) will be eliminated as dead code since CUDA 8.0.

Or switch to OptiX 7 where only the native CUDA vector versions exist.


I added cuda.h and cuda_runtime.h to my Inter.cpp file (while there is no CUDA in it at all, only OptiX stuff).
Adding cuda.h changed nothing. Adding cuda_runtime.h, or both, gives me lots of errors like these (just a few):

/usr/local/cuda/include/driver_types.h:2266:5: error: ‘dim3’ does not name a type
     dim3 gridDim;        /**< Grid dimentions */
     ^
/usr/local/cuda/include/driver_types.h:2267:5: error: ‘dim3’ does not name a type
     dim3 blockDim;       /**< Block dimentions */

/usr/local/cuda/include/vector_functions.h:73:27: error: ‘char1’ does not name a type
 __VECTOR_FUNCTIONS_DECL__ char1 make_char1(signed char x);
                           ^
/usr/local/cuda/include/vector_functions.h:75:27: error: ‘uchar1’ does not name a type
 __VECTOR_FUNCTIONS_DECL__ uchar1 make_uchar1(unsigned char x);

I’m not sure I understand your question about the testKernel. Its includes are:

#include <cuda.h>
#include <cuda_runtime.h>

#include <optixu/optixu_math_namespace.h>

If I replace optixu_math_namespace.h with optixpp_namespace.h, or if I include both, I get lots of errors like:

/path/to/OptiX/include/optixu/../optix_cuda_interop.h(140): error: variable "RTAPI" has already been defined

/path/to/OptiX/include/optixu/../optix_cuda_interop.h(140): error: expected a ";"

/path/to/OptiX/include/optixu/../optix_cuda_interop.h(176): error: variable "RTAPI" has already been defined

/path/to/OptiX/include/optixu/../optix_cuda_interop.h(176): error: expected a ";"

/path/to/OptiX/include/optixu/optixpp_namespace.h(126): error: identifier "RTobject" is undefined

/path/to/OptiX/include/optixu/optixpp_namespace.h(228): error: identifier "RTcontext" is undefined

I removed the OptiX namespace in the testKernel file and in Inter.cpp, as well as the OptiX includes, and it works now! Thanks.

From what I see, the option --relocatable-device-code=true is not set. How do I set it in the CMakeLists.txt?

I’m pretty sure that OptiX 7 is great, but the project has to be finished at the end of the month; I can’t start the switch now :s

Compare that with what you initially posted at the top. That’s why reproducers need to be complete.

From what I see, the option --relocatable-device-code=true is not set. How do I set it in the CMakeLists.txt?

Please have a look into this post which explains how to dump the exact NVCC command line per *.cu file.
https://forums.developer.nvidia.com/t/how-would-you-generally-compile-and-run-a-program-which-has-optix-without-make-or-cmake/75302/2
The run_nvcc.cmake file exists inside the OptiX SDK 6.0.0 as well.
You want to change the nvcc_flags CMake variable, which is constructed inside the CUDA_WRAP_SOURCES function mentioned above.
The shortcut for --relocatable-device-code=true is -rdc.
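
For example, with the stock FindCUDA module one common way (an assumption on my side, not necessarily how the SDK framework wires it) is to append the option to CUDA_NVCC_FLAGS before the CUDA sources are processed, so every generated NVCC command line picks it up:

# Sketch, assuming the stock FindCUDA module; inside the SDK framework the
# equivalent place is the nvcc_flags variable built in CUDA_WRAP_SOURCES.
find_package(CUDA REQUIRED)
list(APPEND CUDA_NVCC_FLAGS "--relocatable-device-code=true")   # short form: -rdc=true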

I’m not using any of that SDK framework in my own OptiX applications (for multiple reasons), but use the FindCUDA.cmake as provided by CMake (currently on 3.17.3) and this CMake macro to generate the custom build rules for my CUDA files. It’s much easier to control than the OptiX SDK abstraction.
Implementation: https://github.com/NVIDIA/OptiX_Apps/blob/master/3rdparty/CMake/nvcuda_compile_ptx.cmake
Usage: https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo3/CMakeLists.txt#L163
See how I set the NVCC_OPTIONS argument of that macro there, including the relocatable-device-code option.

It’s also pretty easy to derive an “NVCUDA_COMPILE_CUBIN” version from that by renaming variables from “PTX” to “CUBIN”, using a .cubin suffix instead of .ptx at the output name, and using --cubin instead of --ptx in the COMMAND ${CUDA_NVCC_EXECUTABLE}.
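
To make the shape of such a macro concrete, here is a minimal, hypothetical CUBIN flavor; the function name, arguments and options are illustrative only, not the NVCUDA_COMPILE_PTX macro from the repository above:

# Minimal, hypothetical sketch of a CUBIN flavor of such a custom build rule.
# "compile_cuda_to_cubin" and its arguments are illustrative names.
find_package(CUDA REQUIRED)   # provides CUDA_NVCC_EXECUTABLE

function(compile_cuda_to_cubin generated_var target_path)
  set(generated_files)
  foreach(input ${ARGN})
    get_filename_component(name ${input} NAME_WE)
    set(output "${target_path}/${name}.cubin")
    add_custom_command(
      OUTPUT  ${output}
      COMMAND ${CUDA_NVCC_EXECUTABLE}
              --cubin "${input}"            # use --ptx and a .ptx suffix for the PTX flavor
              -o "${output}"
              --gpu-architecture=sm_75      # example value; pick your real targets
      MAIN_DEPENDENCY ${input}
      COMMENT "Compiling ${name}.cu to CUBIN")
    list(APPEND generated_files ${output})
  endforeach()
  set(${generated_var} ${generated_files} PARENT_SCOPE)
endfunction()

The generated .cubin files still need to be attached to a target (e.g. listed among the sources of the library or of an add_custom_target) so the custom commands actually run.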
