Problems compiling gazebo libraries with NVCC

Environment

  • OS: Ubuntu 20.04
  • GPU: RTX 4090
  • ROS Noetic
  • Gazebo 11
  • CUDA 12.2
  • Library involved in this specific case: ignition-math6

Description

Hello there. I am working on a simulation of a radar sensor using ROS and Gazebo, and I want to use CUDA to accelerate some of the calculations. CUDA and non-CUDA code are separated into different files, but both include the same header file, because Gazebo-specific data types are passed to the kernels. While compiling, I got the following error message:

/usr/include/ignition/math6/gz/math/graph/Graph.hh: In member function ‘EdgeType& ignition::math::v6::graph::Graph<V, E, EdgeType>::LinkEdge(const EdgeType&)’:
/usr/include/ignition/math6/gz/math/graph/Graph.hh:249:67: error: expected ‘;’ before ‘}’ token
  249 |       for (auto const &v : {edgeVertices.first, edgeVertices.second})
      |                                                                   ^
      |                                                                   ;
/usr/include/ignition/math6/gz/math/graph/Graph.hh:266:67: error: expected ‘;’ before ‘}’ token
  266 |       for (auto const &v : {edgeVertices.first, edgeVertices.second})
      |                                                                   ^
      |                                                                   ;
/usr/include/ignition/math6/gz/math/graph/Graph.hh: In member function ‘bool ignition::math::v6::graph::Graph<V, E, EdgeType>::RemoveEdge(const EdgeId&)’:
/usr/include/ignition/math6/gz/math/graph/Graph.hh:614:67: error: expected ‘;’ before ‘}’ token
  614 |       for (auto const &v : {edgeVertices.first, edgeVertices.second})

I think NVCC has a problem with the range-based for loop over the braced initializer list {edgeVertices.first, edgeVertices.second} in Graph.hh.
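To get a report that does not depend on Gazebo at all, the loop pattern can be isolated. The sketch below is a hypothetical reduction (MiniGraph, LinkEdge and VertexId are made-up names, not the real Graph.hh API) of the construct flagged in the log. It is valid C++11/17 and g++ accepts it. Whether nvcc rejects it when the same file is compiled as CUDA, e.g. nvcc -std=c++17 -x cu -c mini_graph.cpp, is what would need checking; if it compiles cleanly, the trigger probably depends on other headers included alongside Graph.hh. The diagnostics above also appear to be in GCC's format, which suggests (unconfirmed) that they come from the host-compiler pass nvcc runs rather than from nvcc's own front end.

```cpp
#include <cstdint>
#include <initializer_list>  // the braced list below implicitly uses std::initializer_list
#include <map>
#include <utility>

using VertexId = std::uint64_t;

// Hypothetical stand-in for ignition::math::graph::Graph.
template <typename V>
class MiniGraph
{
public:
  void AddVertex(VertexId id, const V &data) { this->vertices[id] = data; }

  // Same pattern as Graph.hh:249: a range-based for loop over a
  // braced-init-list built from the two pair members.
  bool LinkEdge(const std::pair<VertexId, VertexId> &edgeVertices) const
  {
    for (auto const &v : {edgeVertices.first, edgeVertices.second})
    {
      if (this->vertices.find(v) == this->vertices.end())
        return false;  // an endpoint is missing
    }
    return true;  // both endpoints exist
  }

private:
  std::map<VertexId, V> vertices;
};
```

With vertices 0 and 1 added, LinkEdge({0, 1}) returns true and LinkEdge({0, 2}) returns false.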

Steps to reproduce

  1. Software as mentioned above
  2. The following configuration in CMakeLists.txt:
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CUDA_STANDARD 17)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g")

set(CUDA_NVCC_FLAGS "-arch=compute_89" CACHE STRING "nvcc flags" FORCE)
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS}; -std=c++17 --expt-relaxed-constexpr")
set(CUDA_NVCC_EXECUTABLE "/usr/local/cuda-12.2/bin/nvcc")
set(CMAKE_CUDA_COMPILER "${CUDA_NVCC_EXECUTABLE}")
set(CUDA_VERBOSE_BUILD ON CACHE BOOL "nvcc verbose" FORCE) 
set(LIB_TYPE STATIC)
add_library(somelibrary SHARED src/test.cpp)
target_include_directories(somelibrary PUBLIC ${GAZEBO_INCLUDE_DIRS})
target_link_libraries(somelibrary ${GAZEBO_LIBRARIES})

cuda_add_library(cuda_lib ${LIB_TYPE} src/test.cu OPTIONS -Xcompiler -fPIC)

target_include_directories(cuda_lib PUBLIC ${CUDA_INCLUDE_DIRS})

target_link_libraries(cuda_lib  
  ${catkin_LIBRARIES}
  ${CUDA_LIBRARIES}
  somelibrary
)

add_executable(Testexe src/test.cpp)
add_dependencies(somelibrary  ${${PROJECT_NAME}_EXPORTED_TARGETS} ${catkin_EXPORTED_TARGETS})
add_dependencies(Testexe  ${${PROJECT_NAME}_EXPORTED_TARGETS} ${catkin_EXPORTED_TARGETS})
target_link_libraries(Testexe somelibrary cuda_lib) 

Output

I am not sure why this error occurs, because GCC compiles my non-CUDA code, which also includes ignition-math6, without problems.
It might be a faulty CMakeLists.txt or a bug in NVCC itself.

Possible solution?

I “fixed” the problem by modifying Graph.hh. I know this is not the best or recommended way, but in my case it works. For example, in LinkEdge:

// replaces: for (auto const &v : {edgeVertices.first, edgeVertices.second})
std::vector<decltype(edgeVertices.first)> temp = {edgeVertices.first, edgeVertices.second};

for (auto const &v : temp)
{
  if (this->vertices.find(v) == this->vertices.end())
    return EdgeType::NullEdge;
}
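If editing the installed header is kept as a stopgap, the same check can iterate over a fixed-size std::array instead of a heap-allocated std::vector. A minimal sketch, assuming edgeVertices is a std::pair of vertex IDs and the vertex container is a std::map keyed by ID; BothVerticesExist is a hypothetical helper name, and this variant has not been checked against nvcc.

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <utility>

using VertexId = std::uint64_t;

// Hypothetical helper: true if both endpoints of an edge are known vertices.
// Iterates over a std::array rather than a braced-init-list or std::vector.
template <typename V>
bool BothVerticesExist(const std::map<VertexId, V> &vertices,
                       const std::pair<VertexId, VertexId> &edgeVertices)
{
  const std::array<VertexId, 2> endpoints{edgeVertices.first, edgeVertices.second};
  for (auto const &v : endpoints)
  {
    if (vertices.find(v) == vertices.end())
      return false;
  }
  return true;
}
```

Whichever form is used, a change under /usr/include is lost whenever the system package is upgraded, so it needs reapplying at all three loops (lines 249, 266 and 614).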

Is there any other way to fix this problem?

One possible approach, and a common suggestion, is to not include the troublesome headers in .cu files. Partition your code into the parts that need Gazebo and the parts that need CUDA, put them in separate files, and use wrapper functions to connect them.
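A minimal sketch of that wrapper pattern, under stated assumptions: PlainVec3, GridSpec and launch_mark_occupied are hypothetical names, and the function body is a CPU stand-in for what the .cu file would implement with a kernel launch. The shared declarations use only plain types, so the .cu file never includes Gazebo headers. This also bears on needing Gazebo inside kernels: as far as I know, Gazebo and ignition-math member functions are not declared __device__, and a RayShapePtr is a shared pointer to a host-side object, so that data generally has to be extracted into plain arrays before a kernel can use it.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// --- would live in a Gazebo-free header (hypothetical "grid_kernels.h") ---
struct PlainVec3 { double x, y, z; };            // plain copy of a Vector3d
struct GridSpec { int nx, ny, nz; double res; }; // voxel grid size and resolution

void launch_mark_occupied(const GridSpec &grid, const PlainVec3 *hits,
                          std::size_t numHits, std::uint8_t *voxels);

// --- CPU stand-in; a .cu file would copy hits/voxels to the device and
// launch a kernel doing the same per-hit work in parallel ---
void launch_mark_occupied(const GridSpec &grid, const PlainVec3 *hits,
                          std::size_t numHits, std::uint8_t *voxels)
{
  for (std::size_t n = 0; n < numHits; ++n)
  {
    // Map the world position to integer voxel indices.
    int i = static_cast<int>(std::floor(hits[n].x / grid.res));
    int j = static_cast<int>(std::floor(hits[n].y / grid.res));
    int k = static_cast<int>(std::floor(hits[n].z / grid.res));
    // Skip hits that fall outside the grid.
    if (i < 0 || i >= grid.nx || j < 0 || j >= grid.ny || k < 0 || k >= grid.nz)
      continue;
    voxels[(static_cast<std::size_t>(k) * grid.ny + j) * grid.nx + i] = 1;
  }
}

// --- on the Gazebo side (.cpp), the conversion could look like:
//   std::vector<PlainVec3> hits;
//   for (const ignition::math::Vector3d &p : points)
//     hits.push_back({p.X(), p.Y(), p.Z()});
//   launch_mark_occupied(grid, hits.data(), hits.size(), voxels);
```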

Hi, thanks for the reply. The problem is that I need the Gazebo functionality inside the kernels. Furthermore, I am passing Gazebo data structures directly to the kernels.

Suggestion: provide the shortest possible complete demonstration of the problem. That means I could follow your instructions, then copy, paste, and compile your code without adding or changing anything, and see the issue. I strongly suggest that you provide a demonstrator that does not depend on CMake in any way. It should be demonstrable using code plus the NVIDIA-provided toolchain only.

There may well be a front-end incompatibility issue, in which case you would eventually need to file a bug. The above instructions will be needed for bug filing if you desire any forward progress on it. I don’t have any further suggestions based on what you have shown.

Below is only the CUDA code.

The .h file:

#include <cstdint>
#include <stdio.h>
#include <curand_kernel.h>
#include <cuda_runtime.h>
#include <cuda.h>
#include "otherinclude.h" // This is the header file with the gazebo headers

__global__ void kernel_exe(double x, double y, double z, double res, unsigned long long int *count,  gazebo::physics::RayShapePtr ray, GazeboRosGridmap &obj, uint8_t voxel_grid_3dim[]);

 void call_kernel(double x, double y, double z, double res, unsigned long long int *cuda_count, dim3 blockDims, dim3 gridDims, gazebo::physics::RayShapePtr ray, GazeboRosGridmap &obj, uint8_t voxel_grid_3dim[]);

The otherinclude.h file:

#include <memory>
#include <stack>
#include <Eigen/Core>
#include <Eigen/Dense>
#include <string>
#include <utility>
#include <gazebo/physics/Model.hh>
#include <ros/ros.h>
#include <gazebo/physics/physics.hh>
#include <gazebo/common/common.hh>
#include <gazebo/gazebo.hh>

I use all of these headers in both CUDA and non-CUDA code; this header file is included in my .cpp file and in my .cu file.

The .cu file:

#include "my_include.h" //header file mentioned above

__global__ void kernel_exe(double x, double y, double z, double res, unsigned long long int *count,  gazebo::physics::RayShapePtr ray, GazeboRosGridmap &obj, uint8_t voxel_grid_3dim[])
{
    // global 3D thread indices
    int threadx = threadIdx.x + blockDim.x * blockIdx.x;
    int thready = threadIdx.y + blockDim.y * blockIdx.y;
    int threadz = threadIdx.z + blockDim.z * blockIdx.z;

    // total number of threads per axis (grid-stride step)
    int gridStridex = gridDim.x * blockDim.x;
    int gridStridey = gridDim.y * blockDim.y;
    int gridStridez = gridDim.z * blockDim.z;

    for (double i = threadx; i < x; i += gridStridex)
    {
        for (double j = thready; j < y; j += gridStridey)
        {
            for (double k = threadz; k < z; k += gridStridez)
            {
                if (is_obstacle(ignition::math::Vector3d(i, j, k), res, ray))
                {
                    updateNode(i, j, k, true);
                }
            }
        }
    }
    atomicAdd(count, 1); // one increment per thread
}

void call_kernel(double x, double y, double z, double res, unsigned long long int *cuda_count, dim3 blockDims, dim3 gridDims, gazebo::physics::RayShapePtr ray, GazeboRosGridmap &obj, uint8_t voxel_grid_3dim[])
{
    kernel_exe<<<gridDims, blockDims>>>(x, y, z, res, cuda_count, ray, obj, voxel_grid_3dim);
}

cuda_count is my counter variable for checking how many threads execute the kernel. gridDims and blockDims are calculated in my .cpp file and then passed to the functions.

The goal would be for someone to be able to compile what you provide, and see the issue.

What about is_obstacle and updateNode?

Here is what I suggested:

Keep working on your example until that is possible.

If the problem comes about simply from including a header file, then it might be sufficient to provide a short, complete example like this:

#include <gazebo/common/common.hh>
int main() {}

(or something similar that actually demonstrates the compile issue you are having)

You should also give a short set of instructions for setting up Gazebo (and anything else needed, e.g. ROS if it's needed) on Ubuntu.

Have you tried recompiling your code with the latest CUDA version installed (currently 12.4.1, I believe)?

Hello.
Yesterday I updated CUDA to 12.4, but the error still occurs. ROS and Gazebo need no special configuration (only setting up environment variables). For installation, just follow the instructions on the official websites (ROS Noetic installation for Ubuntu, Gazebo 11 installation).

Below is a minimal example of my code that can be compiled with NVCC alone, without CMake. This did not change anything: the error still occurs.

Header file:

#include <curand_kernel.h>
#include <cuda_runtime.h>
#include <cuda.h>
#include <cstdint>
#include <stdio.h>
#include <memory>
#include <stack>
#include <ros/ros.h>
#include <gazebo/physics/physics.hh>
#include <gazebo/common/common.hh>
#include <gazebo/gazebo.hh>
#include <gazebo/physics/Model.hh>

__global__ void create_octomap_iter1(double x, double y, double z, unsigned long long int *foo, float *result, float *a, float *b, float *c);

__device__ bool is_obstacle();

void call_create_octomap_iter(double x, double y, double z, unsigned long long int *cuda_count, float *d, float *a, float *b, float *c, dim3 blockDims, dim3 gridDims);

CUDA file:

#include <iostream>
#include "headerfile.h"

__global__ void create_octomap_iter1(double x, double y, double z, unsigned long long int *foo, float *result, float *a, float *b, float *c)
{
    // global 3D thread indices
    int threadx = threadIdx.x + blockDim.x * blockIdx.x;
    int thready = threadIdx.y + blockDim.y * blockIdx.y;
    int threadz = threadIdx.z + blockDim.z * blockIdx.z;

    // total number of threads per axis (grid-stride step)
    int gridStridex = gridDim.x * blockDim.x;
    int gridStridey = gridDim.y * blockDim.y;
    int gridStridez = gridDim.z * blockDim.z;

    for (int i = threadx; i < x; i += gridStridex)
    {
        for (int j = thready; j < y; j += gridStridey)
        {
            for (int k = threadz; k < z; k += gridStridez)
            {
                result[i] = a[i] + b[j] + c[k]; // unnecessary addition

                is_obstacle(); // call device function

                atomicAdd(foo, 1); // counts executed loop iterations
            }
        }
    }
}

__device__ bool is_obstacle()
{
    printf("Checks if obstacle\n");
    return true;
}

void call_create_octomap_iter(double x, double y, double z, unsigned long long int *cuda_count, float *d,
                              float *a, float *b, float *c, dim3 blockDims, dim3 gridDims)
{
    create_octomap_iter1<<<gridDims, blockDims>>>(x, y, z, cuda_count, d, a, b, c);
}

int main()
{
    double x = 10, y = 8, z = 6;  // some arbitrary values
    float *a;
    float *b;
    float *c;
    float *d;

    // CUDA memory allocation
    int deviceId;
    cudaGetDevice(&deviceId);

    size_t size = x * y * z * sizeof(float);
    cudaMallocManaged(&a, size);
    cudaMallocManaged(&b, size);
    cudaMallocManaged(&c, size);
    cudaMallocManaged(&d, size);

    cudaMemPrefetchAsync(a, size, deviceId);
    cudaMemPrefetchAsync(b, size, deviceId);
    cudaMemPrefetchAsync(c, size, deviceId);
    cudaMemPrefetchAsync(d, size, deviceId);

    // kernel configuration: enough blocks to cover x, y and z
    dim3 blockDims(8, 8, 8);
    int numBlocksX = (static_cast<int>(x) + blockDims.x - 1) / blockDims.x;
    int numBlocksY = (static_cast<int>(y) + blockDims.y - 1) / blockDims.y;
    int numBlocksZ = (static_cast<int>(z) + blockDims.z - 1) / blockDims.z;
    dim3 gridDims(numBlocksX, numBlocksY, numBlocksZ);

    // counter, incremented on the device
    unsigned long long int count = 0, *cuda_count;
    cudaMalloc((void **)&cuda_count, sizeof(unsigned long long int));
    cudaMemcpy(cuda_count, &count, sizeof(unsigned long long int), cudaMemcpyHostToDevice);

    call_create_octomap_iter(x, y, z, cuda_count, d, a, b, c, blockDims, gridDims);
    cudaGetLastError();
    cudaDeviceSynchronize();
    cudaMemcpy(&count, cuda_count, sizeof(unsigned long long int), cudaMemcpyDeviceToHost);
    cudaFree(cuda_count);
    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    cudaFree(d);
    std::cout << "count threads: " << count << '\n';
}

The code compiles if I apply the “fix” mentioned in my first post.
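Once the example builds, the printed counter can be sanity-checked on the host. Because the atomicAdd in create_octomap_iter1 sits inside the innermost loop, it counts loop iterations rather than threads. With x = 10, y = 8, z = 6 and 8×8×8 blocks, the launch has 2×1×1 blocks (1024 threads), but each (i, j, k) index is visited exactly once, so the expected value is 10·8·6 = 480. The sketch below (expected_count is a hypothetical helper) simulates the grid-stride loops on the CPU to confirm that arithmetic.

```cpp
#include <cstdint>

// Simulates every thread of a 3D launch with block size (bx, by, bz) and
// grid size ceil(n / b) per axis, running the same triple grid-stride loop as
// create_octomap_iter1. Returns how often the innermost body (and therefore
// atomicAdd) executes in total.
std::uint64_t expected_count(int x, int y, int z, int bx, int by, int bz)
{
  const int strideX = ((x + bx - 1) / bx) * bx;  // gridDim.x * blockDim.x
  const int strideY = ((y + by - 1) / by) * by;  // gridDim.y * blockDim.y
  const int strideZ = ((z + bz - 1) / bz) * bz;  // gridDim.z * blockDim.z

  std::uint64_t count = 0;
  for (int tx = 0; tx < strideX; ++tx)      // global thread index in x
    for (int ty = 0; ty < strideY; ++ty)    // global thread index in y
      for (int tz = 0; tz < strideZ; ++tz)  // global thread index in z
        for (int i = tx; i < x; i += strideX)
          for (int j = ty; j < y; j += strideY)
            for (int k = tz; k < z; k += strideZ)
              ++count;
  return count;
}
```

For the values in main(), expected_count(10, 8, 6, 8, 8, 8) returns 480; a different printed value would point to a launch-configuration problem rather than a header problem.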