Failing compilation with clang and -std=c++17 on Tegra

Hi,

I’m currently facing a problem compiling a very simple piece of CUDA code. The code is:

#include <cstdint>
#include "cuda.h"          // driver API types (CUdeviceptr)
#include <cuda_runtime.h>  // runtime API (cudaStream* calls); nvcc includes this implicitly

// Converts semi-planar YUV444 (a full-resolution Y plane plus a
// full-resolution interleaved UV plane, each with its own pitch) into a
// packed 3-bytes-per-pixel layout.
__global__ void yuv444ToInterleaved(
    const uint8_t *yPlane, uint32_t yPitch, const uint8_t *uvPlane, uint32_t uvPitch, int width, int height,
    uint8_t *dst) {
  int imageX = blockIdx.x * blockDim.x + threadIdx.x;
  int imageY = blockIdx.y * blockDim.y + threadIdx.y;
  if (imageX >= width || imageY >= height) {
    return;
  }
  int y = imageY * yPitch + imageX;        // offset into the Y plane
  int uv = imageY * uvPitch + imageX * 2;  // offset into the interleaved UV plane
  int k = imageY * width * 3 + imageX * 3; // offset into the packed destination
  dst[k] = yPlane[y];
  dst[k + 1] = uvPlane[uv];
  dst[k + 2] = uvPlane[uv + 1];
}

int YUVConvert(
    CUdeviceptr yPlane, int yPlanePitch, CUdeviceptr uvPlane, int uvPlanePitch, int width, int height,
    CUdeviceptr destination) {

  // One thread per pixel; round the grid up so partial edge tiles are covered.
  const dim3 block(32, 32);
  const dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);

  yuv444ToInterleaved<<<grid, block>>>(
      (uint8_t *)yPlane, yPlanePitch, (uint8_t *)uvPlane, uvPlanePitch, width, height, (uint8_t *)destination);

  // Attach the managed allocation itself (not the address of the local
  // variable holding it) back to the host, then wait for completion.
  cudaStreamAttachMemAsync(NULL, (void *)destination, 0, cudaMemAttachHost);
  cudaStreamSynchronize(NULL);

  return 0;
}

Nothing really special, as far as I can tell.

Usually we use CMake in our build pipeline, but for the sake of this example I tried to reproduce just this compilation step with nvcc directly. With clang-10 and clang-11 as host compilers, the compilation fails with a rather cryptic set of messages:

$ /usr/local/cuda-11.4/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Sun_Oct_23_22:16:07_PDT_2022
Cuda compilation tools, release 11.4, V11.4.315
Build cuda_11.4.r11.4/compiler.31964100_0

$ /usr/local/cuda-11.4/bin/nvcc -std=c++17 -ccbin clang-11 -c yuv-kernel.cu

/usr/bin/../lib/gcc/aarch64-linux-gnu/10/../../../../include/c++/10/ext/numeric_traits.h(70): error: qualified name is not allowed

/usr/bin/../lib/gcc/aarch64-linux-gnu/10/../../../../include/c++/10/ext/numeric_traits.h(72): error: expected a "("

/usr/bin/../lib/gcc/aarch64-linux-gnu/10/../../../../include/c++/10/ext/numeric_traits.h(72): error: expected a type specifier

/usr/bin/../lib/gcc/aarch64-linux-gnu/10/../../../../include/c++/10/ext/numeric_traits.h(72): error: expected a ")"

/usr/bin/../lib/gcc/aarch64-linux-gnu/10/../../../../include/c++/10/ext/numeric_traits.h(76): error: expected a "("

/usr/bin/../lib/gcc/aarch64-linux-gnu/10/../../../../include/c++/10/ext/numeric_traits.h(76): error: expected a type specifier

/usr/bin/../lib/gcc/aarch64-linux-gnu/10/../../../../include/c++/10/ext/numeric_traits.h(77): error: expected a ";"

/usr/bin/../lib/gcc/aarch64-linux-gnu/10/../../../../include/c++/10/ext/numeric_traits.h(78): error: expected a "("

/usr/bin/../lib/gcc/aarch64-linux-gnu/10/../../../../include/c++/10/ext/numeric_traits.h(78): error: expected a ";"

/usr/bin/../lib/gcc/aarch64-linux-gnu/10/../../../../include/c++/10/ext/numeric_traits.h(88): error: class template "__gnu_cxx::__numeric_traits_integer<_Value>" has no member "__is_signed"

10 errors detected in the compilation of "yuv-kernel.cu".

When compiling with gcc as the host compiler, the compilation works as expected:

$ /usr/local/cuda-11.4/bin/nvcc -std=c++17 -ccbin gcc -c yuv-kernel.cu

Additionally, reducing the C++ standard to 14 also yields a successful compilation with clang-10 and clang-11:

/usr/local/cuda-11.4/bin/nvcc -std=c++14 -ccbin clang-11 -c yuv-kernel.cu
/usr/local/cuda-11.4/bin/nvcc -std=c++14 -ccbin clang-10 -c yuv-kernel.cu

Are there any pointers as to why this happens, or what steps I could take to solve it? I’m working on Tegra Linux on a Jetson Orin NX.
JetPack version 5.1.1
CUDA version 11.4
gcc version 9.4.0
clang versions 10.0.0 and 11.0.0

CUDA compilation is not a free mix-and-match affair according to the tastes of the user. The toolchain for every CUDA version is designed to work with specific, designated host compiler versions, which are documented. What does the documentation for CUDA 11.4 state about supported host compilers? If the clang versions you use are on the list of supported host compilers, you can file a bug with NVIDIA; otherwise you are on your own.

The reason for the restrictions on host compilers with CUDA is the tight integration between host and device code that CUDA offers, which is a different approach from that used by OpenCL, for example. This requires additional parsing work in the code-splitting stages of nvcc’s compilation flow, and “weird” errors in header files are a typical symptom of a host-compiler mismatch.
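
To illustrate, the region of libstdc++ 10’s ext/numeric_traits.h that your messages point at looks roughly like this (paraphrased from memory, not verbatim):

// Paraphrased from libstdc++ 10, <ext/numeric_traits.h>; not verbatim.
template<typename _Value>
  struct __numeric_traits_integer
  {
    static const _Value __min = __glibcxx_min(_Value);
    static const _Value __max = __glibcxx_max(_Value);

    // __is_signed is an ordinary data member here. A frontend that
    // reserves __is_signed as a built-in type-trait keyword cannot
    // parse these lines, which would match the parse errors above as
    // well as the later "has no member __is_signed" complaint about
    // the out-of-class definition of this member.
    static const bool __is_signed = __glibcxx_signed(_Value);
    static const int  __digits    = __glibcxx_digits(_Value);
  };

clang historically reserved __is_signed as a built-in trait and carries a compatibility workaround for exactly this libstdc++ identifier. One plausible reading of your log is that nvcc’s frontend, when mimicking clang in C++17 mode, treats __is_signed as a keyword without applying that workaround.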

The restriction to C++14 presumably eliminates portions of the code in the header files, among them the code the parsing fails on. If that workaround works for you, you could proceed like that. However, the use of a host compiler that is not supported by NVIDIA for a particular CUDA version carries some residual risk of less obvious, “quiet” issues. We have had a few examples in this sub-forum over the years; I do not recall the details, but it was probably something to do with data alignment or structure sizes.
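
As an aside, if you want to confirm which stage of nvcc’s pipeline emits these errors, the -v flag makes nvcc print each internal command as it executes (--dryrun prints the commands without executing them):

$ /usr/local/cuda-11.4/bin/nvcc -std=c++17 -ccbin clang-11 -v -c yuv-kernel.cu

If the errors appear during the cudafe++/cicc steps (NVIDIA’s EDG-based frontend) rather than during an actual clang-11 invocation, that would point to a frontend/host-header interaction rather than a bug in clang itself.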

[Later:]

I do not think that the problem is related to a general deficiency in C++17 support in CUDA 11.x, because to my recollection CUDA 11.x had full support for C++17 in host and device code; that includes CUDA 11.4, which shipped in the summer of 2021.

My general advice would be to ask on the Jetson Orin NX forum. I’m not an expert on clang usage, but based on what I see here, it looks to me like what you have should be supported. There are potentially a few question marks, though. My understanding is that, regardless of clang vs. gcc on Linux, the proper headers are the gnu/libstdc++ ones (which you appear to be using). So one thing I would check is that you are using the proper version of gcc, and there is only one: 10.2. You mention gcc 9.x in your text, but the header path appears to be gcc-10 based. In any case, if it were me, I would make sure it was 10.2.
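
One quick way to check which libstdc++ installation clang actually picks up is to dump its header search path (printed after “#include <...> search starts here:”):

$ clang-11 -E -x c++ -v /dev/null

The .../include/c++/10/... directories in your error messages suggest clang is pulling in the gcc-10 libstdc++ even though you list gcc 9.4.0, so I would reconcile that first.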

Since, until recently, the typical/usual way to install a proper environment for Tegra was via a JetPack installer or a similar method (e.g. NVIDIA SDK Manager), if it were me, I would start with a fresh JetPack install to rule out any setup issues and make sure you are using the correct versions of the tools.

Hi Robert,

thanks for the insights. You’re right about the mismatch between libstdc++ and g++. I was testing different versions before posting, and these were leftovers of that. There is also no issue with gcc 10.2.

I’ll reinstall JetPack and everything shortly, and may be able to post an update here, or I will write in the Jetson-specific forums as per your suggestion.

For now I settled on a workaround: compiling the CUDA host code with gcc and the remainder of the application with clang. While I don’t really like this solution, it works for now.
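
In case it helps anyone else, the split looks roughly like this (main.cpp stands in for the rest of the application, and the CUDA library path may differ on your system):

$ /usr/local/cuda-11.4/bin/nvcc -std=c++17 -ccbin gcc -c yuv-kernel.cu
$ clang++-11 -std=c++17 -c main.cpp
$ clang++-11 main.o yuv-kernel.o -L/usr/local/cuda-11.4/lib64 -lcudart -o app

The idea is that anything containing device code or kernel launches goes through nvcc with gcc as the host compiler, while translation units that only call into that code can be built with clang as usual.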