Nvcc on Linux tries to resolve ::lerp as std::lerp with compute 80 or higher

...(93): error: more than one instance of overloaded function "lerp" matches the argument list:
            function "std::lerp(float, float, float) noexcept" (declared at line 1911 of /usr/include/c++/11/cmath)
            function "lerp(float, float, float)" (declared at line 1226 of .../helper_math.h)
            argument types are: (const float, const float, const float)
   auto x0 = ::lerp(a, b, relPos.x);

The above error happens on Linux when compiling with compute architecture 8.0 but not 6.1. It also doesn’t happen on Windows.

On Linux, removing the lerp(float,float,float) definition from helper_math.h solves this. On Windows it causes compilation to fail.

The Linux behaviour seems like an error, since ::lerp shouldn’t be resolved as std::lerp.

I tried it with SDKs 12.4 and 12.8.

Anyone encountered this? Any idea how to bypass it except for different code for Windows and Linux?


helper_math.h is not part of the CUDA toolkit.

It is part of CUDA sample codes which are not intended to be used for production code.

So don’t include or use helper_math.h

Then fix your code.

I understand, though this looks to me like a problem on the compiler side, so "fix your code" feels both condescending and wrong. I think it would be better if NVIDIA fixed its own code.

(But thanks for the quick reply.)


sorry to have been condescending and wrong

It looks to me like you have two candidates on the Linux side (because one is coming from the Linux system headers, which I don't necessarily expect to match the Windows system headers).

What is the problem on the compiler side? Are you referring to the claim that ::lerp shouldn't be resolved as std::lerp?

Can you provide a short, complete example of the issue?

I tried this on CUDA 12.8.1, but that did not show an issue:

$ cat test.cu
#include <./cuda-samples/Common/helper_math.h>

__device__ float f(float a, float b, float c){
  return ::lerp(a,b,c);
}
$ nvcc -arch=sm_80 -I.  -dc test.cu
$
inline __device__ __host__ float lerp(float a, float b, float t)
{
	return a + t * (b - a);
}

__global__ void interpolate(const float* a, const float* b, const float* t, float* result)
{
	int i = blockIdx.x * blockDim.x + threadIdx.x;

	result[i] = ::lerp(a[i], b[i], t[i]);
}

This shows the problem.

It’s compiled with CMake with:

set(CMAKE_CUDA_ARCHITECTURES "61;80")
add_compile_options($<$<COMPILE_LANGUAGE:CUDA>:--std=c++20>)
add_compile_options($<$<COMPILE_LANGUAGE:CUDA>:--extended-lambda>)
add_compile_options($<$<COMPILE_LANGUAGE:CUDA>:--expt-relaxed-constexpr>)

I don’t think the flags matter. They’re just there because I used what I normally use. Compiles fine with arch 61 but not with arch 80.

Edit: I initially posted with ā€˜lerp’ instead of ā€˜::lerp’, but got the same problem with both.


I didn’t have any trouble compiling that on CUDA 12.8.1:

$ cat test2.cu
inline __device__ __host__ float lerp(float a, float b, float t)
{
        return a + t * (b - a);
}

__global__ void interpolate(const float* a, const float* b, const float* t, float* result)
{
        int i = blockIdx.x * blockDim.x + threadIdx.x;

        result[i] = lerp(a[i], b[i], t[i]);
}
$ nvcc -arch=sm_80 -I.  -dc test2.cu
$

Can you show the full CMake verbose compilation output?

Thanks. I’ll try to debug the compilation to see when the problem surfaces.


ahh

The -std=c++20 flag shows the issue:

$ nvcc -arch=sm_80 -I.  -dc test2.cu -std=c++20
test2.cu(10): error: more than one instance of overloaded function "lerp" matches the argument list:
            function "std::lerp(float, float, float) noexcept" (declared at line 1911 of /usr/include/c++/11/cmath)
            function "lerp(float, float, float)" (declared at line 1)
            argument types are: (const float, const float, const float)
   result[i] = lerp(a[i], b[i], t[i]);
               ^

1 error detected in the compilation of "test2.cu".
$
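Since std::lerp only exists from C++20 onward, pinning the CUDA standard to C++17 (if the rest of your code allows it) should also make the ambiguity disappear. A sketch of that in CMake, matching the setup you posted:

```cmake
# sketch: compile CUDA sources as C++17; std::lerp is a C++20 addition,
# so the helper_math.h overload becomes the only candidate again
set(CMAKE_CUDA_STANDARD 17)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
```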

Thanks. Good to know you see it.

I have seen other situations where the device code compiler "resolves" undecorated functions using the standard library (perhaps unexpectedly). So it's not obvious to me that this is disallowed. (I don't know.)

If you think it is a (nvcc) bug, probably best to file a bug.

If you are looking for workarounds, it seems like one possibility would be to include your own definition in a namespace, then call out that namespace specifically:

$ cat test2.cu
namespace foo{
inline __device__ __host__ float lerp(float a, float b, float t)
{
        return a + t * (b - a);
}
}

__global__ void interpolate(const float* a, const float* b, const float* t, float* result)
{
        int i = blockIdx.x * blockDim.x + threadIdx.x;

        result[i] = foo::lerp(a[i], b[i], t[i]);
}
$ nvcc -arch=sm_80 -I.  -dc test2.cu -std=c++20
$

And, FWIW, I see the same failure using the non-namespace code, whether I compile for -arch=sm_61 or -arch=sm_80. That error doesn’t appear to be arch-dependent.

Okay, thanks. It surfaced when I first used arch 80, but that might have coincided with other compiler defaults changing. IIRC I was using C++17 at the time, so perhaps the standard version is what actually mattered.

Anyway, I opened a bug, so I'll wait to see what happens there and work around it in the meantime. Thanks for the help.


Thanks for filing a ticket (ID 5238032). It is under review and we will report the conclusion back here.