C++ 17 variadic template folding expressions in device methods

bkerkhof · August 1, 2022, 4:26pm

According to the developer guide, C++17 should be supported (insofar as it is supported by the host compiler). However, it is not clear how far that support extends in device code, or whether folding expressions for variadic templates are supported.

Consider the following simple example as written directly for the host compiler (in this case: MSVC 16.11.3 - Visual Studio 2019):

#include <limits>
#include <type_traits>

#include <cuda_runtime.h>

template <size_t NumberOfValues>
struct FooHost
{
	template <size_t Index>
	static constexpr void AccumulateSum(float* accumulator, const float* const values)
	{
		*accumulator += values[Index];
	}

	template <size_t ... Indices>
	static constexpr void SumImpl(float* accumulator, const float* const values, std::index_sequence<Indices...>)
	{
		(..., AccumulateSum<Indices>(accumulator, values));
	}

	static constexpr float Sum(const float* const values)
	{
		float accumulator = 0.f;
		SumImpl(&accumulator, values, std::make_index_sequence<NumberOfValues>());
		return accumulator;
	}
};

void FooTestHost()
{
	constexpr float values[] = { 1.0f, 2.0f, 3.0f };
	constexpr float sum = FooHost<3>::Sum(values);
}

This compiles without problems, and has the expected outcome of sum being 6.0f, and known at compile time.

However, once nvcc gets involved, it chokes on the folding expression with the error “expected an expression” (for both host and device code). Take the following, compiled with nvcc:

#include <limits>
#include <type_traits>

#include <cuda_runtime.h>

template <size_t NumberOfValues>
struct FooDevice
{
	template <size_t Index>
	static constexpr __device__ void AccumulateSum(float* accumulator, const float* const values)
	{
		*accumulator += values[Index];
	}

	template <size_t ... Indices>
	static constexpr __device__ void SumImpl(float* accumulator, const float* const values, std::index_sequence<Indices...>)
	{
		(..., AccumulateSum<Indices>(accumulator, values));
	}

	static constexpr __device__ float Sum(const float* const values)
	{
		float accumulator = 0.f;
		SumImpl(&accumulator, values, std::make_index_sequence<NumberOfValues>());
		return accumulator;
	}
};

__global__ void kernelFoo()
{
	constexpr float values[] = { 1.0f, 2.0f, 3.0f };
	constexpr float sum = FooDevice<3>::Sum(values);
}

Compilation then fails with:

error : expected an expression
detected during instantiation of "float FooDevice<NumberOfValues>::Sum(const float *) [with NumberOfValues=3ULL]"

Switching from a left fold to a right fold doesn’t change much, other than introducing an extra error:

parameter pack "Indices" was referenced but not expanded
detected during instantiation of "float FooDevice<NumberOfValues>::Sum(const float *) [with NumberOfValues=3ULL]"
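
For reference, the two unary fold spellings over the comma operator look as follows. This is a minimal host-only sketch; the names AppendDigit, LeftFoldOrder and RightFoldOrder are made up for illustration and are not part of the code above. Both forms are valid C++17, and because the comma operator sequences its operands left to right, both visit the indices in the same order. Diagnostics like the ones quoted above would therefore be consistent with the front end not parsing the file as C++17, rather than with a real difference between the two forms.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper: appends (Index + 1) as a decimal digit to *encoded,
// so the final number records the order in which the calls ran.
template <std::size_t Index>
constexpr void AppendDigit(int* encoded)
{
    *encoded = *encoded * 10 + static_cast<int>(Index) + 1;
}

// Unary left fold: (..., f<I>()) groups as ((f<0>(), f<1>()), f<2>()).
template <std::size_t... Indices>
constexpr int LeftFoldOrder(std::index_sequence<Indices...>)
{
    int encoded = 0;
    (..., AppendDigit<Indices>(&encoded));
    return encoded;
}

// Unary right fold: (f<I>(), ...) groups as (f<0>(), (f<1>(), f<2>())).
template <std::size_t... Indices>
constexpr int RightFoldOrder(std::index_sequence<Indices...>)
{
    int encoded = 0;
    (AppendDigit<Indices>(&encoded), ...);
    return encoded;
}
```

With std::make_index_sequence<3>, both functions return 123 when built in C++17 mode.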

So: does NVCC support C++17 folding expressions, and if so: what needs to be done to get it working?

I’m currently using VS2019 16.11.3 and CUDA Toolkit 11.5.

bkerkhof · August 1, 2022, 5:09pm

Perhaps I should already pre-empt helpful answers on how to make it work without folding expressions: I already know that recursion based on if constexpr(...) works. It is just that the code (in my particular case) would be a lot cleaner if I could use folding instead ;-).

Hence the question specifically on folding…

For those who wonder: the following compiles without issue, and does what you would expect:

template <size_t NumberOfValues>
struct Foo
{
	template <size_t FirstIndex, size_t ... NextIndices>
	static constexpr __device__ void AccumulateSum(float* accumulator, const float* const values)
	{
		*accumulator += values[FirstIndex];
		if constexpr (sizeof...(NextIndices) > 0)
		{
			AccumulateSum<NextIndices...>(accumulator, values);
		}
	}

	template <size_t ... Indices>
	static constexpr __device__ void SumImpl(float* accumulator, const float* const values, std::index_sequence<Indices...>)
	{
		AccumulateSum<Indices...>(accumulator, values);
	}

	static constexpr __device__ float Sum(const float* const values)
	{
		float accumulator = 0.f;
		SumImpl(&accumulator, values, std::make_index_sequence<NumberOfValues>());
		return accumulator;
	}
};

Robert_Crovella · August 1, 2022, 5:09pm

If I add #include <utility> I don’t have any trouble compiling your second example on CUDA 11.4, gnu/g++ 7.3.1, CentOS 7, as long as I include -std=c++17 on the compile command line.

My guess is that the problem is that the CUDA compilation sequence is not being told that C++17 support is needed.
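
A minimal sketch of a fold expression in device-callable code, assuming the file is saved as fold.cu (a hypothetical name) and compiled with something like nvcc -std=c++17 -c fold.cu. The FoldSum struct and the __CUDACC__ fallback macros are illustrative additions, not part of the original example; the fallback only exists so the same source can also be checked with a plain host compiler.

```cpp
#include <cstddef>
#include <utility>

// Assumption for illustration: when the file is not compiled by nvcc,
// make the CUDA qualifiers expand to nothing so a host compiler accepts it.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

template <std::size_t NumberOfValues>
struct FoldSum
{
    template <std::size_t... Indices>
    static constexpr __host__ __device__ float SumImpl(const float* values,
                                                       std::index_sequence<Indices...>)
    {
        // Binary left fold over +: ((0.f + values[0]) + values[1]) + ...
        return (0.f + ... + values[Indices]);
    }

    static constexpr __host__ __device__ float Sum(const float* values)
    {
        return SumImpl(values, std::make_index_sequence<NumberOfValues>());
    }
};
```

Without the -std=c++17 option, the fold expression inside SumImpl is the part the compiler would be expected to reject.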

bkerkhof · August 1, 2022, 5:13pm

@Robert_Crovella: that’s definitely progress :-).

That raises the question of how I get MSBuild and/or Visual Studio to actually pass that option to NVCC. There seems to be no way to do that in the UI, or at least none that I have found…

Robert_Crovella · August 1, 2022, 5:13pm

can you try this

bkerkhof · August 1, 2022, 5:30pm

That indeed seems to be the option hidden at the bottom of the proverbial locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard’ :-D.

So now it finally compiles. Many thanks!

To be fair: it would be nice if the Visual Studio integration for CUDA just picked up the language setting that we already have in the project files, or at the very least offered a clear option somewhere in the properties. I might have found it then…

Robert_Crovella · August 1, 2022, 5:34pm

You might wish to file a bug requesting whatever behavior you think makes sense.
