Strange behavior of CUDA compiler when using template loops

Hi,

I am using template-based loop unrolling (https://stackoverflow.com/a/45819050) in my CUDA files, and nvcc fails to compile it.
I threw together a minimal example of the problem I'm having:

#include <type_traits>
#include <stdio.h>

template<int n>
void testPrintN()
{
  printf("v:%d\n", n);
}

template <int First, int Last>
struct static_for
{
  template <typename Lambda>
  static inline constexpr void apply(Lambda const& f)
  {
    if (First < Last)
    {
      f(std::integral_constant<int, First>{});
      static_for<First + 1, Last>::apply(f);
    }
  }
};
template <int N>
struct static_for<N, N>
{
  template <typename Lambda>
  static inline constexpr void apply(Lambda const& f) {}
};

int main()
{
  static_for<0,2>::apply([](auto i) {
    static_for<0,2>::apply([&](auto j) {
      testPrintN<i.value + j.value>();
    });
  });
}

As you can see, this code does not contain any CUDA syntax.
In fact, “g++ -std=c++14 minimal.cpp” compiles it just fine and it does what it should, yet “nvcc -std=c++14 minimal.cu” throws an error:

minimal.cu: In lambda function:
minimal.cu:34:27: error: ‘N’ was not declared in this scope
       testPrintN<i.value + j.value>();
                           ^
minimal.cu:34:1: error: parse error in template argument list
       testPrintN<i.value + j.value>();
 ^

Note that this error also does not occur when I pass nvcc the same code in a .cpp file, but I want to use these templates in CUDA kernels as well as in host code.

I’m using Ubuntu 16.04 LTS, g++ 5.5.0 and CUDA 9.0.
Also, this does not happen on Windows using the Visual Studio compiler.

Thanks in advance,
Jack

Your code compiles cleanly for me on CUDA 9.2 on Fedora 27.

I suggest you try CUDA 9.2 or CUDA 10.0.

$ cat t15.cu
#include <type_traits>
#include <stdio.h>

template<int n>
void testPrintN()
{
  printf("v:%d\n", n);
}

template <int First, int Last>
struct static_for
{
  template <typename Lambda>
  static inline constexpr void apply(Lambda const& f)
  {
    if (First < Last)
    {
      f(std::integral_constant<int, First>{});
      static_for<First + 1, Last>::apply(f);
    }
  }
};
template <int N>
struct static_for<N, N>
{
  template <typename Lambda>
  static inline constexpr void apply(Lambda const& f) {}
};

int main()
{
  static_for<0,2>::apply([](auto i) {
    static_for<0,2>::apply([&](auto j) {
      testPrintN<i.value + j.value>();
    });
  });
}
$ nvcc -std=c++14 t15.cu
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
$ g++ --version
g++ (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$
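
If upgrading is not an option, one thing that might be worth trying is to take the constants from the parameter types instead of from the lambda parameters themselves, so that the template argument no longer mentions i or j as objects. This is only a sketch: I have not tested it against CUDA 9.0, so I can't say whether it avoids the error there. The names g_total and runNested are hypothetical helpers added for illustration, and I dropped the unused parameter name in the specialization.

```cpp
#include <type_traits>
#include <cstdio>
#include <cassert>

// Hypothetical accumulator, only here so the result can be checked.
static int g_total = 0;

template <int n>
void testPrintN()
{
  std::printf("v:%d\n", n);
  g_total += n;
}

template <int First, int Last>
struct static_for
{
  template <typename Lambda>
  static inline constexpr void apply(Lambda const& f)
  {
    if (First < Last)
    {
      f(std::integral_constant<int, First>{});
      static_for<First + 1, Last>::apply(f);
    }
  }
};

// Terminating specialization: nothing left to iterate over.
template <int N>
struct static_for<N, N>
{
  template <typename Lambda>
  static inline constexpr void apply(Lambda const&) {}
};

// Hypothetical wrapper around the nested loop from the question.
int runNested()
{
  g_total = 0;
  static_for<0,2>::apply([](auto i) {
    static_for<0,2>::apply([&](auto j) {
      // decltype(i) and decltype(j) are std::integral_constant types, so
      // ::value is a compile-time constant that does not read i or j.
      constexpr int sum = decltype(i)::value + decltype(j)::value;
      testPrintN<sum>();
    });
  });
  return g_total;
}
```

Calling runNested() prints v:0, v:1, v:1, v:2, the same as the original example, and returns 4. Whether nvcc 9.0 accepts this form is an assumption you would have to verify on your setup.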