Does this count as instantiating the template function?

TL;DR: Does the statement “g_foo<1>;” count as instantiating the __global__ function template specialization g_foo<1>? (If not, is there a way to instantiate a function template without executing it?)


Suppose we have a __host__ __device__ function template and we want it to behave differently depending on whether it is called from host or device code. Our code might use the __CUDA_ARCH__ macro like this:

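#include <cstdio>

template<int X>
__global__ void g_foo();

template<int X>
__device__ __host__ void foo(){
    #ifndef __CUDA_ARCH__
        // Host pass: process locally if small; otherwise do in GPU
        g_foo<X><<<1, 1>>>();
    #else
        // Device pass: name the kernel so it is instantiated here too
        g_foo<X>; // !!!
        printf("%d\n", X);
    #endif
}

template<int X>
__global__ void g_foo(){
    foo<X>(); // run the device version of foo
}

int main(){
    foo<1>();                // host call, launches g_foo<1>
    cudaDeviceSynchronize(); // wait for the kernel's printf output
}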
The CUDA programming guide states: “If a __global__ function template is instantiated and launched from the host, then the function template must be instantiated with the same template arguments irrespective of whether __CUDA_ARCH__ is defined and regardless of the value of __CUDA_ARCH__.”

Thus, if the host version of foo launches g_foo<X>, the device version of foo must also instantiate g_foo<X>.

In practice, it seems that merely writing “g_foo<X>;” instantiates it (and, as far as I can tell, does not result in any executed code). Is this behaviour defined, or is it just a hack? If it is just a hack, is there a proper way to do it, or is this pattern simply not viable?
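
For comparison, here is a minimal sketch of one way to instantiate a kernel template without executing it that does not depend on a discarded expression: an explicit instantiation definition at namespace scope. It sits outside any __CUDA_ARCH__ guard, so both compilation passes see the same instantiation. This assumes the set of X values is known in advance (only X = 1 here), and run_demo is a hypothetical helper name added for illustration. Whether this fully satisfies the rule quoted from the programming guide is my reading of it, not something I have seen confirmed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

template<int X>
__global__ void g_foo();

template<int X>
__host__ __device__ void foo() {
#ifndef __CUDA_ARCH__
    g_foo<X><<<1, 1>>>();     // host pass: launch the kernel
#else
    printf("%d\n", X);        // device pass: do the work on the GPU
#endif
}

template<int X>
__global__ void g_foo() {
    foo<X>();                 // kernel body runs the device version of foo
}

// Explicit instantiation definition at namespace scope, outside any
// __CUDA_ARCH__ guard, so the host and device passes both instantiate g_foo<1>.
template __global__ void g_foo<1>();

// run_demo is a hypothetical helper: returns 0 if the launch and sync succeed.
int run_demo() {
    foo<1>();
    cudaError_t launch = cudaGetLastError();
    cudaError_t sync = cudaDeviceSynchronize();
    return (launch == cudaSuccess && sync == cudaSuccess) ? 0 : 1;
}

int main() {
    return run_demo();
}
```

The trade-off is that every template argument used on the host has to be listed by hand, which may defeat the purpose if X is meant to be open-ended.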