TL;DR: Does the line “g_foo<1>;” count as instantiating the global template function g_foo<1>? (If not, is there a way to lazily instantiate but not execute template functions?)
Context/details:
Suppose we have a host/device template function and we wish to specialize it based on where it was called. Our code might use the CUDA_ARCH macro like this:
    #include <cstdio>  // printf is used, not iostream

    template<int X>
    __global__ void g_foo();

    template<int X>
    __device__ __host__ void foo(){
    #ifndef __CUDA_ARCH__
        // Process locally if small; otherwise do in GPU
        g_foo<X><<<1, 1>>>();
    #else
        g_foo<X>; // !!!
        printf("%d\n", X);
    #endif
    }

    template<int X>
    __global__ void g_foo(){
        foo<X>();
    }

    int main(){
        foo<1>();
    }
The CUDA programming guide states: “If a global function template is instantiated and launched from the host, then the function template must be instantiated with the same template arguments irrespective of whether __CUDA_ARCH__ is defined and regardless of the value of __CUDA_ARCH__.”
Thus, if the host version of foo calls g_foo<X>, the device version of foo must also instantiate g_foo<X> with the same template arguments.
In practice, merely writing “g_foo<X>;” appears to instantiate it, and as far as I can tell it generates no executed code. Is this behaviour well-defined, or is it just a hack? If it is just a hack, is there a proper way to force the instantiation, or is this pattern simply unviable?