Hi,

I'm interested in using nvstd::function in a project, but the way I intend to use it seems to conflict with the guidelines described in the official CUDA C++ Programming Guide, so I'm hoping to find clarification on whether what I'm doing is admissible or whether it's undefined behavior.

Concretely, my use case looks like:
```cpp
#include <nvfunctional>
#include <iostream>

// my library has a handful of __device__ functions,
// and I'd like to be able to pass these functions around from kernel to kernel
__device__ double foo(double x) { return x * x; }

// This kernel takes a __device__ function as a template argument, and uses it
// to create an nvstd::function of the appropriate signature
template <auto f>
__global__ void create_func(nvstd::function<double(double)>* func) {
  *func = f;
}

// This kernel evaluates the provided nvstd::function and saves the output
__global__ void apply_func(nvstd::function<double(double)>* func, double* output) {
  *output = (*func)(3.0);
}

int main() {
  nvstd::function<double(double)>* func;
  cudaMallocManaged(&func, sizeof(nvstd::function<double(double)>));

  double* output;
  cudaMallocManaged(&output, sizeof(double));

  // create the nvstd::function object
  create_func<foo><<<1, 1>>>(func);
  cudaDeviceSynchronize();

  // and later invoke it in a separate kernel
  apply_func<<<1, 1>>>(func, output);
  cudaDeviceSynchronize();

  std::cout << *output << std::endl;

  cudaFree(func);
  cudaFree(output);
}
```
This example seems to compile and run with both nvcc (12.2) and clang++ (16), it produces the expected output, and compute-sanitizer reports no issues. However, this section of the CUDA C++ Programming Guide mentions:

> nvstd::function cannot be used in the parameter type of a __global__ function, if the __global__ function is launched from host code.
Does this mean that my code is not expected to work, or is it only passing nvstd::function objects by value to __global__ functions that is prohibited?
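For reference, this is the pattern I understand the guide to prohibit, sketched as a hypothetical kernel (not part of my code) that takes the nvstd::function itself as a parameter:

```cpp
#include <nvfunctional>

// My reading: THIS is what the guide forbids -- an nvstd::function appearing
// by value in the parameter list of a host-launched __global__ function. The
// closure object would be constructed on the host and bitwise-copied to the
// device, where its internal callable pointers are meaningless.
__global__ void apply_by_value(nvstd::function<double(double)> f, double* out) {
  *out = f(3.0);  // would call through host-side pointers
}
```

By contrast, my example only passes a *pointer* to an nvstd::function that was both constructed and invoked on the device, so the closure object itself is never copied across the host/device boundary.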
I understand that runtime polymorphism and CUDA kernels are a potential footgun: copying objects between host and device doesn't move the vtable, so calling virtual functions on the wrong side can segfault. But I was also under the impression that if the object is created in a CUDA kernel, its vtable is in device memory, so it can be used in subsequent CUDA kernels.
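For concreteness, this is the analogous virtual-function pattern I had in mind, as a sketch (the class names are made up; storage is assumed to come from cudaMalloc/cudaMallocManaged):

```cpp
#include <new>

struct Base {
  __device__ virtual double eval(double x) const = 0;
};
struct Square : Base {
  __device__ double eval(double x) const override { return x * x; }
};

// Construct the object on the device via placement new, so its vtable
// pointer refers to the device-side vtable...
__global__ void make(void* storage) { new (storage) Square; }

// ...then, my understanding is that later kernels can safely call through
// it, since the vtable pointer was never produced on the host.
__global__ void use(Base* obj, double* out) { *out = obj->eval(3.0); }
```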
Thank you,
Sam