Clarification on usage of nvstd::function

Hi,

I’m interested in using nvstd::function in a project, but the way I intend to use it seems to conflict with the guidelines in the official CUDA C++ Programming Guide. I’m hoping someone can clarify whether what I’m doing is admissible or whether it’s undefined behavior.

Concretely, my use case looks like this:

#include <nvfunctional>
#include <cinttypes>
#include <iostream>

// my library has a handful of __device__ functions
// and I'd like to be able to pass these functions around from kernel to kernel
__device__ double foo(double x) { return x * x; }

// This kernel takes a __device__ function pointer as a template argument and
// uses it to create an nvstd::function of the appropriate signature
template < auto f >
__global__ void create_func(nvstd::function< double(double) > * func) {
    *func = f;
}

// This kernel evaluates the provided nvstd::function and saves the output
__global__ void apply_func(nvstd::function< double(double) > * func, double * output) {
    *output = (*func)(3.0);
}

int main() {
    nvstd::function<double(double)> * func;
    cudaMallocManaged(&func, sizeof(nvstd::function<double(double)>));

    double * output;
    cudaMallocManaged(&output, sizeof(double));

    // create the nvstd::function object
    create_func<foo><<<1,1>>>(func);
    cudaDeviceSynchronize();

    // and later invoke it in a separate kernel
    apply_func<<<1,1>>>(func, output);
    cudaDeviceSynchronize();

    std::cout << *output << std::endl;
}

This example compiles and runs with nvcc (12.2) and clang++ (16), produces the expected output, and compute-sanitizer reports no issues. However, this section of the CUDA C++ Programming Guide says:

nvstd::function cannot be used in the parameter type of a __global__ function, if the __global__ function is launched from host code.

Does this mean that my code is not expected to work, or is only passing nvstd::function objects by value to __global__ functions prohibited?

I understand that runtime polymorphism in CUDA kernels is a potential footgun: copying an object between host and device copies its vtable pointer, which still refers to the vtable in the original address space, so virtual calls can segfault. But I was also under the impression that if the object is created in a CUDA kernel, its vtable is in device memory, so it could be used in subsequent CUDA kernels.
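A minimal sketch of the pattern described above, assuming a hypothetical `Op`/`Square` class pair that is not from the original post: one kernel placement-news the object into `cudaMalloc`'d memory, so its vtable pointer is written by device code, and a later kernel receives only a pointer (never the object by value) and makes the virtual call. Whether this is guaranteed to work across launches is part of what this thread asks, so treat it as illustrative rather than authoritative.

```cuda
#include <cmath>
#include <cstdio>
#include <new>
#include <cuda_runtime.h>

// Hypothetical polymorphic interface, for illustration only.
struct Op {
    __device__ virtual double apply(double x) const = 0;
};

// Hypothetical concrete implementation (same math as foo in the question).
struct Square : Op {
    __device__ double apply(double x) const override { return x * x; }
};

// Kernel 1: construct the object in device code, so its vtable pointer
// refers to the device-side vtable.
__global__ void construct_square(void* storage) { new (storage) Square(); }

// Kernel 2: takes a pointer to the device-constructed object and makes a
// virtual call on it.
__global__ void apply_op(const Op* op, double x, double* output) {
    *output = op->apply(x);
}

// Returns the result of the virtual call, or NaN if any CUDA call fails.
double apply_virtual_on_device(double x) {
    void* storage = nullptr;   // raw device memory for one Square
    double* output = nullptr;  // managed memory, read back on the host
    double result = NAN;
    if (cudaMalloc(&storage, sizeof(Square)) == cudaSuccess &&
        cudaMallocManaged(&output, sizeof(double)) == cudaSuccess) {
        construct_square<<<1, 1>>>(storage);
        // The Square* -> const Op* conversion only adjusts the pointer value
        // on the host; host code never dereferences the object.
        apply_op<<<1, 1>>>(static_cast<Square*>(storage), x, output);
        if (cudaGetLastError() == cudaSuccess &&
            cudaDeviceSynchronize() == cudaSuccess) {
            result = *output;
        }
    }
    // Square is trivially destructible, so freeing its storage is enough here.
    cudaFree(storage);
    cudaFree(output);
    return result;
}

int main() {
    std::printf("%f\n", apply_virtual_on_device(3.0));
}
```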

Thank you,

Sam

I have my opinions.

If someone requires a bulletproof guarantee, these forums aren’t the right place for that. Such a request should be filed as a bug, requesting a doc update for clarification.

Bringing the ticket’s conclusion here.


Thanks to our compiler engineering team. Here is what we concluded:
Using nvstd::function the way you do is risky, because there is no guarantee that it will remain ABI-compatible between host and device. A much simpler way would be to allocate a function pointer instead and use the same trick: launch one kernel to initialize the function pointer, then use it in subsequent launches.
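A minimal sketch of that suggestion, reusing `foo` from the question. The names `fn_t`, `create_fn_ptr`, `apply_fn_ptr`, and `apply_on_device` are hypothetical. A plain `double (*)(double)` is written to device memory by one kernel and called through by a later kernel, so no library-defined object layout has to agree between host and device compilation.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Same device function as in the question.
__device__ double foo(double x) { return x * x; }

// Plain device function pointer type (hypothetical alias).
using fn_t = double (*)(double);

// Kernel 1: store the device-side address of f into device memory.
template <fn_t f>
__global__ void create_fn_ptr(fn_t* slot) { *slot = f; }

// Kernel 2: load the stored pointer and call through it.
__global__ void apply_fn_ptr(const fn_t* slot, double x, double* output) {
    *output = (*slot)(x);
}

// Returns foo(x) evaluated on the device, or NaN if any CUDA call fails.
double apply_on_device(double x) {
    fn_t* slot = nullptr;      // device memory; the host never dereferences it
    double* output = nullptr;  // managed memory, read back on the host
    double result = NAN;
    if (cudaMalloc(&slot, sizeof(fn_t)) == cudaSuccess &&
        cudaMallocManaged(&output, sizeof(double)) == cudaSuccess) {
        create_fn_ptr<foo><<<1, 1>>>(slot);
        // Same stream as the previous launch, so this runs after it finishes.
        apply_fn_ptr<<<1, 1>>>(slot, x, output);
        if (cudaGetLastError() == cudaSuccess &&
            cudaDeviceSynchronize() == cudaSuccess) {
            result = *output;
        }
    }
    cudaFree(slot);
    cudaFree(output);
    return result;
}

int main() {
    std::printf("%f\n", apply_on_device(3.0));
}
```

Because both launches go to the same stream, the second kernel starts only after the first has written the pointer, so no intermediate `cudaDeviceSynchronize()` is needed.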

Hope it helps.

Best,
Yuki
