Now I’m wondering whether I can support stateful function objects, such as lambdas and functor structs with non-static data members. According to the CUDA programming guide, nvstd::function cannot be used as a kernel parameter nor be passed between host and device. What else can I do?
Do I have to resort to the void-pointer-to-context pattern, like
using UserFunction = bool (*)(float foo, float bar, void* context);

__global__ void kernel(UserFunction user_func, void* user_context) {
    // foo and bar would be computed per thread; the user's state hides behind user_context
    user_func(foo, bar, user_context);
}
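For concreteness, the user-side half of that pattern might look roughly like this (ThresholdContext and ExceedsThreshold are made-up names, and getting the __device__ function pointer over to the host would additionally need something like cudaMemcpyFromSymbol):
struct ThresholdContext { float threshold; };

__device__ bool ExceedsThreshold(float foo, float bar, void* context) {
    auto* ctx = static_cast<ThresholdContext*>(context);
    return foo + bar > ctx->threshold;
}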
You could pass callable objects via template parameter.
#include <cstdio>

// The functor type is a template parameter, so any callable with a matching
// signature, including one that carries state, can be passed to the kernel by value.
template<class OP>
__global__
void kernel(int a, OP op){
    int result = op(a);
    printf("result is %d\n", result);
}

// A stateful functor: the data member i travels to the device with the object.
struct MyOp{
    int i;

    __host__ __device__
    MyOp(int ii) : i(ii){}

    __device__
    int operator()(int a) const { return a + i; }
};

int main(){
    kernel<<<1,1>>>(10, MyOp(1));   // prints "result is 11"
    cudaDeviceSynchronize();
    kernel<<<1,1>>>(10, MyOp(10));  // prints "result is 20"
    cudaDeviceSynchronize();
}
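The same mechanism also works for a stateful lambda, provided it is an extended device lambda. A minimal sketch, assuming the kernel above and compilation with nvcc --extended-lambda:
    int offset = 42;
    // The capture [offset] is the lambda's state; the closure object is passed
    // to the kernel by value, exactly like MyOp.
    kernel<<<1,1>>>(10, [offset] __device__ (int a) { return a + offset; });
    cudaDeviceSynchronize();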
Thanks! And once the stateful user function has been passed into the kernel via a template parameter (i.e., has made it onto the device), could I forward it to a separately compiled device function, possibly through an nvstd::function parameter?
Functors and lambdas can both carry “state” and be passed to kernels. Every C++ lambda by definition has a unique type (this is not unique or specific to CUDA). It should be possible to give a functor a specific type signature and use it with or without templating, as desired, and/or with nvstd::function.
If you want to jump through hoops, it may also be possible to use nvstd::function with lambdas in device code. As already indicated, some use of templating will probably be necessary.
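For instance, something along these lines should work as a sketch (names are mine): the lambda is defined in device code, so an nvstd::function can be constructed from it and invoked there; it just cannot cross the host/device boundary.
#include <cstdio>
#include <nvfunctional>

__global__ void wrap_lambda(int offset) {
    // nvstd::function constructed and used entirely within device code
    nvstd::function<int(int)> f = [offset](int x) { return x + offset; };
    printf("f(5) = %d\n", f(5));
}

int main() {
    wrap_lambda<<<1,1>>>(3);   // prints "f(5) = 8"
    cudaDeviceSynchronize();
}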
I experimented a little and found that what I suggested is feasible: passing arbitrary functors into kernels through template parameters, then forwarding the functor to another, separately compiled device function through an nvstd::function parameter. This is shown below.
#include <nvfunctional>

// This could be separately compiled into a static library
// (separate compilation of device code requires nvcc -rdc=true).
__device__ float ClampedBinaryOp(nvstd::function<float(float, float)> func,
                                 float lhs, float rhs, float lb, float ub) {
    return min(max(func(lhs, rhs), lb), ub);
}

// This would live in a header
template <class OP>
__global__ void Kernel(OP op, size_t len, float const* lhs, float const* rhs,
                       float lb, float ub, float* result) {
    if (auto tid = blockDim.x * blockIdx.x + threadIdx.x; tid < len) {
        result[tid] = ClampedBinaryOp(op, lhs[tid], rhs[tid], lb, ub);
    }
}
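For context, the host side would drive it roughly like this (SaturatingAdd and the launch configuration are just placeholders of mine):
struct SaturatingAdd {
    float scale;
    __device__ float operator()(float a, float b) const { return scale * (a + b); }
};

// lhs, rhs, result are device pointers of length len, allocated elsewhere.
void LaunchClampedAdd(size_t len, float const* lhs, float const* rhs,
                      float* result, cudaStream_t stream) {
    unsigned block = 256;
    unsigned grid = static_cast<unsigned>((len + block - 1) / block);
    Kernel<<<grid, block, 0, stream>>>(SaturatingAdd{2.0f}, len, lhs, rhs,
                                       0.0f, 1.0f, result);
}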
This appears to satisfy my intention of performing type erasure as early as possible behind a templated interface, much like how the renowned fmt library has the generic fmt::format forward to the separately compiled fmt::vformat for the heavy lifting. Please let me know if my approach is incorrect or inefficient.
Thank you @Robert_Crovella, your StackOverflow post answered several questions I had about nvstd::function. Can I conclude that:
nvstd::function objects, and references to them, cannot be used as parameters of a __global__ function;
a pointer to an nvstd::function that lives in device memory can be used as a parameter of a __global__ function;
(the tricky part) the pointed-to nvstd::function must be assigned its target inside a kernel, and no cudaMemcpy-family call can accomplish that from the host?
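If I have understood those three points correctly, the pattern would look roughly like the following sketch of mine (placement new in a setup kernel is my own choice, untested):
#include <cstdio>
#include <new>
#include <nvfunctional>

// Construct the nvstd::function and assign its target in device code (point 3).
__global__ void SetTarget(nvstd::function<int(int)>* f, int offset) {
    new (f) nvstd::function<int(int)>([offset](int x) { return x + offset; });
}

// Only a plain pointer to the device-resident object is a kernel parameter (point 2).
__global__ void UseTarget(nvstd::function<int(int)>* f, int a) {
    printf("result is %d\n", (*f)(a));
}

int main() {
    nvstd::function<int(int)>* f = nullptr;
    cudaMalloc(&f, sizeof(*f));
    SetTarget<<<1, 1>>>(f, 7);
    UseTarget<<<1, 1>>>(f, 10);   // expected to print "result is 17"
    cudaDeviceSynchronize();
    cudaFree(f);
}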