Constexpr Partial Function Specialization

ArktisDev · August 6, 2022, 5:17am

I am trying to partially specialize some functions with some number of float arguments to match the type float (*)(float). These functions represent probability density functions and my main calculation kernel takes a variadic number of these function pointers for the calculation. The specific probability density functions I am using have 2 extra float args though which parameterize the distribution. These 2 floats are known at compile time, so I find it not unreasonable to think that it should be possible to construct a constexpr function pointer to a density function with 2 of 3 parameters specialized already.

For example, in the following code (which compiles with g++ 12.1 once I remove the __host__ and __device__ qualifiers), I am trying to specialize the arguments a and c of R2WoodsSaxon and get back a constexpr function pointer that can be passed to further templates.

#include <iostream>
#include <cmath>

#include <array>
#include <utility>


typedef float (*PDF)(float);

// r^2 * Woods Saxon Distribution
__host__ __device__
float R2WoodsSaxon(float r, float a, float c)
{
    if (r < 0.0f) return 0.0f;
    return r * r / (1 + expf( (r - c) / a ));
}

// r^2 * Gaussian Distribution
__host__ __device__
float R2Gaussian(float r, float variance) {
    if (r < 0.0f) return 0.0f;
    return r * r * expf(- r * r / (2 * variance));
}

template <const auto pdfFunction, const auto& arr, typename = void>
struct FunctionSpecializerHelper;

template <const auto pdfFunction, const auto& arr, std::size_t... i>
struct FunctionSpecializerHelper<pdfFunction, arr, std::index_sequence< i... > > {
    constexpr static PDF type = [] __host__ __device__ (float r) -> float {return pdfFunction(r, arr.at( i )...); };
};

template <const auto pdfFunction, const auto& arr>
struct FunctionSpecializer {
    constexpr static PDF type = FunctionSpecializerHelper<pdfFunction, arr, std::make_index_sequence<arr.size()> >::type;
};

int main() {
    constexpr static std::array<float, 2> arr = {2.f, 0.54f};
    constexpr static auto func = FunctionSpecializer<R2WoodsSaxon, arr>::type;
    
    std::cout << func(1) << std::endl;
    
    return 0;
}

I’m not sure why this code compiles with g++ but fails to compile using nvcc. But this is also only one issue! Using a workaround to try and generate similar results with a static member function of a template struct shows that there are further issues.

#include <iostream>
#include <cmath>

#include <array>
#include <utility>


typedef float (*PDF)(float);

// r^2 * Woods Saxon Distribution
__host__ __device__
float R2WoodsSaxon(float r, float a, float c)
{
    if (r < 0.0f) return 0.0f;
    return r * r / (1 + expf( (r - c) / a ));
}

template<const auto pdfFunction, const auto& arr>
struct Functor
{
    __device__ __host__
    constexpr Functor() {}

    __device__ __host__
    static float Call(float r) { return pdfFunction(r, arr.at(0), arr.at(1) ); }
};

int main() {
    constexpr static std::array<float, 2> arr = {2.f, 0.54f};
    constexpr static auto func = Functor<R2WoodsSaxon, arr>().Call;
    
    std::cout << func(1) << std::endl;
    
    return 0;
}

This compiles fine because the generated function pointer is only used on the host, but as soon as that function pointer is passed to a device function, compilation fails with some very cryptic error messages. The image attached below is from a different version of the code, but the errors are the same.

For whatever reason, the compiler seems unable to pull the values from the std::array and put them into the function pointer, and it only fails after it has already mangled all the function names.

Is this sort of operation just not possible? I can partially specialize the function myself by hardcoding a new function that calls the more general distribution function with hardcoded values, so if what I am trying to do is truly impossible, I could write a script that generates this code for a given set of desired probability densities. Obviously that is less than optimal, so I would like to be able to generate these function pointers programmatically.

Any tips?

Robert_Crovella · August 6, 2022, 9:40am

I’m not sure either. You may wish to file a bug. It’s possible one of the restrictions listed in the programming guide may apply, but I’m not able to identify which. Or someone else may spot something. From a forum perspective, it would be nice if you posted the nvcc compile command line (not from cmake, please) as well as the exact compiler output.

I suggest posting the failing test case. Don’t post a passing test case and then expect me to guess at how to convert it to a failing test case. Just a suggestion. Do as you wish, of course.

A function pointer captured in host code is not going to be usable in device code and vice versa, generally speaking. A function pointer does not retain something like __host__ __device__ decoration. So apart from compilation issues, I’m not optimistic that such a design mentality is going to work. This is covered in a variety of forum posts, and one of the NVIDIA sample codes demonstrates methods to use function pointers.

As an aside, please don’t post pictures of text. Post the text instead.

ArktisDev · August 6, 2022, 9:41pm

Unfortunately my Ubuntu drive failed after I took the screenshot of the error but before I made this post, so I wasn’t able to copy the text from the image.

#include <iostream>
#include <cmath>

#include <array>
#include <utility>


typedef float (*PDF)(float);

// r^2 * Woods Saxon Distribution
__host__ __device__
float R2WoodsSaxon(float r, float a, float c)
{
    if (r < 0.0f) return 0.0f;
    return r * r / (1 + expf( (r - c) / a ));
}

template<const auto pdfFunction, const auto& arr>
struct Functor
{
    __device__ __host__
    constexpr Functor() {}

    __device__ __host__
    static float Call(float r) { return pdfFunction(r, arr.at(0), arr.at(1) ); }
};

template<const PDF pdf>
__global__
void myKernel(float* f) {
    *f = pdf(1);
}

int main() {
    constexpr static std::array<float, 2> arr = {2.f, 0.54f};
    constexpr static auto func = Functor<R2WoodsSaxon, arr>().Call;

    float *d_f;

    cudaMalloc((void **) &d_f, sizeof(float));
    
    myKernel<func><<<1,1>>>(d_f);
    
    return 0;
}

This MWE was failing to compile on my computer, giving the cryptic error message I shared in my first post. Godbolt says it should compile (Compiler Explorer), but on my computer it does not, for whatever reason.

I tried it out and it did seem to actually work to pass constexpr function pointers as template arguments. As long as I defined a function matching the expected function pointer type I defined in the global kernel function the compiler would use the device version of that function. The only time this has failed is when I am trying to do the partial argument binding like I shared in this post.

Robert_Crovella · August 6, 2022, 11:29pm

constexpr functions do have the ability to be called cross-space; however, I don’t know if that applies if you wrap them in a function pointer. It may still work.

I suggest filing one or two bugs.
