Calling __host__ function from __global__ function

While building someone else’s PyTorch cpp/CUDA extension (SparseConvNet) on Windows, I am getting errors of the following type. My guess is that quite a number of people have built this library on Linux and probably aren’t getting these errors. Any idea why these errors may be getting reported only on Windows?
Thanks.

folderPath\SparseConvNet\sparseconvnet\SCN\CUDA/BatchNormalization.cu(49): error: calling a __host__ function("pow<float, double, (int)0> ") from a __global__ function("BatchNormalization_f_train<float, (int)16, (int)64> ") is not allowed

folderPath\SparseConvNet\sparseconvnet\SCN\CUDA/BatchNormalization.cu(49): error: identifier "pow<float, double, (int)0> " is undefined in device code

folderPath\SparseConvNet\sparseconvnet\SCN\CUDA/BatchNormalization.cu(83): error: calling a __host__ function("pow<float, double, (int)0> ") from a __global__ function("BatchNormalization_f_test<float, (int)16, (int)64> ") is not allowed

folderPath\SparseConvNet\sparseconvnet\SCN\CUDA/BatchNormalization.cu(83): error: identifier "pow<float, double, (int)0> " is undefined in device code

folderPath\SparseConvNet\sparseconvnet\SCN\CUDA/BatchNormalization.cu(49): error: calling a __host__ function("pow<float, double, (int)0> ") from a __global__ function("BatchNormalization_f_train<float, (int)12, (int)64> ") is not allowed
  • Interestingly, “pow” is listed among the available “Standard Functions” in the documentation at https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#standard-functions, which says “The functions from this section can be used in both host and device code”. So why is nvcc treating “pow” as a host-only function?
  • Just as an experiment, I replaced “pow” with “powf” and the compiler no longer generated errors.

Any insights why this could be?
Thanks.

If I interpret the error message correctly, this is not a function signature supported by CUDA (or by standard C++, as best I know). The available overloads are:

pow (double, int)
pow (double, double)
pow (float, int)
pow (float, float)

If a function call in device code doesn’t match any of the available signatures and cannot be resolved to any particular existing signature (presumably because of ambiguity), the compiler may assume it refers to a host function. I am not entirely sure under which circumstances that happens.

The first line in question seems to be:

_saveInvStd = pow(_saveInvStd / nActive + eps, -0.5);

where _saveInvStd is of type T, nActive is of type int, and the literal -0.5 is of type double. So I am assuming you get this error message when you instantiate this with T=float, as that results in pow (float, double).
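The deduced argument types can be confirmed with a quick host-side check (a sketch: the variable names mirror the extension's code, but the values here are hypothetical):

```cpp
#include <type_traits>

// Hypothetical stand-ins for the variables in BatchNormalization.cu
// when the template is instantiated with T = float.
float _saveInvStd = 1.0f;
float eps = 1e-5f;
int nActive = 64;

// float / int + float deduces to float, while the literal -0.5 is a
// double, so the call becomes pow(float, double).
static_assert(std::is_same<decltype(_saveInvStd / nActive + eps), float>::value,
              "first argument deduces to float");
static_assert(std::is_same<decltype(-0.5), double>::value,
              "exponent literal is double");
```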

Here is what happens when I extract this into a simple standalone demo kernel:

#include <cstdio>
#include <cstdlib>

__global__ void kernel (float f, double d, int i)
{
    double r1, r2, r3, r4, r5, r6, r7, r8, r9;
    r1 = pow (f, i);
    r2 = pow (f, f);
    r3 = pow (d, i);
    r4 = pow (d, d);
    r5 = powf (f, i);
    r6 = powf (f, f);
    r7 = powf (d, i);
    r8 = powf (d, d);
    r9 = pow (f, d);   // the problematic mixed-type call: pow(float, double)
    printf ("%23.16e %23.16e %23.16e %23.16e %23.16e %23.16e %23.16e %23.16e\n",
            r1, r2, r3, r4, r5, r6, r7, r8);
}

int main (void)
{
    kernel<<<1,1>>>(2,3,4);
    cudaDeviceSynchronize();
    return EXIT_SUCCESS;
}

The CUDA 9.2 toolchain on Windows complains:

pow.cu(15): error: more than one instance of overloaded function "pow" matches the argument list:
            function "pow(double, double)"
            function "pow(float, float)"
            function "pow(float, int)"
            argument types are: (float, double)

Interesting… The actual call is pow(_saveInvStd / nActive + eps, -0.5), which takes just two arguments. Perhaps the compiler is seeing some other templated declaration of “pow” somewhere else that takes three template parameters and is resolving the call to that instead of to the standard function. Is there some way to indicate that what we want is the standard function “pow”? A namespace qualifier or anything like that…

When instantiated with T = float, computing pow(_saveInvStd / nActive + eps, -0.5) turns into pow (float/int + float, double) ==> pow (float, double). This is not supported by the CUDA standard math library.
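If the goal is to keep the call generic over T, one possible fix (a sketch, not SparseConvNet's actual code; the helper name invStd is made up) is to cast the exponent to the template type so that both arguments agree:

```cpp
#include <cmath>

// Sketch only: the parameter names mirror BatchNormalization.cu,
// but this is not the library's actual code.
template <typename T>
T invStd(T saveInvStd, int nActive, T eps)
{
    // Casting the exponent to T turns the call into pow(float, float)
    // or pow(double, double), both of which exist in CUDA device code.
    return std::pow(saveInvStd / nActive + eps, T(-0.5));
}
```

In device code the same cast resolves the call to the CUDA math library's matching-type overload, which avoids the ambiguity entirely.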

I haven’t checked whether some host compilers provide a prototype for, and therefore accept, pow (float, double). If so, it would seem to be an extension over the standard; at least I haven’t found anything in the C++ standard requiring such an overload.
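For what it’s worth, a quick host-only check suggests the mixed call does compile on the host; C++11’s <cmath> requires “sufficient additional overloads” so that mixed float/double arguments promote to double:

```cpp
#include <cmath>
#include <type_traits>

// Host-only check: with a C++11 standard library, a float/double mix
// is accepted and evaluated as pow(double, double).
static_assert(std::is_same<decltype(std::pow(2.0f, 3.0)), double>::value,
              "mixed float/double pow promotes to double on the host");
```

Device code does not get these promoting overloads, which would explain why the same call that compiles on the host is ambiguous in a __global__ function.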

I have only very rudimentary knowledge of Python, but as far as I know it does not support a single-precision floating-point type, only double precision, which probably means that in this context the code should never be instantiated with T = float.