Why do errors occur when I try to use the "cosf" function in CUDA?

huanyihu2000 · August 27, 2023, 6:32am

__global__ void TukeyWin(int N, float r, float* t, float* result)
{
	int i = threadIdx.x + blockDim.x * blockIdx.x;

	float per = 0.5 * r;

	int tl = floor(per * (N - 1)) + 1;

	int th = N - tl + 1;

	if (i < N)
	{
		t[i] = (float)i / (float)(N - 1);
	}
	__syncthreads();
	if (i < tl)
	{
		result[i] = (1 + cosf(pi / per * (t[i] - per))) / 2;
	}
	__syncthreads();
	if (i >= tl && i < th - 1)
	{
		result[i] = 1;
	}
	__syncthreads();
	if (i >= th - 1 && i < N)
	{
		result[i] = (1 + cosf(pi / per * (t[i] - 1 + per))) / 2;
	}
}

Errors like:

identifier "acos<int, (int)0> " is undefined in device code	
calling a __host__ function("double  ::acos<int, (int)0> (T1)") from a __global__ function("TukeyWin") is not allowed

I have included the header "cuda_runtime.h".

njuffa · August 27, 2023, 11:21am

I do not see a call to acos() in the posted code, yet the error message seems to complain about such a call. Is the code posted here identical to what you are actually compiling?

The standard math functions have a limited number of overloads. Overloads for float and double definitely exist. I am not aware of an overload double acos (int) being supported by CUDA and am not aware that such an overload is required by C++.

There is a possibility that such an overload may have been added in one of the more recent revisions of the ISO-C++ standard (check the standard document). If such an overload is required, and the particular revision of the C++ standard that requires it is claimed to be supported by CUDA, you may want to file a bug.

njuffa · August 27, 2023, 11:59am

Hm, CPP Reference says the int overloads were added in C++11:

Additional overloads (since C++11)
Defined in header `<cmath>`
template< class Integer >
double acos ( Integer num );

I am looking at the list of supported C++11 features in CUDA, but they are organized by specific proposal to the standard committee, which is a bit inconvenient for regular folks. None of the proposals listed looks like it has anything to do with adding integer overloads for standard math functions, which likely means that this C++11 feature is not supported by CUDA.

Assuming the problem is reproducible with CUDA 12.2, you could file a feature request with NVIDIA. Use the bug reporting form and make a note that this is an enhancement request.

huanyihu2000 · August 28, 2023, 1:33am

Many thanks. I can’t figure this problem out at the moment, so I rewrote my code in C, which is a little slower but more stable, with no errors. 0_0

njuffa · August 28, 2023, 2:16am

If the only problem is the passing of int arguments to trigonometric functions as indicated by the error message in the thread-starting post, I would suggest simply casting such function arguments to double. Then the existing overload for double will match.
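
To illustrate the suggestion: the posted kernel contains no acos() call, so the sketch below assumes, purely hypothetically, that pi is computed somewhere as acos(-1). With an int argument the call resolves to the C++11 integer-argument template, which the error message reports as a __host__ function; writing the argument as a double (or float) selects an ordinary overload instead. The helper names and the HD macro are made up for this example.

```cpp
#include <math.h>

// HD is a hypothetical convenience macro: it lets the same helpers be
// compiled by nvcc (callable from kernels) or by a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper, not from the thread. The literal -1.0 is a double,
// so the regular double overload of acos() is chosen.
// Writing acos(-1) instead would pass an int and pick the integer template.
HD inline double pi_from_acos()
{
    return acos(-1.0);
}

// Single-precision variant: acosf() takes and returns float.
HD inline float pi_from_acosf()
{
    return acosf(-1.0f);
}
```

If the real code uses a macro such as a pi definition built from acos, the same change applies inside the macro body; whether that is what triggers the error here is not confirmed by anything posted in this thread.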

huanyihu2000 · August 28, 2023, 2:30am

It’s really weird. I didn’t call any acos function in my code, and the error referred to this line:
result[i] = (1 + cosf(pi / per * (t[i] - per))) / 2;
I guessed that cosf uses acos in its definition, but that is not what I actually see there:
_Check_return_ __inline float __CRTDECL cosf(_In_ float _X) { return (float)cos(_X); }
Besides, I didn’t use the “int” type in my code, and to make sure that the argument of cosf is a float, I even cast it to float explicitly:
result[i] = (1 + cosf(float(pi / per * (t[i] - per)))) / 2;
but the error was still there.

njuffa · August 28, 2023, 3:00am

Can’t reproduce. I took the code from the original post and put it into a .cu file. I added:

#include <math.h>
#define pi 3.141592653f

at the start, and

int main (void)
{
    TukeyWin<<<1,1>>>(0,0,0,0);
    return 0;
}

at the end. Compiles without errors. Either your posted code does not reflect what you are compiling, or there is an issue with your tool chain. Do you have multiple versions of CUDA installed on your computer by any chance?

huanyihu2000 · August 28, 2023, 3:17am

I am sure that I don’t have another version of CUDA. This is ridiculous: I placed this __global__ function in another CUDA file and tried to compile it, and it worked!

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <cuComplex.h>
#include <iostream>
#include <fstream>
#include <cstring>
#include <cmath>
#include <cufft.h>
#include <chrono>
#include <cublas_v2.h>

#define pi 3.14

__global__ void TukeyWin(int N, float r, float* t, float* result)
{
	int i = threadIdx.x + blockDim.x * blockIdx.x;

	float per = 0.5 * r;

	int tl = floor(per * (N - 1)) + 1;

	int th = N - tl + 1;

	if (i < N)
	{
		t[i] = (float)i / (float)(N - 1);
	}
	__syncthreads();
	if (i < tl)
	{
		result[i] = (1 + cosf(pi / per * (t[i] - per))) / 2;
	}
	__syncthreads();
	if (i >= tl && i < th - 1)
	{
		result[i] = 1;
	}
	__syncthreads();
	if (i >= th - 1 && i < N)
	{
		result[i] = (1 + cosf(pi / per * (t[i] - 1 + per))) / 2;
	}
}

int main()
{
	std::cout << "Hello world!" <<std::endl;
}

Magic?! Is it possible that something overloads cosf in CUDA? I think that is actually impossible, though.
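
Once the kernel compiles, a host-side reference can help check its output. The following is only a sketch: it mirrors the branches of TukeyWin as posted, takes the pi value from the reproduction above (3.141592653f) rather than the 3.14 used in this file, and the function name tukey_reference is made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Host-side mirror of the posted TukeyWin kernel (hypothetical helper).
// n: window length, r: taper ratio; assumes n >= 2 and 0 < r <= 1.
std::vector<float> tukey_reference(int n, float r)
{
    assert(n >= 2 && r > 0.0f && r <= 1.0f);

    const float pi  = 3.141592653f;
    const float per = 0.5f * r;
    const int tl = static_cast<int>(std::floor(per * (n - 1))) + 1;
    const int th = n - tl + 1;

    std::vector<float> w(n, 1.0f);  // flat middle section stays at 1
    for (int i = 0; i < n; ++i) {
        const float t = static_cast<float>(i) / static_cast<float>(n - 1);
        if (i < tl) {               // rising cosine taper
            w[i] = (1.0f + std::cos(pi / per * (t - per))) / 2.0f;
        }
        if (i >= th - 1) {          // falling cosine taper
            w[i] = (1.0f + std::cos(pi / per * (t - 1.0f + per))) / 2.0f;
        }
    }
    return w;
}
```

Comparing this against the array copied back from the device (with a small tolerance) would show whether the kernel and the formula agree. Since each thread in the kernel only reads the t[i] it wrote itself, the comparison should not depend on the __syncthreads() calls.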

njuffa · August 28, 2023, 3:21am

There is no need to do this in a CUDA source file (.cu suffix):

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

and in fact I would strongly advise against adding such includes.

huanyihu2000 · August 28, 2023, 3:39am

Those includes are needed in my code, so I put them in. I need to set this problem aside for now, because I have to finish the rest of my project. Anyway, thank you very much for your help. Are you Asian? If you are American, it’s almost 3 o’clock there. Once again, thank you for your selfless help.

huanyihu2000 · August 28, 2023, 4:02am

If you are available, I have also run into an acceleration problem and hope you can lend a hand.
I have two pointers and want to relocate the data of the first one into the second. For example, the first pointer holds {1,2,3,4,5,6,7,8,9,10} and the second holds {0,0,0,0,0,0,0,0,0,0,0,0,0,0}. After relocating, the second one becomes {1,2,3,4,5,0,0,6,7,8,9,10,0,0}.
I tried to relocate the data by manipulating pointers, including with cublasScopy and cudaMemcpy, but it was too slow. In the end I chose to use a __global__ function.
I first wrote my code like this:

__global__ void GetOverlapData(cuFloatComplex* Input, cuFloatComplex* Output, float* tukey, int Unit, int Interval, int num)
{
	int i = threadIdx.x + blockDim.x * blockIdx.x;

	for (int j = 0; j < num; j++)
	{
		if (i < Unit)
		{
			Output[i + j * Interval] = complexMul(Input[i + j * Unit], tukey[i]);
		}
	}
}

__device__ cuFloatComplex complexMul(cuFloatComplex a, float constant) 
{
	cuFloatComplex result;
	result.x = a.x * constant;
	result.y = a.y * constant;
	return result;
}

That is, I used a 1-dimensional thread index, and the answer was right. complexMul is a function I use to multiply a complex number by a real constant. But I still thought it was not fast enough, so I rewrote the kernel to use 2-dimensional indexing:

__global__ void GetOverlapData(cuFloatComplex* Input, cuFloatComplex* Output, float* tukey, int Unit, int Interval, int num)
{
	int row = threadIdx.y + blockDim.y * blockIdx.y;
	int col = threadIdx.x + blockDim.x * blockIdx.x;

	if (row < num && col < Unit)
	{
		Output[row * Interval + col] = complexMul(Input[row * Unit + col], tukey[Unit]);	
	}
}

Then problems occurred: I can’t get the right answer. I copied the data from the device to the host and printed it; some of the data is right, but most of it is zero.
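
One likely culprit, offered as a guess since the launch code is not shown: the 1-D version indexes the window per element (`tukey[i]`), whereas the 2-D version reads `tukey[Unit]`, which is one element past the end of a window of length Unit. The per-element equivalent would be `tukey[col]`. The 2-D kernel also needs a 2-D launch configuration covering Unit in x and num in y; if it is still launched with a 1-D grid, row is always 0 and only the first segment is written. The host-side sketch below mirrors the intended 2-D indexing; overlap_reference is a made-up name, and std::complex<float> stands in for cuFloatComplex.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Host-side mirror of the 2-D indexing (hypothetical helper).
// row plays the role of the y thread index, col the x thread index.
std::vector<std::complex<float>> overlap_reference(
    const std::vector<std::complex<float>>& input,
    const std::vector<float>& tukey,
    int unit, int interval, int num)
{
    assert(interval >= unit);
    assert(static_cast<int>(tukey.size()) == unit);
    assert(static_cast<int>(input.size()) >= num * unit);

    // Value-initialised to zero, like a cleared output buffer.
    std::vector<std::complex<float>> output(
        static_cast<std::size_t>(num) * interval);

    for (int row = 0; row < num; ++row) {
        for (int col = 0; col < unit; ++col) {
            // Window is indexed by col, not by unit.
            output[row * interval + col] = input[row * unit + col] * tukey[col];
        }
    }
    return output;
}
```

With Unit = 5, Interval = 7, num = 2 and a window of all ones, the input {1,...,10} maps to {1,2,3,4,5,0,0,6,7,8,9,10,0,0}, which can be compared against the array copied back from the device.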
