undefined reference to `cyl_bessel_i0f'

When linking CUDA code, I’m getting the following error:

test_cyl_bessel_i0f.o: In function `main': tmpxft_00007f3f_00000000-4_test_cyl_bessel_i0f.cudafe1.cpp:(.text+0x26): undefined reference to `cyl_bessel_i0f'
collect2: error: ld returned 1 exit status

I’m using the following commands to compile and link the code:

nvcc -I/usr/local/cuda/include -c test_cyl_bessel_i0f.cu
nvcc -L/usr/local/cuda/lib64 -o test_cyl_bessel_i0f test_cyl_bessel_i0f.o -lcudart

The example program is

#include <stdio.h>

#include <math_functions.h>

int main(void)
{
	float a;
	a = cyl_bessel_i0f(0.5f);
	printf("%f\n", a);

	return 0;
}
I am using CUDA 7.5.

You are using cyl_bessel_i0f() in host code. nvcc passes host code to the host compiler, and the resulting object code is linked with host libraries.

cyl_bessel_i0f() is a function supported by the CUDA standard math library in device code. The CUDA math library comprises the standard set of C/C++ math library functions, plus some additional useful functions, such as rsqrt(), erfinv(), erfcinv(), norm3d(), and cyl_bessel_i0f().
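To illustrate, here is a minimal sketch of what a device-side version of your test program could look like (hypothetical kernel name, no error checking, untested as posted):

```cuda
#include <stdio.h>

// Sketch: cyl_bessel_i0f() called from device code, where the
// CUDA math library does support it.
__global__ void bessel_kernel(float *out)
{
    *out = cyl_bessel_i0f(0.5f);
}

int main(void)
{
    float a, *d_a;
    cudaMalloc(&d_a, sizeof(float));
    bessel_kernel<<<1, 1>>>(d_a);
    cudaMemcpy(&a, d_a, sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f\n", a);   // I0(0.5) is approximately 1.063483
    cudaFree(d_a);
    return 0;
}
```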

You might want to search available host libraries, such as GSL or Boost, for an implementation of the modified Bessel function of the first kind of order 0.

Thanks for the reply. I thought, though, that functions such as cyl_bessel_i0f() and friends could be called from host code, because they have the __host__ qualifier in their declarations (see link below). Have I misunderstood the effect of the __host__ qualifier?


Interesting. Your understanding of the __host__ attribute is correct. Best I can tell, this looks like a documentation bug; I am guessing a cut & paste issue. I looked at the relevant CUDA 7.5 header files and see no indication that host-side use of cyl_bessel_i0f() is actually supported.

I would suggest filing a bug report with NVIDIA. The bug reporting form is linked from the CUDA registered developer website. Login here: https://developer.nvidia.com/

I see the following in /usr/local/cuda/include/math_functions.h:

extern __host__ __device__ __device_builtin__ float                  cyl_bessel_i0f(float x) __THROW;


__MATH_FUNCTIONS_DECL__ float cyl_bessel_i0f(float a) __THROW;


#if defined(__CUDACC_RTC__)
#define __MATH_FUNCTIONS_DECL__ __host__ __device__
#else /* __CUDACC_RTC__ */
#define __MATH_FUNCTIONS_DECL__ static inline __host__ __device__
#endif /* __CUDACC_RTC__ */

Based on this, it looks like host-side calling is supported, isn’t it?

But there is no host implementation in that header file, and obviously the function is not part of the host’s math library, so where would the host compiler get the code for cyl_bessel_i0f() from?

Compared to other prototypes, this one also contains __THROW. While I do not know its significance, it sets this function apart from the functions that are actually supported on both host and device, which don’t have __THROW.

It probably is not fruitful trying to understand the details of this code. If you file a bug report with NVIDIA they should be able to tell you whether the function is supposed to be supported on the host, but host-side support was erroneously omitted, or whether the function is definitely not supported on the host and this is a documentation bug.

The GNU Scientific Library offers the modified Bessel functions of the first kind:

The Boost library offers them as well:

Thanks for the help. I’ll submit a bug report to NVIDIA.

In the meantime, I’m using your implementation of the modified Bessel function from your post:



As I recall, that was a double-precision implementation though, so don’t expect any speed records. While the posted code was not fully refined, it was tested well enough that I would not expect you to run into any issues. Out of curiosity, what do you need the modified Bessel functions for?

I made a single precision version of the function by blindly converting doubles to floats. The single precision version seems to give the same answer as the IDL [1] version.

I need the modified Bessel function for a NUFFT-based interpolation scheme [2].

[1] http://www.exelisvis.com/docs/BESELI.html
[2] http://www.jpier.org/PIER/pier.php?paper=12071909

A simple single-precision wrapper function around the previously posted double-precision code should deliver the correctly rounded single-precision result 99.9999% of the time. So that’s a perfectly fine approach, just not very fast.
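The wrapper pattern amounts to computing in double precision and rounding once to float at the very end. A sketch (the stand-in core here is just a power series, not the code from the earlier post; hypothetical names):

```c
#include <math.h>

/* Stand-in double-precision core for I0(x), via its power series.
   In practice this would be the previously posted routine. */
static double i0_double(double x)
{
    double sum = 1.0, term = 1.0, q = 0.25 * x * x;
    for (int k = 1; k < 200; k++) {
        term *= q / ((double)k * k);
        sum += term;
        if (term < 1e-17 * sum) break;
    }
    return sum;
}

/* Single-precision wrapper: all intermediate computation is in
   double, so the single rounding to float at the end delivers
   the correctly rounded float result almost always. */
static float i0f_via_double(float a)
{
    return (float)i0_double((double)a);
}
```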

I could not locate any direct mention of the modified Bessel function I_0() in the referenced paper, but I guess its use is implied by the use of the Kaiser-Bessel window. Interesting stuff, this NUFFT. But why do the Bessel computation on the host side of a CUDA program? Wouldn’t it be advantageous to roll it into the GPU-side computation? In particular since cyl_bessel_i0f() is definitely supported in device code.
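For instance, the Kaiser-Bessel window coefficients could be generated directly on the device. A sketch using the standard Kaiser window formula (hypothetical kernel and parameter names; beta is the window shape parameter, n the window length; untested as posted):

```cuda
// Sketch: generate Kaiser-Bessel window coefficients on the GPU
// using the device-side cyl_bessel_i0f().
__global__ void kaiser_bessel_window(float *w, int n, float beta)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float t = 2.0f * i / (n - 1) - 1.0f;            // map index to [-1, 1]
        w[i] = cyl_bessel_i0f(beta * sqrtf(1.0f - t * t))
             / cyl_bessel_i0f(beta);
    }
}
```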

(For a while, the forum wasn’t allowing me to reply to your message. Now, apparently I can again. Odd.)

I’m still figuring out the best way to implement the algorithm. I had thought about precomputing the Kaiser-Bessel filter coefficients on the CPU, and then reusing them in the interpolation for each batch of data that I have. In this scheme, the Kaiser-Bessel filter would be computed by the CPU, and the convolution sum that uses the transform of the Kaiser-Bessel filter would be done on the GPU. I thought that it would be nice to use the same Bessel function routines for the CPU and GPU computations to maintain consistency.

I have submitted a bug report to NVIDIA.

The bug report will definitely help with getting to the bottom of this, so it is good that you filed it. I, too, noticed a problem with the forum in trying to reply to a different thread.

Not sure what degree of consistency you are looking for, but I assume you are aware that even if CUDA were to support cyl_bessel_i0f() in the host portion of code, the results would not be bit-identical to those produced on the GPU, due to the impact of FMA, approximate divisions, etc. used on the GPU to provide the best possible performance.