Main.cu(446): error: a host function call cannot be configured


The total horror show of dealing with NVidia CUDA is that code which works
and produces good results can be a disaster to compile. Why? Because nvcc
is a horror show. A nightmare.

I wrote some code that simply does a bit of math that we have seen since
the early 1980s. Easy stuff. The good old Mandelbrot is always fun. We
should all be thankful to Scientific American for the article ( Aug 1985 ) that
was a great joy to those of us with a mainframe handy.

So it seemed clear to me that any decent GPU from NVidia can crank the
numbers on this in a jiffy. Well, after some hacking and testing it was clear
that I was correct. The GPU will rip through the math faster than anything
I have ever seen. This includes modern IBM POWER9 killers.

A pile of people have run this code and they can produce good, stable
numerical data. That is fancy salespeople talk for "it works". The numbers
make sense. Certainly when we plot them onto a pretty colour screen. Hey,
let's be honest, the original FORTRAN code would crank out the image onto
a line printer. So many trees lost to that way back in the '80s.

However, the joy goes away when I try to make the code into little easy bits
we can all chew.  There is a ( poorly written ) article that suggests we can
break the code down into little pieces and then compile the objects and link
them later : 

Separate Compilation and Linking of CUDA C++ Device Code
https://developer.nvidia.com/blog/separate-compilation-linking-cuda-device-code/
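
The recipe in that article boils down to compiling every .cu with relocatable
device code ( nvcc -dc ) and then letting nvcc do the device link when the
objects are pulled together. Roughly this sort of split, with made-up file
names rather than the article's exact code :

    /* dev_square.cuh -- sketch only : declaration shared between files     */
    __device__ double dev_square( double x );

    /* dev_square.cu -- compiled on its own :  nvcc -dc dev_square.cu       */
    __device__ double dev_square( double x ) { return x * x; }

    /* main.cu -- calls the __device__ code that lives in another object    */
    /*   nvcc -dc main.cu                                                   */
    /*   nvcc main.o dev_square.o -o app   ( nvcc does the device link )    */
    #include "dev_square.cuh"

    __global__ void kern( double *out, int n )
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if ( i < n )
            out[i] = dev_square( (double) i );
    }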

Well guess what ? 

That code and makefile are history. Nope, they will not work anymore.

Easy to fix : 

https://git.sr.ht/~blastwave/bw/tree/bw/item/nvidia_cuda/separate_compile_link/Makefile

Okay, not so easy. Suffice it to say that I wish I had the GV100 killer or
the all-singing, all-dancing Ada Lovelace. Who was in the room when the
marketing and sales people came up with that name? "Defining a dramatically
higher baseline of GPU performance, it marks the tipping point for AI, ray
tracing, and neural graphics." Everyone goes dancing ... except the code
will not compile.

So the code that was working could crank all the numbers and create an
output file of data quite neatly. I gave up on the NVidia CUDA compiler beer
magic and tossed it all into a single dumb file : 

https://git.sr.ht/~blastwave/bw/tree/bw/item/nvidia_cuda/mandel_hack/one_big.cu

            T H A T    W O R K S

It cranks the numbers and creates an output file whose SHA512 hash matches
perfectly on IBM POWER9 and on any damn thing you have. The math
is clean. The data can be checked. It even plots to make pretty pictures.

Feel free to run the stuff at : 

    https://git.sr.ht/~blastwave/bw/tree/bw/item/xmand

Regardless, I did break the code out into little easy pieces : 

    https://git.sr.ht/~blastwave/bw/tree/bw/item/nvidia_cuda/mandel_hack

However, there is no EASY way to compile and link. Why ? What is going
on here : 

titan$ ./do.sh 
gpu_mbrot.cu(22): warning #20039-D: a __host__ function("gpu_mbrot") redeclared with __device__, hence treated as a __host__ __device__ function

gpu_mbrot.cu(22): warning #20039-D: a __host__ function("gpu_mbrot") redeclared with __device__, hence treated as a __host__ __device__ function

main.cu(446): error: a host function call cannot be configured

main.cu(617): warning #550-D: variable "err_status" was set but never used

1 error detected in the compilation of "main.cu".
titan$

What does that mean?

What is going on here ? 

https://www.genunix.com/dclarke/nvidia_cuda_wat.txt

--
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken

PS: I gave up on a Makefile 

In mand.h, the function is declared as an implicit __host__ function:

void gpu_mbrot( const double *c_r, const double *c_i, uint32_t *mval, int num_elements );

In gpu_mbrot.cu, it is declared as a __device__ function:

__device__ 
void gpu_mbrot( const double *c_r, const double *c_i, uint32_t *mval, int num_elements )

In main.cu, it is assumed to be a __global__ function, which it is not:

gpu_mbrot<<<blocksPerGrid, threadsPerBlock>>>( device_r, device_j, device_mval, num_elements );

That won't work, and it is reported by your nvcc compiler.

Okay, so what is the deal here ? 

Perhaps in mand.h we need to say __device__ there ? 

stick "  __device__ " inside mand.h and then boom : 

$ nvcc -gencode arch=compute_35,code=sm_35 -gencode arch=compute_60,code=sm_60 -Wno-deprecated-gpu-targets --ftz=false --prec-div=true --prec-sqrt=true --fmad=true -c -o array_offset.o array_offset.c
In file included from array_offset.c:3:
mand.h:55:11: error: expected ‘;’ before ‘void’
   55 | __device__ void gpu_mbrot( const double *c_r, const double *c_i, uint32_t *mval, int num_elements );
      |           ^~~~~
      |           ;
$ 

So that will not work. 
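
One standard way to keep the plain C compiler happy, for what it is worth, is
to hide the CUDA qualifier behind the __CUDACC__ macro that nvcc defines when
it compiles CUDA source. A sketch of what mand.h could look like ( just a
sketch, not the real header ) :

    /* mand.h -- sketch : plain C compiles never see the CUDA qualifier     */
    #include <stdint.h>

    #ifdef __CUDACC__              /* defined when nvcc compiles .cu files  */
    #define MAND_QUAL __global__   /* or __device__, whichever is correct   */
    #else
    #define MAND_QUAL              /* gcc building array_offset.c sees none */
    #endif

    MAND_QUAL
    void gpu_mbrot( const double *c_r, const double *c_i,
                    uint32_t *mval, int num_elements );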

Perhaps you mean something else?

I won't track this issue down for you. __device__ functions cannot be executed from the host anyway; you need to use __global__.

If you are not familiar with CUDA, I would suggest reading the official CUDA
C Programming Guide or other training material, and starting with a simple
"hello world" program to better understand the different types of annotations,
rather than a complex multi-file project.
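
To make the __global__ point concrete : the thing that gets launched with
<<< blocksPerGrid, threadsPerBlock >>> has to be a kernel entry point, so
gpu_mbrot would need to look roughly like this ( a sketch with the same
signature; the real escape-time loop and the iteration cap in gpu_mbrot.cu
may well differ ) :

    /* gpu_mbrot.cu -- sketch of a launchable kernel, not the real code     */
    #include <stdint.h>

    __global__
    void gpu_mbrot( const double *c_r, const double *c_i,
                    uint32_t *mval, int num_elements )
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if ( idx >= num_elements )
            return;

        /* plain escape-time iteration with a made-up cap of 65535 */
        double zr = 0.0, zi = 0.0;
        uint32_t k = 0;
        while ( ( zr * zr + zi * zi ) <= 4.0 && k < 65535u ) {
            double t = zr * zr - zi * zi + c_r[idx];
            zi = 2.0 * zr * zi + c_i[idx];
            zr = t;
            k = k + 1;
        }
        mval[idx] = k;
    }

With that annotation the gpu_mbrot<<< ... >>>( ... ) launch in main.cu should
be what nvcc expects, and the "a host function call cannot be configured"
error should go away.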


The code works just fine. 

I can slam all the sources into one big dumb file and it compiles
and runs perfectly.

The only problem appears when I need to "decorate" the code bits with the
NVidia CUDA special spices in order for the NVidia CUDA compiler to
not choke. Which it does.

In any case I will go back and look at the cute flavour decorations that
NVidia wants. Either that or NVidia just wants to sell me something for
more money.