The total horror show of dealing with NVidia CUDA is that code which works and produces good results can be a disaster to compile. Why? Because nvcc is a horror show. A nightmare.

I wrote some code that simply does a bit of math that we have seen since the early 1980s. Easy stuff. The good old Mandelbrot is always fun. We should all be thankful to Scientific American for the article (Aug 1985) that brought great joy to those of us with a mainframe handy. So it seemed clear to me that any decent GPU from NVidia could crank the numbers on this in a jiffy. Well, after some hacking and testing it was clear that I was correct. The GPU will rip through the math faster than anything I have ever seen. This includes modern IBM POWER9 killers.

A pile of people have run this code and they all produce good, stable numerical data. That is fancy salespeople talk for "it works". The numbers make sense. Certainly when we plot them onto a pretty colour screen. Hey, let's be honest, the original FORTRAN code would crank out the image onto a line printer. So many trees lost to that way back in the '80s.

However, the joy goes away when I try to break the code into little easy bits we can all chew. There is a (poorly written) article that suggests we can tear code down into little pieces and then compile objects and link later:

    Separate Compilation and Linking of CUDA C++ Device Code
    https://developer.nvidia.com/blog/separate-compilation-linking-cuda-device-code/

Well, guess what? That code and Makefile are history. Nope, they will not work anymore. Easy to fix:

    https://git.sr.ht/~blastwave/bw/tree/bw/item/nvidia_cuda/separate_compile_link/Makefile

Okay, not so easy. Suffice it to say that I wish I had the GV100 killer or the all-singing, all-dancing Ada Lovelace. Who was in the room when the marketing and sales people made that name? "Defining a dramatically higher baseline of GPU performance, it marks the tipping point for AI, ray tracing, and neural graphics." Everyone goes dancing ... except the code will not compile.

So the code that was working could crank all the numbers and create an output file of data quite neatly. I gave up on the NVidia CUDA compiler beer magic and tossed it all into a single dumb file:

    https://git.sr.ht/~blastwave/bw/tree/bw/item/nvidia_cuda/mandel_hack/one_big.cu

T H A T   W O R K S

It cranks the numbers and creates an output file whose SHA512 hash matches perfectly on IBM POWER9 and any damn thing you have. The math is clean. That data can be checked. It even plots to make pretty pictures. Feel free to run the stuff at:

    https://git.sr.ht/~blastwave/bw/tree/bw/item/xmand

Regardless, I did break out the code into little easy pieces:

    https://git.sr.ht/~blastwave/bw/tree/bw/item/nvidia_cuda/mandel_hack

However there is no EASY way to compile and link. Why? What is going on here:

    titan$ ./do.sh
    gpu_mbrot.cu(22): warning #20039-D: a __host__ function("gpu_mbrot") redeclared with __device__, hence treated as a __host__ __device__ function
    gpu_mbrot.cu(22): warning #20039-D: a __host__ function("gpu_mbrot") redeclared with __device__, hence treated as a __host__ __device__ function
    main.cu(446): error: a host function call cannot be configured
    main.cu(617): warning #550-D: variable "err_status" was set but never used
    1 error detected in the compilation of "main.cu".
    titan$

What does that mean? What is going on here?

    https://www.genunix.com/dclarke/nvidia_cuda_wat.txt

--
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken

PS: I gave up on a Makefile
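For what it is worth, the separate-compilation workflow from that article still works once the function qualifiers are consistent: each .cu file is compiled with -dc so nvcc emits relocatable device code, and nvcc performs the final link (which includes the device-link step). A minimal sketch, assuming your two .cu files; the gencode value here is illustrative, not what your do.sh actually passes:

    $ nvcc -gencode arch=compute_60,code=sm_60 -dc gpu_mbrot.cu -o gpu_mbrot.o
    $ nvcc -gencode arch=compute_60,code=sm_60 -dc main.cu -o main.o
    $ nvcc -gencode arch=compute_60,code=sm_60 gpu_mbrot.o main.o -o mandel

But your error is not a linking problem at all; it is a declaration mismatch.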
In mand.h, the function is declared as an implicit host function:

void gpu_mbrot( const double *c_r, const double *c_i, uint32_t *mval, int num_elements );

In gpu_mbrot.cu, it is defined as a __device__ function:

__device__ void gpu_mbrot( const double *c_r, const double *c_i, uint32_t *mval, int num_elements )

In main.cu, it is assumed to be a __global__ function, which it is not:

gpu_mbrot<<<blocksPerGrid, threadsPerBlock>>>( device_r, device_j, device_mval, num_elements );

That won't work, and is reported by your nvcc compiler.
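For the <<<...>>> launch configuration to be legal, gpu_mbrot has to be a __global__ kernel everywhere it is seen. A minimal sketch of a consistent arrangement, with the kernel body elided since I have not read your actual iteration code:

/* gpu_mbrot.cu -- the definition carries the __global__ qualifier */
__global__ void gpu_mbrot( const double *c_r, const double *c_i,
                           uint32_t *mval, int num_elements )
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if ( idx < num_elements )
    {
        /* ... per-point Mandelbrot iteration writing mval[idx] ... */
    }
}

/* main.cu -- with a matching __global__ declaration in scope,
   the launch configuration is legal */
gpu_mbrot<<<blocksPerGrid, threadsPerBlock>>>( device_r, device_j, device_mval, num_elements );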
Okay, so what is the deal here? Perhaps in mand.h we need to say __device__ there? Stick "__device__" inside mand.h and then boom:

    $ nvcc -gencode arch=compute_35,code=sm_35 -gencode arch=compute_60,code=sm_60 -Wno-deprecated-gpu-targets --ftz=false --prec-div=true --prec-sqrt=true --fmad=true -c -o array_offset.o array_offset.c
    In file included from array_offset.c:3:
    mand.h:55:11: error: expected ';' before 'void'
       55 | __device__ void gpu_mbrot( const double *c_r, const double *c_i, uint32_t *mval, int num_elements );
          |           ^~~~~
          |           ;
    $

So that will not work. Perhaps you mean something else?
I won’t track this issue down for you.
__device__ functions cannot be executed from the host anyways; you need to use a __global__ kernel as the entry point that the host launches.
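The new error you are getting is a different problem: array_offset.c is compiled as plain C, where __device__ is not a keyword, so the host compiler chokes on the header. One common pattern (a sketch, not necessarily how your project should be organized) is to hide the CUDA-only declaration from host-only translation units:

/* mand.h -- only CUDA translation units see the kernel declaration.
   __CUDACC__ is defined when nvcc compiles .cu files, so the plain C
   compiler building array_offset.c never sees the qualifier it
   cannot parse. */
#ifdef __CUDACC__
__global__ void gpu_mbrot( const double *c_r, const double *c_i, uint32_t *mval, int num_elements );
#endif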
If you are not familiar with CUDA, I would suggest reading the official CUDA C Programming Guide or other training material, and starting with a simple "hello world" program to better understand the different types of annotations, rather than a complex multi-file project.
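For instance, a minimal program along these lines (a generic illustration, nothing to do with your repo) shows how the qualifiers divide the work:

#include <stdio.h>

/* __device__ : callable only from code already running on the GPU */
__device__ int square( int x )
{
    return x * x;
}

/* __global__ : a kernel, the entry point the host may launch */
__global__ void squares( int *out )
{
    out[threadIdx.x] = square( (int) threadIdx.x );
}

/* plain (implicitly __host__) code runs on the CPU */
int main( void )
{
    int h_out[8];
    int *d_out;
    cudaMalloc( (void **) &d_out, sizeof h_out );
    squares<<<1, 8>>>( d_out );
    cudaMemcpy( h_out, d_out, sizeof h_out, cudaMemcpyDeviceToHost );
    for ( int i = 0; i < 8; i++ )
        printf( "square(%d) = %d\n", i, h_out[i] );
    cudaFree( d_out );
    return 0;
}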
The code works just fine. I can slam all the sources into one big dumb file and it compiles and runs perfectly. The only problem appears when I need to "decorate" the code bits with the NVidia CUDA special spices so that the NVidia CUDA Compiler does not choke. Which it does. In any case, I will go back and look at the cute flavour decorations that NVidia wants. Either that or NVidia just wants more money to sell me something.