This compiles okay with nvcc or gcc (running Ubuntu 10.04) if the file is named myfile.c. Things fall apart when I compile the same thing as a __global__ function in a file named myfile.cu:
__global__ void myfunction()
{
    int gcd(int a) {
        int b=a;
        return b;
    }
}
nvcc keeps asking for a missing ; on the int gcd line, as if it expects a function prototype. If I prototype it, nvcc then says that I cannot call a host function from a __global__ function.
So just how do we get functions working inside the one big __global__ function in source files of type .cu?
__global__ functions are functions that run on the device and are called from the host. A __global__ function can call as many __device__ functions as it desires. A function with no qualifier is assumed to be a host function, and host functions cannot be invoked from the device. This is why the call to gcd() in your code is flagged as an error.
You might want to read the CUDA Programming Guide and look at the example apps in the CUDA SDK.
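To make the qualifier rules above concrete, here is a minimal sketch. It is not taken from the SDK, and the names gcd, gcd_kernel, out_d and out_h are made up for illustration. It marks one helper as both __host__ and __device__, so that the kernel and main() can each call it.

```cuda
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper. __host__ __device__ asks nvcc to compile it for both
// sides, so host code and device code may each call it.
__host__ __device__ int gcd(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Kernel: runs on the device and is launched from the host.
__global__ void gcd_kernel(int a, int b, int *out)
{
    *out = gcd(a, b);   // legal: gcd is compiled for the device as well
}

int main(void)
{
    int *out_d = NULL;  // device-side result
    int out_h = 0;      // host-side copy of the result

    if (cudaMalloc((void **)&out_d, sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return EXIT_FAILURE;
    }
    gcd_kernel<<<1, 1>>>(12, 18, out_d);
    if (cudaMemcpy(&out_h, out_d, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed\n");
        return EXIT_FAILURE;
    }
    cudaFree(out_d);

    // The same function is also called directly on the host here.
    printf("device: %d, host: %d\n", out_h, gcd(12, 18));
    return EXIT_SUCCESS;
}
```

Dropping the __device__ part of the qualifier should bring back the "cannot call a host function" complaint described in the question, since the helper would then exist only on the host side.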
You are correct. I did read version 4 of the programming guide and did compile all the example applications in the (latest) CUDA SDK. Furthermore, I started playing around with the Makefile to see exactly what was happening there. I did try using the __device__ attribute in front of both the function's forward declaration and the function itself, but I still cannot get make or nvcc to compile it.
I also noticed that cppIntegration uses an external function inside the __global__ body, but it uses the extern "C" style and the function is located inside a .cpp code unit. I tried this method of construction too, but it also failed.
I also noticed that when lots of functions were compiled, the compiler ran for about an hour and then stopped with a fatal message about an external call to CUPRINTF, even though that is a very legitimate, working CUDA function. I am not sure why the compiler says something is illegal when it obviously is not.
So I ripped out all the CUPRINTF code, since the compiler was going to flag and reject it anyway and I didn't want to take that risk.
My goal is to simply get a function inside a global function to work and compile.
And yes, I did read the guide and made the examples and ran them and even took one apart and modified the <<<>>> call to see exactly what was going on there. And yes, I ran 1000 threads on one device.
Are there any compiling guides? I did a cursory check through all the PDF files, and corrected the main documentation.html file in the SDK so that it references the correct location of the PDF files cited.
I am not sure where you are stuck. Here is a minimalist example that may help:
#include <stdio.h>
#include <stdlib.h>

__device__ float square (float a);
__global__ void sqr_kernel (float x, float *res);

int main (void)
{
    float arg, res = 0.0f;
    float *res_d;
    if (cudaSuccess != cudaMalloc ((void**)&res_d, sizeof(res_d[0]))) {
        fprintf (stderr, "!!!! allocation of res_d failed\n");
        return EXIT_FAILURE;
    }
    arg = 5.0f;
    sqr_kernel<<<1,1>>>(arg, res_d);
    if (cudaSuccess !=
        cudaMemcpy (&res, res_d, sizeof(res), cudaMemcpyDeviceToHost)) {
        fprintf (stderr, "!!!! device->host copy of result failed\n");
        return EXIT_FAILURE;
    }
    printf ("The square of %.f is %.f\n", arg, res);
    if (cudaSuccess != cudaFree (res_d)) {
        fprintf (stderr, "!!!! de-allocation of res_d failed\n");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

__global__ void sqr_kernel (float x, float *res)
{
    *res = square (x);
}

__device__ float square (float a)
{
    return a * a;
}
I put the above in a file test.cu and compiled with
nvcc -o test test.cu
When run, the program prints
The square of 5 is 25
[Later:]
Looking at your original code again, it seems what you are trying to do is define a function nested inside another function. This is not supported by standard C and standard C++, and thus not supported by CUDA. gcc may allow nested functions in C/C++ as a proprietary extension, but this is not something I am familiar with.
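As a sketch of what this means for the original snippet: move gcd out of the kernel body to file scope and mark it __device__. The out parameter, the argument 42 and the main() wrapper are additions for illustration only, since the original kernel neither stores a result nor shows how it is launched.

```cuda
#include <stdio.h>
#include <stdlib.h>

// gcd now lives at file scope instead of being nested inside the kernel,
// and __device__ makes it callable from device code.
__device__ int gcd(int a)
{
    int b = a;
    return b;
}

// The 'out' parameter is hypothetical; it gives the call a visible effect.
__global__ void myfunction(int *out)
{
    *out = gcd(42);
}

int main(void)
{
    int *out_d = NULL;
    int out_h = 0;

    if (cudaMalloc((void **)&out_d, sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return EXIT_FAILURE;
    }

    myfunction<<<1, 1>>>(out_d);

    // A launch does not return a status, so ask the runtime afterwards.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    if (cudaMemcpy(&out_h, out_d, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed\n");
        return EXIT_FAILURE;
    }
    cudaFree(out_d);

    printf("gcd returned %d\n", out_h);
    return EXIT_SUCCESS;
}
```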
Here's what I did. I had to convert all the function bodies to .cu files, and all the header files as well. On the definition line of each function I added the __device__ prefix, which made that function available to the __global__ function calling it.
Next I had to fix the duplicate-definition errors while compiling, so I used the traditional #ifndef PROGRAM_FILE_NAME include guard for the preprocessor. This fixed the issue for functions that referenced the same .cuh header file.
Then I still couldn't get things to work, but I noticed that changing the include file name from "myfile.cuh" to "myfile.cu" in the CUDA file calling that function might actually fix the problem with external references. This DID fix the problem. Apparently the nvcc compiler wants everything to be visible at once, rather than going through the header file. I am not sure how scoping is accomplished during the compile phase.
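The arrangement described above can be sketched roughly as follows. This is an assumption-laden illustration, not the poster's actual code: the file names, the guard macro MYFILE_CUH and the functions twice and use_twice are all hypothetical. Both "files" are shown in one listing so that it compiles on its own with nvcc -c. The idea is that the header carries the full definition of the __device__ function, so the kernel sees the body in its own translation unit, and the guard keeps a second include from redefining it.

```cuda
// ---- what a hypothetical myfile.cuh might contain ----
#ifndef MYFILE_CUH
#define MYFILE_CUH

// Full definition rather than a bare prototype, so every .cu file that
// includes this header has the function body available when it compiles.
__device__ inline int twice(int x)
{
    return 2 * x;
}

#endif // MYFILE_CUH

// ---- what a hypothetical main.cu might contain ----
// In a real project this part would begin with: #include "myfile.cuh"
__global__ void use_twice(int *out)
{
    *out = twice(21);   // resolved inside this translation unit
}
```

As far as I know, the toolkit discussed in this thread could not link device code across separately compiled files, which would explain why including the .cu file directly worked. Newer toolkits offer relocatable device code (nvcc -rdc=true), which I believe relaxes this, though nothing in this thread tested it.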
It would be nice to add a section to the Programming Guide on compiling and linking and on how to call functions, as this doesn't seem very visible.
For example, I did a simple cut and paste from the programming guide, but it wouldn't compile. It was only when I moved inside the SDK and started using the Makefile that things finally compiled.
Quote:
“Looking at your original code again, it seems what you are trying to do is define a function nested inside another function. This is not supported by standard C and standard C++, and thus not supported by CUDA. gcc may allow nested functions in C/C++ as a proprietary extension, but this is not something I am familiar with.”
Apparently gcc allows this, as it passes without complaint.
But yes, you are correct that this is not supported by standard C and C++. I only tried this construction while trying to get the main.cu file to compile, having put everything into one huge file (thousands of lines long), since I was still learning the nuances of nvcc compiling and linking.
Right now I have all the code broken up into modules, as is usual practice, and nvcc likes that much better.