I’m trying to use the preprocessor definition __CUDA_ARCH__ to tell whether we are currently compiling for the host or the device (see the programming guide), and I just can’t get it to work.
Of course the real use case is more complex, but this minimal example shows the problem: __CUDA_ARCH__ is defined in a function that is compiled for host only!
What is going on here? The CUDA C Programming Guide clearly states:
"The __CUDA_ARCH__ macro can be used to differentiate various code paths based on
compute capability. It is only defined for device code. When compiling with
“arch=compute_11” for example, __CUDA_ARCH__ is equal to 110."
Am I missing something here? It’s pretty basic and I really do need it to work…
int main (void)
{
#ifndef __CUDA_ARCH__
#warning __CUDA_ARCH__ is undefined!
#endif
#ifdef __CUDA_ARCH__
#warning __CUDA_ARCH__ is defined!
#endif
return 0;
}
> nvcc test2.cu
test2.cu:7:2: warning: #warning __CUDA_ARCH__ is defined!
test2.cu:4:2: warning: #warning __CUDA_ARCH__ is undefined!
(also note the order of the warnings)
.cu files get compiled twice, once for the device and once for the host. Preprocessor directives are evaluated in each of those passes over the whole file, so for this case it does not matter where they are placed within the file. If, however, you use #error, compilation stops after the first pass that hits it, so you would only see
Hold on, why is a function, which is not callable from device, compiled twice? No wonder nvcc seems slow.
Also as stated in the programming guide: “It is only defined for device code.” This is a CLEAR violation of that statement: the function is not device code.
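For what it's worth, the preprocessor knows nothing about which functions are host or device code: each pass runs it over the whole file, and only the device pass defines __CUDA_ARCH__. A minimal sketch of how that plays out inside a function body follows. The names HOST_DEVICE and compiled_for_arch are made up for illustration, and the __CUDACC__ guard is only there so the same file also builds with a plain C++ compiler.

```cpp
// Hypothetical helper macro: expands to the CUDA qualifiers under nvcc,
// and to nothing when a plain C++ compiler sees this file.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Reports which pass compiled this function body:
// the targeted compute capability in the device pass, 0 in the host pass.
HOST_DEVICE inline int compiled_for_arch()
{
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;   // device pass, e.g. 110 for compute_11
#else
    return 0;               // host pass: the macro is undefined
#endif
}
```

Called from host code this returns 0. Called from a kernel built for compute capability 1.1 it would return 110, matching the example in the guide.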
As I said, this is (of course) not my use case, but a simpler one to show the problem. The real use case (and problem) is this:
test.cu:
__host__ __device__ void fun (void)
{
#ifdef __CUDA_ARCH__
// Here I have code path for my device code
__shared__ int i;
#else
// Here I have code path for my host code
int i;
#endif
}
int main (void)
{
return 0;
}
Compiling with nvcc test.cu gives this output:
"
test.cu(8): error: a function scope variable cannot be declared with “__shared__” inside a host function
test.cu(8): warning: variable “i” was declared but never referenced
test.cu(15): warning: return type of function “main” must be “int”
1 error detected in the compilation of “/tmp/tmpxft_00001625_00000000-4_test.cpp1.ii”.
"
Or is the compiler complaining that I am using __shared__ in device code, for a function that has the __host__ tag as well? How silly would that be?
But still, I cannot have distinct code paths for device and host functions.
Or I’m somehow misusing the macro.
OK, that actually is the case. The workaround is here:
test.cu:
#ifdef __CUDA_ARCH__
__device__ void fun (void)
#else
__host__ void fun (void)
#endif
{
#ifdef __CUDA_ARCH__
// Here I have code path for my device code
__shared__ int i;
#else
// Here I have code path for my host code
int i;
#endif
}
int main (void)
{
return 0;
}
How beautiful is that? :)
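A variant that leaves the function's qualifiers alone, sketched under assumptions rather than taken from the guide: put each code path in its own single-space helper and let a __host__ __device__ wrapper pick one per pass. fun_on_device and fun_on_host are hypothetical names, fun returns int here only so the result can be checked, and the __CUDACC__ guards exist only so the file also builds as plain C++.

```cpp
// Hypothetical helper macro: expands to the CUDA qualifiers under nvcc,
// and to nothing when a plain C++ compiler sees this file.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

#ifdef __CUDACC__
// Device-only helper, so __shared__ never appears in host code.
// Every thread of the block must call it because of __syncthreads().
__device__ inline int fun_on_device()
{
    __shared__ int i;
    if (threadIdx.x == 0)
        i = 1;          // assumes a 1D block: thread 0 writes the shared value
    __syncthreads();    // make the write visible to the whole block
    return i;
}
#endif

// Host-only helper (no qualifier means __host__ under nvcc).
inline int fun_on_host()
{
    int i = 0;
    return i;
}

// The wrapper is compiled in both passes; each pass keeps one call.
HOST_DEVICE inline int fun()
{
#ifdef __CUDA_ARCH__
    return fun_on_device();   // device pass
#else
    return fun_on_host();     // host pass
#endif
}
```

The point of the split is that the __shared__ declaration lives in a function that is never host code, while the signature of fun stays the same in both passes.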
I guess I’ll just write my own preprocessor macro: