Is __CUDA_ARCH__ broken?

I’m trying to use the preprocessor definition __CUDA_ARCH__ to tell whether we are currently compiling for the host or the device (see the programming guide), and I just can’t get it to work.

My minimal case is this:

test.cu:

void main (void)

{

#ifdef __CUDA_ARCH__

#if (__CUDA_ARCH__  == 100)

#error CUDA_ARCH == 100

#endif

#error CUDA ARCH BROKEN!

#endif

}

nvcc test.cu →

test.cu:5:2: error: #error CUDA_ARCH == 100

test.cu:7:2: error: #error CUDA ARCH BROKEN!

Versions:

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2010 NVIDIA Corporation

Built on Wed_Nov__3_16:16:57_PDT_2010

Cuda compilation tools, release 3.2, V0.2.1221

and

nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2011 NVIDIA Corporation

Built on Thu_May_12_11:09:45_PDT_2011

Cuda compilation tools, release 4.0, V0.2.1221

Of course the real use case is more complex, but this minimal case shows the problem: __CUDA_ARCH__ is defined in a function that is compiled for host only!

What is going on here? The CUDA C programming guide clearly states:

"The __CUDA_ARCH__ macro can be used to differentiate various code paths based on compute capability. It is only defined for device code. When compiling with “arch=compute_11” for example, __CUDA_ARCH__ is equal to 110."

Am I missing something here? It’s pretty basic and I really do need it to work…

How about an #else statement in there somewhere?

avidday@cuda:~$ cat cuda_arch.c

void main (void)

{

#ifdef __CUDA_ARCH__

#if (__CUDA_ARCH__  == 100)

#error CUDA_ARCH == 100

#else

#error CUDA ARCH BROKEN!

#endif

#endif

}

avidday@cuda:~$ ln -s cuda_arch.c cuda_arch.cu

avidday@cuda:~$ nvcc -arch=sm_10 -c cuda_arch.cu

cuda_arch.cu:6:2: error: #error CUDA_ARCH == 100

avidday@cuda:~$ nvcc -arch=sm_11 -c cuda_arch.cu

cuda_arch.cu:8:2: error: #error CUDA ARCH BROKEN!

avidday@cuda:~$ nvcc -c cuda_arch.c

Looks fine to me (CUDA 3.2, Linux 64 bit, gcc 4.4.3).

That behavior is not an error, so you should use #warning:

int main (void)

{

#ifndef __CUDA_ARCH__

#warning CUDA ARCH WORKS!

#endif

    return 0;

}

> nvcc test.cu

test.cu:4:2: warning: #warning CUDA ARCH WORKS!

or even

int main (void)

{

#ifndef __CUDA_ARCH__

#warning __CUDA_ARCH__ is undefined!

#endif

#ifdef __CUDA_ARCH__

#warning __CUDA_ARCH__ is defined!

#endif

    return 0;

}

> nvcc test2.cu

test2.cu:7:2: warning: #warning __CUDA_ARCH__ is defined!

test2.cu:4:2: warning: #warning __CUDA_ARCH__ is undefined!

(also note the order of the warnings)

.cu files get compiled twice, once for the device and once for the host. Preprocessor directives are evaluated in each pass before the compiler considers whether a function is host or device code, so in this case it does not matter where in the file they are placed. If, however, you use #error, compilation stops after the first pass that produces errors, so you would only see

> nvcc test3.cu

test3.cu:7:2: error: #error __CUDA_ARCH__ is defined!

before compilation is aborted.
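
The two-pass behaviour described above can be sketched roughly as follows. This is only an illustration, not code from the thread: the function name compiled_arch and the HD macro are made up for the example, and the fallback branch for non-nvcc compilers is an assumption added so the same file also builds as plain C++.

```cpp
// Hypothetical helper macro: expands to the CUDA qualifiers under nvcc and
// to nothing under a plain C++ compiler (assumption, for portability only).
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// nvcc preprocesses this body once per pass:
//   device pass: __CUDA_ARCH__ is defined (e.g. 100 for sm_10), first branch is kept
//   host pass:   __CUDA_ARCH__ is undefined, second branch is kept
HD int compiled_arch()
{
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;   // value seen by code running on the device
#else
    return 0;               // value seen by code running on the host
#endif
}
```

Called from host code, compiled_arch() should return 0; called from a kernel, it should return the architecture value the device code was compiled for.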

Hold on, why is a function, which is not callable from device, compiled twice? No wonder nvcc seems slow.

Also as stated in the programming guide: “It is only defined for device code.” This is a CLEAR violation of that statement: the function is not device code.

As I said, this is (of course) not my use case, but a simpler one to show the problem. The real use case (and problem) is this:

test.cu

__host__ __device__ void fun (void)

{

#ifdef __CUDA_ARCH__

// Here I have code path for my device code

__shared__ int i;

#else

// Here I have code path for my host code

int i;

#endif

}

int main (void)

{

 return 0;

}

nvcc test.cu:

and output:

"

test.cu(8): error: a function scope variable cannot be declared with “__shared__” inside a host function

test.cu(8): warning: variable “i” was declared but never referenced

test.cu(15): warning: return type of function “main” must be “int”

1 error detected in the compilation of “/tmp/tmpxft_00001625_00000000-4_test.cpp1.ii”.

"

Or is the compiler complaining that I am using __shared__ in device code, for a function that has the __host__ tag as well? How silly would that be?

But still, I cannot have distinct code-paths for device and host functions.

Or I’m somehow misusing the macro.

Ok - actually that is the case - workaround is here:

test.cu:

#ifdef __CUDA_ARCH__

__device__ void fun (void)

#else

__host__ void fun (void)

#endif

{

#ifdef __CUDA_ARCH__

// Here I have code path for my device code

__shared__ int i;

#else

// Here I have code path for my host code

int i;

#endif

}

int main (void)

{

 return 0;

}

How beautiful is that? :)

I guess I’ll just write my own preprocessor macro:

#ifdef __CUDA_ARCH__

#define MY_DEV_FUN __device__

#else

#define MY_DEV_FUN __host__

#endif

and use that instead of __host__ __device__.

Oh well, somewhat unintuitive for me, but luckily a workaround (for the silliness) exists.
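
For reference, a minimal sketch of how such a macro might be used. The function scratch_size and the array sizes are hypothetical, and the extra __CUDACC__ branch is an assumption added so the file also compiles with a plain C++ compiler; it was only reasoned through, not verified against every nvcc release.

```cpp
// Device pass: mark the function __device__.
// Host pass under nvcc: mark it __host__.
// Plain C++ compiler (assumption, not in the original post): no qualifier at all.
#if defined(__CUDA_ARCH__)
#define MY_DEV_FUN __device__
#elif defined(__CUDACC__)
#define MY_DEV_FUN __host__
#else
#define MY_DEV_FUN
#endif

// Hypothetical function with distinct device and host code paths.
MY_DEV_FUN int scratch_size()
{
#ifdef __CUDA_ARCH__
    __shared__ int scratch[32];   // device path: shared memory is allowed here
    return (int)(sizeof(scratch) / sizeof(scratch[0]));
#else
    int scratch[8];               // host path: ordinary local array
    return (int)(sizeof(scratch) / sizeof(scratch[0]));
#endif
}
```

On the host, scratch_size() takes the second branch and returns 8; in device code it takes the first and returns 32.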