Is __CUDA_ARCH__ broken?

maxpower3141 · June 10, 2011, 8:28am

I’m trying to use the preprocessor definition __CUDA_ARCH__ to tell whether we are currently compiling for the host or the device (see the programming guide), and I just can’t get it to work.

My minimal case is this:

test.cu:

void main (void)

{

#ifdef __CUDA_ARCH__

#if (__CUDA_ARCH__  == 100)

#error CUDA_ARCH == 100

#endif

#error CUDA ARCH BROKEN!

#endif

}

nvcc test.cu →

test.cu:5:2: error: #error CUDA_ARCH == 100

test.cu:7:2: error: #error CUDA ARCH BROKEN!

Versions:

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver

Built on Wed_Nov__3_16:16:57_PDT_2010

Cuda compilation tools, release 3.2, V0.2.1221

and

nvcc: NVIDIA (R) Cuda compiler driver

Built on Thu_May_12_11:09:45_PDT_2011

Cuda compilation tools, release 4.0, V0.2.1221

Of course the real use case is more complex, but this minimal case shows the problem: __CUDA_ARCH__ is defined in a function that is compiled for the host only!

What is going on here? The CUDA C programming guide clearly states:

"The __CUDA_ARCH__ macro can be used to differentiate various code paths based on compute capability. It is only defined for device code. When compiling with “arch=compute_11” for example, __CUDA_ARCH__ is equal to 110."

Am I missing something here? It’s pretty basic and I really do need it to work…
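
The value quoted from the guide follows a simple pattern: compiling for compute_XY gives the three-digit value XY0, so compute_10 is 100 and compute_11 is 110. A minimal sketch of that arithmetic, assuming nothing beyond the quoted rule; `expected_cuda_arch` is a hypothetical helper name used only for illustration and is not part of CUDA:

```cpp
// Hypothetical helper (not a CUDA API): the value __CUDA_ARCH__ is expected
// to have during the device pass when targeting compute_<major><minor>.
constexpr int expected_cuda_arch(int major, int minor)
{
    return major * 100 + minor * 10;
}

// The two cases that appear in this thread.
static_assert(expected_cuda_arch(1, 0) == 100, "compute_10 -> 100");
static_assert(expected_cuda_arch(1, 1) == 110, "compute_11 -> 110");
```

This is why the test above fires the "== 100" branch when no -arch option is given on these toolkit versions: the device pass targets compute_10.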

avidday · June 10, 2011, 9:31am

How about an #else statement in there somewhere?

avidday@cuda:~$ cat cuda_arch.c

void main (void)

{

#ifdef __CUDA_ARCH__

#if (__CUDA_ARCH__  == 100)

#error CUDA_ARCH == 100

#else

#error CUDA ARCH BROKEN!

#endif

#endif

}

avidday@cuda:~$ ln -s cuda_arch.c cuda_arch.cu

avidday@cuda:~$ nvcc -arch=sm_10 -c cuda_arch.cu

cuda_arch.cu:6:2: error: #error CUDA_ARCH == 100

avidday@cuda:~$ nvcc -arch=sm_11 -c cuda_arch.cu

cuda_arch.cu:8:2: error: #error CUDA ARCH BROKEN!

avidday@cuda:~$ nvcc -c cuda_arch.c

Looks fine to me (CUDA 3.2, Linux 64 bit, gcc 4.4.3).

tera · June 10, 2011, 10:09am

That behavior is not an error, so you should use #warning:

int main (void)

{

#ifndef __CUDA_ARCH__

#warning CUDA ARCH WORKS!

#endif

    return 0;

}

> nvcc test.cu

test.cu:4:2: warning: #warning CUDA ARCH WORKS!

or even

int main (void)

{

#ifndef __CUDA_ARCH__

#warning __CUDA_ARCH__ is undefined!

#endif

#ifdef __CUDA_ARCH__

#warning __CUDA_ARCH__ is defined!

#endif

    return 0;

}

> nvcc test2.cu

test2.cu:7:2: warning: #warning __CUDA_ARCH__ is defined!

test2.cu:4:2: warning: #warning __CUDA_ARCH__ is undefined!

(also note the order of the warnings)

.cu files get compiled twice, once for the device and once for the host. Preprocessor conditionals are evaluated separately in each of those passes, so for this case it does not matter where they are placed within the file. If, however, you use #error, the compilation process stops after the first pass that produces errors, so you would only see

> nvcc test3.cu

test3.cu:7:2: error: #error __CUDA_ARCH__ is defined!

before compilation is aborted.
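
The two-pass behavior described above is what makes the usual branching idiom work: the same function body is preprocessed once per pass, and __CUDA_ARCH__ selects the branch. A minimal sketch, with some assumptions: `compiled_arch` is a hypothetical helper that does not appear in the thread, and the fallback `#define`s are only there so the file also builds with a plain host compiler (nvcc defines `__CUDACC__` and supplies the real qualifiers).

```cpp
// Fallback so this sketch also builds with a plain host C++ compiler.
// nvcc defines __CUDACC__ and provides the real qualifiers itself.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// compiled_arch() is a hypothetical helper, not something from the thread.
// The preprocessor keeps exactly one branch per compilation pass.
__host__ __device__ inline int compiled_arch(void)
{
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;  // device pass: the target's value, e.g. 110
#else
    return 0;              // host pass: the macro is not defined
#endif
}
```

Called from host code, this returns 0, because the host pass never sees the macro. The device branch only exists in the code generated by the device pass.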

maxpower3141 · June 10, 2011, 10:54am

Hold on, why is a function that is not callable from the device compiled twice? No wonder nvcc seems slow.

Also as stated in the programming guide: “It is only defined for device code.” This is a CLEAR violation of that statement: the function is not device code.

As I said, this is (of course) not my use case, but a simpler one to show the problem. The real use case (and problem) is this:

test.cu

__host__ __device__ void fun (void)

{

#ifdef __CUDA_ARCH__

// Here I have code path for my device code

__shared__ int i;

#else

// Here I have code path for my host code

int i;

#endif

}

int main (void)

{

 return 0;

}

nvcc test.cu:

and output:

"

test.cu(8): error: a function scope variable cannot be declared with “__shared__” inside a host function

test.cu(8): warning: variable “i” was declared but never referenced

test.cu(15): warning: return type of function “main” must be “int”

1 error detected in the compilation of “/tmp/tmpxft_00001625_00000000-4_test.cpp1.ii”.

"

Or is the compiler complaining that I am using __shared__ in device code, for a function that has the __host__ tag as well? How silly would that be?

But still, I cannot have distinct code-paths for device and host functions.

Or I’m somehow misusing the macro.

Ok - actually that is the case - workaround is here:

test.cu:

#ifdef __CUDA_ARCH__

__device__ void fun (void)

#else

__host__ void fun (void)

#endif

{

#ifdef __CUDA_ARCH__

// Here I have code path for my device code

__shared__ int i;

#else

// Here I have code path for my host code

int i;

#endif

}

int main (void)

{

 return 0;

}

How beautiful is that? :)

I guess I’ll just write my own preprocessor macro:

#ifdef __CUDA_ARCH__

#define MY_DEV_FUN __device__

#else

#define MY_DEV_FUN __host__

#endif

and use that instead of __host__ __device__.

Oh well, somewhat unintuitive for me, but luckily a workaround (for the silliness) exists.
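
Put together, the macro workaround might look like the sketch below. Treat it as an illustration under stated assumptions rather than a verified recipe: the fallback `#define`s for non-nvcc builds, the `int` return type, and the values 0 and 1 are additions for illustration and are not in the original post. It has only been reasoned through as host code; how a given nvcc version treats a host-side call to `fun` during the device pass is not established here.

```cpp
// Fallback so this sketch also builds with a plain host C++ compiler.
// nvcc defines __CUDACC__ and provides the real qualifiers itself.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// The macro from the post: one qualifier per compilation pass.
#ifdef __CUDA_ARCH__
#define MY_DEV_FUN __device__
#else
#define MY_DEV_FUN __host__
#endif

// Return type and values are illustrative assumptions, not from the post.
MY_DEV_FUN int fun(void)
{
#ifdef __CUDA_ARCH__
    __shared__ int i;  // device path: per-block shared storage
    i = 1;
#else
    int i = 0;         // host path: ordinary local variable
#endif
    return i;
}
```

The point of the design is that the front end never sees `__shared__` inside a function carrying the `__host__` qualifier, which is what triggered the error above.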
