__CUDA_ARCH__ undefined?!

RezaRob3 · April 6, 2012, 3:38pm

What’s wrong here? I’m using this line to compile:

nvcc -gencode=arch=compute_20,code="sm_20,compute_20" test.cu -o test

and I get “__CUDA_ARCH__ is undefined.”

#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void no_op()
{
    printf("%d\n", __CUDA_ARCH__);
}

int main()
{
    no_op<<<1, 1>>>();
    cudaThreadSynchronize();
    return 0;
}

DrAnderson42 · April 6, 2012, 4:07pm

Very strange. I use __CUDA_ARCH__ without any problems in a large project.
But your minimal example fails! (I tested it to make sure).

RezaRob3 · April 6, 2012, 4:14pm

Gee, really?!! Maybe I should be updating my toolkit? I’m getting:

# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Sun_Mar_20_16:47:09_PDT_2011
Cuda compilation tools, release 4.0, V0.2.1221

EDIT: it’s just unreal, because everything else seems to work with this compiler! For example, inline assembly.

RezaRob3 · April 6, 2012, 4:39pm

It is sometimes defined, but not in device code! If I add these three compiler directives, I get the warning message!

#if __CUDA_ARCH__ == 200
#warning "hello"
#endif

__global__ void no_op()
{
    printf("%d\n", __CUDA_ARCH__);
}

tera · April 6, 2012, 5:43pm

The message “__CUDA_ARCH__ is undefined.” is emitted by cudafe++ (the program that splits host and device code), not by the host compiler. To figure out where kernels and device functions end, it needs to completely parse the device routines even when it extracts the host code. And __CUDA_ARCH__ is indeed undefined in host code.

The solution is simple, even though it may look silly at first: just protect your printf statement with an #ifdef __CUDA_ARCH__. It won’t change the generated code, but it allows parsing of the host code (where this statement will be dropped at a later stage anyway) to proceed.
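As a rough sketch of that suggestion, here is the kernel from the original post with the printf guarded. The host wrapper launch_no_op() is a hypothetical name added only so the snippet can be called from a main(); cudaDeviceSynchronize() is used as the newer name for cudaThreadSynchronize(). Device-side printf assumes compute capability 2.0 or later, as in the original compile line.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel from the original post, with the macro use guarded.
__global__ void no_op()
{
#ifdef __CUDA_ARCH__
    // Only the device compilation pass sees this line, so the
    // host pass never meets an undefined __CUDA_ARCH__.
    printf("%d\n", __CUDA_ARCH__);
#endif
}

// Hypothetical host wrapper (not in the original post):
// launches the kernel and returns 0 on success, 1 on any CUDA error.
int launch_no_op()
{
    no_op<<<1, 1>>>();
    return cudaDeviceSynchronize() == cudaSuccess ? 0 : 1;
}
```

Calling launch_no_op() from main() and building with the same nvcc command line as in the first post should then get past the host-side parse.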

njuffa · April 6, 2012, 6:21pm

As far as I understand the compilation process, tera’s explanation is right on the money. As an addendum, one reason __CUDA_ARCH__ is undefined in host code is that for fatbinary compilation targeting multiple device architectures, host code is only compiled once, so it can’t be associated with any particular CUDA architecture.
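One way to see the host/device split for yourself is a function compiled for both sides. This is only an illustrative sketch; compiled_arch() is a made-up name, and the value 0 for the host branch is an arbitrary choice for this example.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: reports which compilation pass produced the code.
__host__ __device__ int compiled_arch()
{
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;   // device pass, e.g. 200 when built for compute_20
#else
    return 0;               // host pass: the macro is not defined here
#endif
}
```

Called from host code this returns 0 regardless of how many -gencode targets were given, while each device image carries its own value.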

The recommended way to check for the CUDA architecture in device code is something like this:

#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 200)

In general CUDA architecture versions follow an onion-layer model, so the use of architectural features is usually best guarded by >= comparisons against __CUDA_ARCH__.
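A minimal sketch of such a >= guard follows. The kernel report_path() and the host helper query_compiled_path() are hypothetical names, and the values 200 and 100 are just markers for "newer path" and "fallback path", not anything defined by the toolkit.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: records which code path the device pass compiled in.
__global__ void report_path(int *out)
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 200)
    *out = 200;   // path that may rely on compute capability 2.0+ features
#else
    *out = 100;   // fallback for older devices; also what the host pass parses
#endif
}

// Hypothetical host helper: runs the kernel and returns the recorded
// marker, or -1 if any CUDA call fails (for example, no usable device).
int query_compiled_path()
{
    int *d_out = nullptr;
    int h_out = -1;
    if (cudaMalloc((void **)&d_out, sizeof(int)) != cudaSuccess) return -1;
    report_path<<<1, 1>>>(d_out);
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out : -1;
}
```

Because the test is a >= comparison, the same guard keeps selecting the newer path when the code is later built for compute_30 or above, which is the point of the onion-layer remark.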

RezaRob3 · April 7, 2012, 2:39am

Yes, tera, njuffa, it works exactly as you say.

I also downloaded the latest version of the driver and toolkit, and it still works only that way. It makes me curious what DrAnderson42 meant by having “tested it,” but in any case, it works now.

Many thanks. :)

tera · April 7, 2012, 7:00am

Thanks njuffa, RezaRob3. No money involved though. I wonder if Nvidia would offer me a job…

RezaRob3 · April 7, 2012, 8:20am

tera, I don’t know you at all personally, but have seen your name pop up here often. I learn from you even when you’re not responding to me directly.

Thank you. :)

EDIT: I hope you get your wish if you love Nvidia that much!

PS: My apology for being OFFTOPIC this once.

njuffa · April 7, 2012, 8:22pm

Equally off topic, to avoid misunderstandings: “right on the money” is an idiom meaning “exactly right”. I realize it might be best to avoid potentially confusing idioms when writing in a forum with an international audience.

DrAnderson42 · April 9, 2012, 1:56pm

Yes, well in my large code, all of the checks take the form of:

#if (__CUDA_ARCH__ >= 200)

And they are all in __global__ functions (which I thought was the only requirement). Which is why I was confused when running your test.
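A likely explanation for the difference, offered as a sketch rather than a statement about nvcc internals: inside an #if expression, the C and C++ preprocessor replaces any identifier that is not a defined macro with 0, so an unguarded #if (__CUDA_ARCH__ >= 200) is simply false in the host pass. Used as a printf argument, the same name is left behind as an undeclared identifier. The example below uses a deliberately undefined, made-up macro name so it behaves the same with any compiler.

```cuda
// HYPOTHETICAL_UNDEFINED_ARCH is intentionally never defined anywhere.
// In the #if below it is treated as 0, so the comparison is false
// and the #else branch is the one that gets compiled.
int undefined_macro_branch()
{
#if (HYPOTHETICAL_UNDEFINED_ARCH >= 200)
    return 1;
#else
    return 0;
#endif
}
```

The defined(__CUDA_ARCH__) && ... form recommended earlier in the thread makes the same intent explicit and avoids warnings from options that flag undefined macros in #if.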
