__CUDA_ARCH__ undefined by NVCC on CUDA 3.2 RC

Hi There!

–edit–
this issue is seen with CUDA 3.2 Official Release also.
Can a moderator please change the title?
–edit–

__CUDA_ARCH__ was defined in CUDA 2.3. When I upgraded to CUDA 3.2, I realized that it is no longer defined.
Since I did not test on previous releases, can anyone educate me on this?

Was this a deliberate omission or an accidental miss? The NVCC manual for the 3.2 release still documents this macro.

I am on Ubuntu Linux 32-bit…

Thanks,
Best Regards,
Sarnath

It really is defined on the 3.2rc and 3.2 final toolkits. Take this code:

#include <stdio.h>

#ifndef __CUDA_ARCH__
#warning cuda arch not defined
#else
#warning everything is normal
#endif

__global__ void helloCUDA(const float f)
{
#if __CUDA_ARCH__ >= 200
	printf("Hello thread %d, f=%f\n", threadIdx.x, f);
#endif
}

int main()
{
	helloCUDA<<<1, 5>>>(1.2345f);
	return cudaThreadExit();
}

I can safely compile it with the 2.3 toolkit:

avidday@cuda:~$ module load cuda/2.3
avidday@cuda:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2009 NVIDIA Corporation
Built on Thu_Jul_30_09:24:36_PDT_2009
Cuda compilation tools, release 2.3, V0.2.1221
avidday@cuda:~$ nvcc -arch=sm_13 cudaprintf.cu
cudaprintf.cu:6:2: warning: #warning everything is normal
avidday@cuda:~$ ./a.out

Under the 3.2 toolkit for sm_13:

avidday@cuda:~$ module switch cuda/2.3 cuda/3.2rc
avidday@cuda:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2010 NVIDIA Corporation
Built on Wed_Sep__8_17:12:45_PDT_2010
Cuda compilation tools, release 3.2, V0.2.1221
avidday@cuda:~$ nvcc -arch=sm_13 cudaprintf.cu
cudaprintf.cu:6:2: warning: #warning everything is normal
cudaprintf.cu:4:2: warning: #warning cuda arch not defined
avidday@cuda:~$ ./a.out

And under the 3.2 toolkit for sm_20:

avidday@cuda:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2010 NVIDIA Corporation
Built on Wed_Sep__8_17:12:45_PDT_2010
Cuda compilation tools, release 3.2, V0.2.1221
avidday@cuda:~$ nvcc -arch=sm_20 cudaprintf.cu
cudaprintf.cu:6:2: warning: #warning everything is normal
cudaprintf.cu:4:2: warning: #warning cuda arch not defined
avidday@cuda:~$ ./a.out
Hello thread 0, f=1.234500
Hello thread 1, f=1.234500
Hello thread 2, f=1.234500
Hello thread 3, f=1.234500
Hello thread 4, f=1.234500

If you run your nvcc compile statements with --dryrun, you can see that on the device-side compilation, -D__CUDA_ARCH__ is being passed to every gcc call. I don’t see what the problem is.
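For example, this just prints the sequence of sub-commands nvcc would run, without executing them, so you can inspect the -D__CUDA_ARCH__ definitions yourself (output omitted here since it depends on the toolkit version and install paths):

nvcc -arch=sm_20 --dryrun cudaprintf.cu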

Hi Avid,

The 2.3 output produces the warning “everything is normal”.

However, 3.2 and 3.2 RC produce 2 warnings – I don’t understand what is going on. One for NVCC and one for GCC?

If that’s the case, why did it NOT appear with CUDA 2.3?

My repro case is simple.

This works with CUDA 2.3. But with CUDA 3.2, it does not even compile.

Thanks for your time,

I put the answer to that question in bold in my original reply, but maybe you missed it. You get two messages because there are now effectively two preprocessor passes, one for device code and one for host code (both done by the host compiler). __CUDA_ARCH__ gets propagated through the device compilation trajectory, but not through the host trajectory (after all, it is a device code symbol). That is why my code compiles and yours doesn’t.
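As an illustrative sketch (not part of the original reply), this is the usual pattern for a __host__ __device__ function that needs different code in each trajectory – __CUDA_ARCH__ is only defined during the device pass:

#include <math.h>

__host__ __device__ float fast_exp(float x)
{
#ifdef __CUDA_ARCH__
	return __expf(x);   /* device pass: fast hardware intrinsic */
#else
	return expf(x);     /* host pass: standard C math library */
#endif
}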

I just referred to the manual. It says that the HOST code should NOT depend on this macro. Oops… Even the 2.2 manual says that.
It was my mistake to depend on it.

In any case, I wonder how I could:

  1. Allocate a host buffer for a kernel that is built for a specific architecture.
  2. An even simpler case would be to define REAL as “double” or “float” depending on the __CUDA_ARCH__ being compiled for.

It’s all confusing! Anyway, thanks for your time,

To add to avidday’s answer:
Device code can be compiled for multiple architectures with the correct code selected by the host code at runtime. Because of that it is impossible to know the device architecture at compile time of the host code.
That’s why __CUDA_ARCH__ is undefined in host code (and checking whether it is defined is indeed the recommended way to distinguish between device and host code at compile time).

If you want to do different things from host code depending on device architecture, you’ll have to switch at runtime using cudaGetDeviceProperties().
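A minimal sketch of that approach (not from this thread – the kernel and sizes are made up for illustration): instantiate the kernel for both float and double at compile time, then query the device at runtime and launch whichever one it can support:

#include <cuda_runtime.h>

template <typename Real>
__global__ void scale(Real *x, Real a, int n)
{
	int i = blockIdx.x * blockDim.x + threadIdx.x;
	if (i < n) x[i] *= a;
}

int main()
{
	cudaDeviceProp prop;
	cudaGetDeviceProperties(&prop, 0);

	/* compute capability 1.3 and above has native double precision */
	bool useDouble = (prop.major > 1) || (prop.major == 1 && prop.minor >= 3);

	const int n = 1024, threads = 256, blocks = (n + threads - 1) / threads;

	if (useDouble) {
		double *x;
		cudaMalloc((void **)&x, n * sizeof(double));
		scale<double><<<blocks, threads>>>(x, 2.0, n);
		cudaFree(x);
	} else {
		float *x;
		cudaMalloc((void **)&x, n * sizeof(float));
		scale<float><<<blocks, threads>>>(x, 2.0f, n);
		cudaFree(x);
	}
	return cudaThreadExit();
}

Build it as a fat binary so code for more than one architecture is actually embedded, e.g. with nvcc -gencode arch=compute_13,code=sm_13 -gencode arch=compute_20,code=sm_20.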

Query the architecture major and minor revisions via the API, like deviceQuery does, and use the results to determine what the host code does at runtime.

You can do that using __CUDA_ARCH__ on the device side, which is where those decisions have to be made at compile time anyway. You can also use templating for that sort of thing, which is my preferred method. The code selection still happens at compile time, and you can emit a #warning or #error if the device code detects a conflict between templating and architecture. nvcc is now also smart enough to emit its own warnings about demotion from double to single when the target architecture doesn’t support it.
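A sketch of the device-side variant (the file layout and threshold are illustrative, not from this thread): keep the double precision kernels in their own translation unit and refuse to build it for an architecture that cannot run them, while using __CUDA_ARCH__ for any per-architecture paths inside the kernel:

#include <cstdio>

/* This file is assumed to contain only double precision kernels, so fail
   the build outright if it is being compiled for an architecture without
   native double support. */
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 130)
#error "double precision kernels require sm_13 or better"
#endif

__global__ void scale_double(double *x, double a, int n)
{
	int i = blockIdx.x * blockDim.x + threadIdx.x;
#if __CUDA_ARCH__ >= 200
	if (i == 0) printf("scaling %d doubles by %f\n", n, a);   /* device printf is Fermi-only */
#endif
	if (i < n) x[i] *= a;
}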

Wow! That makes a lot of sense! Thanks!
