__CUDA_ARCH__ undefined by NVCC on CUDA 3.2 RC

Hi There!

–edit–
this issue is seen with CUDA 3.2 Official Release also.
Can a moderator please change the title?
–edit–

__CUDA_ARCH__ was defined in CUDA 2.3. When I upgraded to CUDA 3.2, I realized that it is no longer defined.
Since I did not test on previous releases, can anyone educate me on this?

Was this a deliberate omission or an accidental miss? The NVCC manual for the 3.2 release still documents this macro.

I am on Ubuntu Linux 32-bit…

Thanks,
Best Regards,
Sarnath

It really is defined on the 3.2rc and 3.2 final toolkits. Take this code:

#include <stdio.h>

#ifndef __CUDA_ARCH__
#warning cuda arch not defined
#else
#warning everything is normal
#endif

__global__ void helloCUDA(const float f)
{
#if __CUDA_ARCH__ >= 200
	printf("Hello thread %d, f=%f\n", threadIdx.x, f);
#endif
}

int main()
{
	helloCUDA<<<1, 5>>>(1.2345f);
	return cudaThreadExit();
}

I can safely compile it with the 2.3 toolkit:

avidday@cuda:~$ module load cuda/2.3
avidday@cuda:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2009 NVIDIA Corporation
Built on Thu_Jul_30_09:24:36_PDT_2009
Cuda compilation tools, release 2.3, V0.2.1221
avidday@cuda:~$ nvcc -arch=sm_13 cudaprintf.cu
cudaprintf.cu:6:2: warning: #warning everything is normal
avidday@cuda:~$ ./a.out

Under the 3.2 toolkit for sm_13:

avidday@cuda:~$ module switch cuda/2.3 cuda/3.2rc
avidday@cuda:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2010 NVIDIA Corporation
Built on Wed_Sep__8_17:12:45_PDT_2010
Cuda compilation tools, release 3.2, V0.2.1221
avidday@cuda:~$ nvcc -arch=sm_13 cudaprintf.cu
cudaprintf.cu:6:2: warning: #warning everything is normal
cudaprintf.cu:4:2: warning: #warning cuda arch not defined
avidday@cuda:~$ ./a.out

And under the 3.2 toolkit for sm_20:

avidday@cuda:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2010 NVIDIA Corporation
Built on Wed_Sep__8_17:12:45_PDT_2010
Cuda compilation tools, release 3.2, V0.2.1221
avidday@cuda:~$ nvcc -arch=sm_20 cudaprintf.cu
cudaprintf.cu:6:2: warning: #warning everything is normal
cudaprintf.cu:4:2: warning: #warning cuda arch not defined
avidday@cuda:~$ ./a.out
Hello thread 0, f=1.234500
Hello thread 1, f=1.234500
Hello thread 2, f=1.234500
Hello thread 3, f=1.234500
Hello thread 4, f=1.234500

If you run your nvcc compile statements with --dryrun, you can see that on the device-side compilation, -D__CUDA_ARCH__ is being passed to every gcc call. I don’t see what the problem is.
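For example, this just prints the sequence of sub-commands nvcc would run, without executing them, so you can inspect the -D__CUDA_ARCH__ definitions yourself (output omitted here since it depends on the toolkit version and install paths):

nvcc -arch=sm_20 --dryrun cudaprintf.cu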

Hi Avid,

The 2.3 output produces the warning “everything is normal”.

However, 3.2 and 3.2 RC produce 2 warnings – I don’t understand what is going on. One for NVCC and one for GCC?

If that’s the case, why did it NOT appear with CUDA 2.3?

My repro case is simple.

This works with CUDA 2.3. But with CUDA 3.2, it does not even compile.

Thanks for your time,

I put the answer to that question in bold in my original reply, but maybe you missed it. You get two messages because there are now effectively two preprocessor passes, one for device code and one for host code (both done by the host compiler). __CUDA_ARCH__ gets propagated through the device compilation trajectory, but not through the host trajectory (after all, it is a device code symbol). That is why my code compiles and yours doesn’t.
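As an illustrative sketch (not part of the original reply), this is the usual pattern for a __host__ __device__ function that needs different code in each trajectory – __CUDA_ARCH__ is only defined during the device pass:

#include <math.h>

__host__ __device__ float fast_exp(float x)
{
#ifdef __CUDA_ARCH__
	return __expf(x);   /* device pass: fast hardware intrinsic */
#else
	return expf(x);     /* host pass: standard C math library */
#endif
}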

I just referred to the manual. It says that the HOST code should NOT depend on this macro. Oops… Even the 2.2 manual says that.
It was my mistake to depend on it.

In any case, I wonder how I could:

  1. Allocate a host buffer for a kernel that is built for a specific architecture.
  2. An even simpler case would be to define REAL as “double” or “float” depending on the __CUDA_ARCH__ being compiled for.

It’s all confusing! Anyway, thanks for your time,

To add to avidday’s answer:
Device code can be compiled for multiple architectures with the correct code selected by the host code at runtime. Because of that it is impossible to know the device architecture at compile time of the host code.
That’s why __CUDA_ARCH__ is undefined in host code (and checking whether it is defined is indeed the recommended way to distinguish between device and host code at compile time).

If you want to do different things from host code depending on device architecture, you’ll have to switch at runtime using cudaGetDeviceProperties().
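A minimal sketch of that approach (not from this thread – the kernel and sizes are made up for illustration): instantiate the kernel for both float and double at compile time, then query the device at runtime and launch whichever one it can support:

#include <cuda_runtime.h>

template <typename Real>
__global__ void scale(Real *x, Real a, int n)
{
	int i = blockIdx.x * blockDim.x + threadIdx.x;
	if (i < n) x[i] *= a;
}

int main()
{
	cudaDeviceProp prop;
	cudaGetDeviceProperties(&prop, 0);

	/* compute capability 1.3 and above has native double precision */
	bool useDouble = (prop.major > 1) || (prop.major == 1 && prop.minor >= 3);

	const int n = 1024, threads = 256, blocks = (n + threads - 1) / threads;

	if (useDouble) {
		double *x;
		cudaMalloc((void **)&x, n * sizeof(double));
		scale<double><<<blocks, threads>>>(x, 2.0, n);
		cudaFree(x);
	} else {
		float *x;
		cudaMalloc((void **)&x, n * sizeof(float));
		scale<float><<<blocks, threads>>>(x, 2.0f, n);
		cudaFree(x);
	}
	return cudaThreadExit();
}

Build it as a fat binary so code for more than one architecture is actually embedded, e.g. with nvcc -gencode arch=compute_13,code=sm_13 -gencode arch=compute_20,code=sm_20.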

Query the architecture major and minor revisions via the API, like deviceQuery does, and use the results to determine what the host code does at runtime.

You can do that using __CUDA_ARCH__ on the device side, which is where those decisions have to be made at compile time anyway. You can also use templating for that sort of thing, which is my preferred method. The code selection still happens at compile time, and you can emit a #warning or #error if the device code detects a conflict between templating and architecture. nvcc is now also smart enough to emit its own warnings about demotion from double to single when the target architecture doesn’t support it.
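A sketch of the device-side variant (the file layout and threshold are illustrative, not from this thread): keep the double precision kernels in their own translation unit and refuse to build it for an architecture that cannot run them, while using __CUDA_ARCH__ for any per-architecture paths inside the kernel:

#include <cstdio>

/* This file is assumed to contain only double precision kernels, so fail
   the build outright if it is being compiled for an architecture without
   native double support. */
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 130)
#error "double precision kernels require sm_13 or better"
#endif

__global__ void scale_double(double *x, double a, int n)
{
	int i = blockIdx.x * blockDim.x + threadIdx.x;
#if __CUDA_ARCH__ >= 200
	if (i == 0) printf("scaling %d doubles by %f\n", n, a);   /* device printf is Fermi-only */
#endif
	if (i < n) x[i] *= a;
}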

Wow! That makes a lot of sense! Thanks!
