Compiling for the right architecture

Hi all!

My question is pretty basic and hopefully simple to answer: how should I choose the correct values to pass to nvcc through the -arch and -code flags?
Or do you know of any good source of information on this topic?

Thanks in advance

Section 3.1 in the CUDA 3.1 programming guide discusses this, and the actual architecture values for common GPUs are listed in table A-1.
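
Roughly speaking (just a sketch, with app.cu as a placeholder file name), -arch selects the virtual architecture the front end generates PTX for, and -code selects the real GPUs that get machine code embedded in the fat binary, for example:

    nvcc -arch=compute_13 -code=sm_13,compute_13 -o app app.cu

That builds native sm_13 machine code and also embeds the compute_13 PTX, so newer GPUs can still JIT-compile it at load time.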

You neglect to mention exactly what it is that you are trying to accomplish.

I build apps for maximum compatibility with both sm_11 and sm_20 targets - see the Fermi compatibility guide for the command line options.
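
For what it's worth, a dual-target build looks roughly like this (a sketch only; the exact options recommended in the Fermi compatibility guide may differ slightly, and app.cu is a placeholder):

    nvcc -gencode arch=compute_11,code=sm_11 \
         -gencode arch=compute_20,code=sm_20 \
         -o app app.cu

The first -gencode pair produces sm_11 machine code for pre-Fermi cards; the second produces native sm_20 code, so Fermi cards don't have to JIT-compile PTX at load time.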

I’m running double-precision code that uses some of the newly introduced functions (__hilotoint(), __threadfence_block()), and I would like to take full advantage of my GTX 465.

I’m not interested in portability or backward compatibility.

Then compile with -arch=sm_21.
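
Something like this should do it (myapp.cu is a placeholder; as far as I know there is no compute_21 virtual architecture, so the PTX is generated for compute_20 and only the machine code targets sm_21):

    nvcc -arch=sm_21 -o myapp myapp.cu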

Well, I tried this and got a 5% performance drop compared with the same code compiled for sm_20. Even stranger, if I compile for sm_13 I get better performance (+5%) than with sm_20.

How should I interpret this?

The default cache configuration for sm_20 and sm_21 code is 48 KB of shared memory and 16 KB of L1 cache. For sm_13 code, the GPU is probably running in the reverse configuration (i.e. 16 KB of shared memory and 48 KB of L1 cache). It is probable that your code is benefiting from the extra L1 cache. This is discussed in the Fermi Tuning Guide that comes with the 3.1 toolkit, which you have, no doubt, already read.
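
For reference, here is a minimal sketch of requesting the 48 KB L1 configuration for a kernel (the kernel name and sizes are made up, and error checking is omitted):

    #include <cuda_runtime.h>

    // Hypothetical kernel, used only to illustrate the cache-config call.
    __global__ void scaleKernel(double *data, double factor)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        data[i] *= factor;
    }

    int main()
    {
        const int n = 256;
        double *d_data;
        cudaMalloc((void **)&d_data, n * sizeof(double));
        cudaMemset(d_data, 0, n * sizeof(double));

        // Ask for the 16 KB shared / 48 KB L1 split for this kernel
        // (the sm_20/sm_21 default is 48 KB shared / 16 KB L1).
        cudaFuncSetCacheConfig(scaleKernel, cudaFuncCachePreferL1);

        scaleKernel<<<1, n>>>(d_data, 2.0);
        cudaThreadSynchronize();   // CUDA 3.x-era synchronization call
        cudaFree(d_data);
        return 0;
    }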

Already read it, and in any case I’m enabling the 48 KB L1 cache configuration in the code through the cudaFuncSetCacheConfig(f, cudaFuncCachePreferL1) call.
