Erroneous lmem statistics displayed for sm_20?

A kernel compiled with the option --ptxas-options=-v seems to display erroneous lmem (local memory) statistics when the sm_20 GPU architecture is specified. The same kernel gives meaningful lmem statistics with sm_10, sm_11, sm_12 and sm_13.

Can someone clarify whether the sm_20 lmem statistics need to be read differently, or whether they are simply wrong?

Here is the kernel:

__global__ void fooKernel( int* dResult )
{
	const int num = 1000;
	int val[num];

	for ( int i = 0; i < num; ++i )
		val[i] = i * i;

	int result = 0;

	for ( int i = 0; i < num; ++i )
		result += val[i];

	*dResult = result;
}

With --ptxas-options=-v and sm_20, ptxas reports:

1>ptxas info	: Compiling entry function '_Z9fooKernelPi' for 'sm_20'

1>ptxas info	: Used 5 registers, 4+0 bytes lmem, 36 bytes cmem[0]

With --ptxas-options=-v and sm_10 / sm_11 / sm_12 / sm_13, ptxas reports:

1>ptxas info	: Compiling entry function '_Z9fooKernelPi' for 'sm_10'

1>ptxas info	: Used 3 registers, 4000+0 bytes lmem, 4+16 bytes smem, 4 bytes cmem[1]

sm_20 reports 4 bytes of lmem, which does not look plausible given the 1000-element int array (4 x 1000 bytes) used in the kernel. The older GPU architectures report the expected 4000 bytes of lmem.

I tried this with CUDA 3.2. I have read the "Printing Code Generation Statistics" section of the NVCC manual (v3.2), but it does not explain this anomaly.
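
One way to cross-check the ptxas figure might be to ask the runtime what it thinks the kernel uses. The sketch below is only that, a sketch: it assumes the localSizeBytes field of cudaFuncAttributes reflects the same per-thread local memory that ptxas prints, and expectedLocalBytes is a hypothetical helper I added just to spell out the 4 x 1000 arithmetic. I have not confirmed what it prints for sm_20.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel from the post above.
__global__ void fooKernel( int* dResult )
{
	const int num = 1000;
	int val[num];

	for ( int i = 0; i < num; ++i )
		val[i] = i * i;

	int result = 0;

	for ( int i = 0; i < num; ++i )
		result += val[i];

	*dResult = result;
}

// Hypothetical helper (not part of CUDA): bytes val[] would occupy
// if the compiler keeps the whole array in local memory.
static size_t expectedLocalBytes( int numElements )
{
	return numElements * sizeof( int );
}

int main()
{
	// Query the attributes the runtime has recorded for the compiled kernel.
	cudaFuncAttributes attr;
	cudaError_t err = cudaFuncGetAttributes( &attr, fooKernel );
	if ( err != cudaSuccess )
	{
		printf( "cudaFuncGetAttributes failed: %s\n", cudaGetErrorString( err ) );
		return 1;
	}

	printf( "expected if val[] stays in lmem : %u bytes\n", (unsigned)expectedLocalBytes( 1000 ) );
	printf( "localSizeBytes from the runtime : %u bytes\n", (unsigned)attr.localSizeBytes );
	printf( "registers per thread            : %d\n", attr.numRegs );
	return 0;
}
```

If the runtime number matches the ptxas number for each architecture, the difference would be in what the compiler generates for sm_20 rather than in how the statistic is printed.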