Questions about CUDA 4.0 build logs: can someone define "stack frame" and "spill stores"?

I just switched from OpenCL to CUDA 4.0 and am in the process of refactoring my kernels.

Could someone explain what the following lines from my build log mean?

1>  ptxas info    : Function properties for _Z8fugacityfffffPKfS0_Pf
1>      40 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads
1>  ptxas info    : Function properties for _Z4fabsf
1>      0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
1>  ptxas info    : Used 27 registers, 88 bytes cmem[0]

I understand register count, lmem, cmem, and smem. However, I was wondering whether I should worry about the other numbers there (especially the “spills”). If the spills correspond to global memory accesses, should I use the same optimization techniques I’ve used to eliminate register spills in my older kernels?

My whole build log:

"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_21,compute_20\" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2010 -ccbin "d:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include" --opencc-options -LIST:source=on -G0  --keep-dir "Debug" -maxrregcount=0 --ptxas-options=-v --machine 32 --compile  -D_NEXUS_DEBUG -g    -Xcompiler "/EHsc /nologo /Ox /Zi  /MDd " -o "Debug\flashKernel.cu.obj" "D:\Will\Documents\My Dropbox\Research\flashKernel.cu" 

1>  flashKernel.cu
1>  tmpxft_00001788_00000000-3_flashKernel.compute_20.cudafe1.gpu
1>  tmpxft_00001788_00000000-7_flashKernel.compute_20.cudafe2.gpu
1>  flashKernel.cu
1>  tmpxft_00001788_00000000-0_flashKernel.compute_10.cudafe1.gpu
1>  tmpxft_00001788_00000000-11_flashKernel.compute_10.cudafe2.gpu
1>  flashKernel.cu
1>  ptxas info    : Compiling entry function '_Z6flash4PKfS0_S0_S0_S0_S0_S0_S0_S0_S0_PfS1_S1_S1_' for 'sm_21'
1>  ptxas info    : Function properties for _Z6flash4PKfS0_S0_S0_S0_S0_S0_S0_S0_S0_PfS1_S1_S1_
1>      96 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
1>  ptxas info    : Function properties for _Z8zfff
1>      16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads
1>  ptxas info    : Function properties for _Z13PRPKfS0_S0_S0_ffPf
1>      64 bytes stack frame, 48 bytes spill stores, 48 bytes spill loads
1>  ptxas info    : Function properties for _Z8fugacityfffffPKfS0_Pf
1>      40 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads
1>  ptxas info    : Function properties for _Z4fabsf
1>      0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
1>  ptxas info    : Used 27 registers, 88 bytes cmem[0]
1>  flashKernel.cu
1>  ptxas info    : Compiling entry function '_Z6flash4PKfS0_S0_S0_S0_S0_S0_S0_S0_S0_PfS1_S1_S1_' for 'sm_10'
1>  ptxas info    : Used 21 registers, 112+0 bytes lmem, 56+16 bytes smem, 8 bytes cmem[1]
1>  tmpxft_00001788_00000000-3_flashKernel.compute_20.cudafe1.cpp
1>  tmpxft_00001788_00000000-20_flashKernel.compute_20.ii
1>ManifestResourceCompile:
1>  All outputs are up-to-date.

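Spilled registers are stored in local memory, organized as a stack. Local memory lives in the same off-chip device memory as global memory, but it is private to the owning thread: each thread has its own local memory. "40 bytes stack frame" means the function's stack frame takes at most 40 bytes of local memory per thread, and the spill stores and spill loads figures are the bytes of register contents the compiler writes to and reads back from that stack. As for where these register spills come from, you’ll have to look at your own code. Use cuobjdump (for example, cuobjdump -sass) to inspect the generated machine code and see where the spill loads and stores happen.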