I just switched from OpenCL to CUDA 4.0 and am in the process of refactoring my kernels.
Could someone explain what the following lines from my build log mean?
1> ptxas info : Function properties for _Z8fugacityfffffPKfS0_Pf
1> 40 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads
1> ptxas info : Function properties for _Z4fabsf
1> 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
1> ptxas info : Used 27 registers, 88 bytes cmem[0]
I understand the register count, lmem, cmem, and smem figures. What I'm not sure about is whether I should worry about the other numbers (the stack frame size and especially the "spills"). If the spills correspond to global memory accesses, can I use the same optimization techniques I've used to eliminate register spills in my older kernels?
My whole build log:
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_21,compute_20\" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2010 -ccbin "d:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include" --opencc-options -LIST:source=on -G0 --keep-dir "Debug" -maxrregcount=0 --ptxas-options=-v --machine 32 --compile -D_NEXUS_DEBUG -g -Xcompiler "/EHsc /nologo /Ox /Zi /MDd " -o "Debug\flashKernel.cu.obj" "D:\Will\Documents\My Dropbox\Research\flashKernel.cu"
1> flashKernel.cu
1> tmpxft_00001788_00000000-3_flashKernel.compute_20.cudafe1.gpu
1> tmpxft_00001788_00000000-7_flashKernel.compute_20.cudafe2.gpu
1> flashKernel.cu
1> tmpxft_00001788_00000000-0_flashKernel.compute_10.cudafe1.gpu
1> tmpxft_00001788_00000000-11_flashKernel.compute_10.cudafe2.gpu
1> flashKernel.cu
1> ptxas info : Compiling entry function '_Z6flash4PKfS0_S0_S0_S0_S0_S0_S0_S0_S0_PfS1_S1_S1_' for 'sm_21'
1> ptxas info : Function properties for _Z6flash4PKfS0_S0_S0_S0_S0_S0_S0_S0_S0_PfS1_S1_S1_
1> 96 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
1> ptxas info : Function properties for _Z8zfff
1> 16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads
1> ptxas info : Function properties for _Z13PRPKfS0_S0_S0_ffPf
1> 64 bytes stack frame, 48 bytes spill stores, 48 bytes spill loads
1> ptxas info : Function properties for _Z8fugacityfffffPKfS0_Pf
1> 40 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads
1> ptxas info : Function properties for _Z4fabsf
1> 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
1> ptxas info : Used 27 registers, 88 bytes cmem[0]
1> flashKernel.cu
1> ptxas info : Compiling entry function '_Z6flash4PKfS0_S0_S0_S0_S0_S0_S0_S0_S0_PfS1_S1_S1_' for 'sm_10'
1> ptxas info : Used 21 registers, 112+0 bytes lmem, 56+16 bytes smem, 8 bytes cmem[1]
1> tmpxft_00001788_00000000-3_flashKernel.compute_20.cudafe1.cpp
1> tmpxft_00001788_00000000-20_flashKernel.compute_20.ii
1>ManifestResourceCompile:
1> All outputs are up-to-date.
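To make the log easier to scan, here is a minimal Python sketch that pairs each "Function properties" line with the stack/spill line that follows it and lists the functions reporting nonzero spills. The helper names `summarize_ptxas` and `spilling_functions` are hypothetical, and the regular expressions assume the exact `ptxas info` wording shown in the log above; other toolkit versions may format these lines differently.

```python
import re

# Matches e.g. "ptxas info : Function properties for _Z4fabsf"
FUNC_RE = re.compile(r"ptxas info\s*:\s*Function properties for\s+(\S+)")
# Matches e.g. "40 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads"
SPILL_RE = re.compile(
    r"(\d+) bytes stack frame,\s*(\d+) bytes spill stores,\s*(\d+) bytes spill loads"
)


def summarize_ptxas(log_text):
    """Return a list of (function, stack_bytes, spill_store_bytes, spill_load_bytes)."""
    rows = []
    current = None  # mangled name from the most recent "Function properties" line
    for line in log_text.splitlines():
        m = FUNC_RE.search(line)
        if m:
            current = m.group(1)
            continue
        m = SPILL_RE.search(line)
        if m and current is not None:
            stack, stores, loads = (int(g) for g in m.groups())
            rows.append((current, stack, stores, loads))
            current = None  # wait for the next "Function properties" line
    return rows


def spilling_functions(rows):
    """Keep only the functions that report nonzero spill stores or loads."""
    return [r for r in rows if r[2] > 0 or r[3] > 0]


if __name__ == "__main__":
    # Excerpt copied from the build log above (assumed format).
    sample = """\
1> ptxas info : Function properties for _Z8fugacityfffffPKfS0_Pf
1> 40 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads
1> ptxas info : Function properties for _Z4fabsf
1> 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
"""
    for name, stack, stores, loads in spilling_functions(summarize_ptxas(sample)):
        print(name, stack, stores, loads)
```

Run against the full sm_21 section of the log, this would flag `_Z8zfff`, `_Z13PRPKfS0_S0_S0_ffPf`, and `_Z8fugacityfffffPKfS0_Pf`, but not the `flash4` entry function, which reports a 96-byte stack frame with 0 bytes of spill stores and loads.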