(I had originally posted this in the NSIGHT forum, but I did not receive a response. I hope that this forum will be a better fit for this question. Thanks for looking!)
The Fermi tuning guide suggests that for some problems it can be useful to turn off L1 caching of global memory loads, so I am interested in trying it out just to see how it affects my code. The CUDA Programming Guide says that this is done at compile time by passing the options -Xptxas -dlcm=cg to nvcc. I am attempting to compile a program to run on a Tesla C2050 using Visual Studio 2010, NVIDIA NSIGHT 1.5, and Windows SDK 7.0. I’m afraid that I am missing a piece of this puzzle, and I am hoping a kind soul will point out my mistake.
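To make the goal concrete, here is a minimal sketch of the kind of code the option should affect. The file name copy.cu and the kernel copyKernel are made up for illustration and are not part of the SDK sample. The command lines in the comments are what I would expect to type outside of Visual Studio, assuming the CUDA 3.2 toolkit and a compute capability 2.x target. As I read the Programming Guide, -dlcm=ca is the default (global loads cached in L1 and L2) and -dlcm=cg caches them in L2 only.

```cuda
// copy.cu: hypothetical test file, not part of the SDK matrixMul sample.
//
// Assumed command lines (CUDA 3.2 toolkit, Fermi target such as the C2050):
//   nvcc -arch=sm_20 -Xptxas -dlcm=ca copy.cu -o copy_ca.exe   (default: L1 and L2)
//   nvcc -arch=sm_20 -Xptxas -dlcm=cg copy.cu -o copy_cg.exe   (L2 only)
// --ptxas-options=-dlcm=cg is the long spelling of -Xptxas -dlcm=cg.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// One global load and one global store per thread.
// The load from in[] is the access that -dlcm is meant to influence.
__global__ void copyKernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host buffer filled with a simple ramp so the result can be checked.
    float* h = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i)
        h[i] = (float)i;

    float *d_in = 0, *d_out = 0;
    cudaMalloc((void**)&d_in, bytes);
    cudaMalloc((void**)&d_out, bytes);
    cudaMemcpy(d_in, h, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    copyKernel<<<blocks, threads>>>(d_in, d_out, n);

    // The blocking copy back to the host also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h, d_out, bytes, cudaMemcpyDeviceToHost);
    printf("status: %s, last element: %.0f\n", cudaGetErrorString(err), h[n - 1]);

    cudaFree(d_in);
    cudaFree(d_out);
    free(h);
    return err == cudaSuccess ? 0 : 1;
}
```

The two builds should differ only in how the load in copyKernel is cached, so timing them against each other is the experiment I am after. Since -dlcm only matters for compute capability 2.x code, the sm_10 target in my build log below would presumably not show any difference even once the link succeeds.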
The code initially compiles and runs. When I looked in the Project Properties at the CUDA C/C++ build rule (for CUDA 3.2, which I believe came with NSIGHT), I did not see an option for setting the cache. So, I added --ptxas-options=-dlcm=cg to the “Additional Options” under the “Command Line” menu. The program now appears to compile, but it fails at the link stage.
I used the matrix multiply sample from the SDK as the example code. Disclaimer: I created a new project from the existing code, so I am not sure that all of the project settings are the same as the default. For this case, the output is:
1>------ Build started: Project: MatrixMul_test, Configuration: Debug x64 ------
1>Build started 11/3/2010 1:25:31 PM.
1>InitializeBuildStatus:
1> Creating "x64\Debug\MatrixMul_test.unsuccessfulbuild" because "AlwaysCreate" was specified.
1>AddCudaCompileDeps:
1>Skipping target "AddCudaCompileDeps" because all output files are up-to-date with respect to the input files.
1>CudaBuild:
1> Compiling CUDA source file matrixMul.cu...
1>
1> C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2008 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\x86_amd64" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\shared\inc" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\C\common\inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\include" -G0 --keep-dir "x64\Debug\" -maxrregcount=32 --machine 64 --compile --ptxas-options=-dlcm=cg -D_NEXUS_DEBUG -g -Xcompiler "/EHsc /nologo /Od /Zi /MTd " -o "x64\Debug\matrixMul.obj" "C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test\matrixMul.cu"
1> matrixMul.cu
1> tmpxft_00000a3c_00000000-0_matrixMul.cudafe1.gpu
1> tmpxft_00000a3c_00000000-5_matrixMul.cudafe2.gpu
1> matrixMul.cu
1> tmpxft_00000a3c_00000000-0_matrixMul.cudafe1.cpp
1> tmpxft_00000a3c_00000000-11_matrixMul.ii
1> Note: including windows.h
1> Note: including math.h
1> Note: including assert.h
1> Note: including rendercheckGL.h
1> Note: including <vector>
1> Note: including <map>
1> Note: including <string>
1> Note: including GL/glew.h
1> Note: including GL/glut.h
1> Note: including lib: glut32.lib
1>
1> Deleting file "tmpxft_00000a3c_00000000-6_matrixMul.cpp3.o".
1>ClCompile:
1> matrixMul_gold.cpp
1>ManifestResourceCompile:
1> Microsoft (R) Windows (R) Resource Compiler Version 6.1.7600.16385
1>
1> Copyright (C) Microsoft Corporation. All rights reserved.
1>
1>
1>LINK : warning LNK4044: unrecognized option '/-ptxas-options=-dlcm=cg'; ignored
1>matrixMul.obj : error LNK2019: unresolved external symbol computeGold referenced in function "void __cdecl runTest(int,char * *)" (?runTest@@YAXHPEAPEAD@Z)
1>C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test\x64\Debug\MatrixMul_test.exe : fatal error LNK1120: 1 unresolved externals
1>
1>Build FAILED.
1>
1>Time Elapsed 00:00:07.19
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
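So, the --ptxas-options flag is on the nvcc command line, and the .cu file seems to compile. However, the linker has trouble finding a function that is defined in the C++ file (computeGold). It also warns (LNK4044) that it was handed the same option and ignored it.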
Although this may be a red herring, I then decided to see what would happen if I changed the extension and item type of the matrixMul_gold.cpp file to a CUDA C file. In this case, the linker appears to find the symbol from the cpp file that was unresolved before, but it now has trouble with the linked libraries (I think):
1>------ Build started: Project: MatrixMul_test, Configuration: Debug x64 ------
1>Build started 11/3/2010 1:43:15 PM.
1>InitializeBuildStatus:
1> Touching "x64\Debug\MatrixMul_test.unsuccessfulbuild".
1>AddCudaCompileDeps:
1>Skipping target "AddCudaCompileDeps" because all output files are up-to-date with respect to the input files.
1>CudaBuild:
1> Compiling CUDA source file matrixMul_gold.cu...
1> Compiling CUDA source file matrixMul.cu...
1>
1> C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2008 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\x86_amd64" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\shared\inc" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\C\common\inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\include" -G0 --keep-dir "x64\Debug\" -maxrregcount=32 --machine 64 --compile --ptxas-options=-dlcm=cg -D_NEXUS_DEBUG -g -Xcompiler "/EHsc /nologo /Od /Zi /MTd " -o "x64\Debug\matrixMul_gold.obj" "C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test\matrixMul_gold.cu"
1> matrixMul_gold.cu
1> tmpxft_00000420_00000000-0_matrixMul_gold.cudafe1.gpu
1> tmpxft_00000420_00000000-5_matrixMul_gold.cudafe2.gpu
1> matrixMul_gold.cu
1> tmpxft_00000420_00000000-0_matrixMul_gold.cudafe1.cpp
1> tmpxft_00000420_00000000-11_matrixMul_gold.ii
1>
1> C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2008 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\x86_amd64" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\shared\inc" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\C\common\inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\include" -G0 --keep-dir "x64\Debug\" -maxrregcount=32 --machine 64 --compile --ptxas-options=-dlcm=cg -D_NEXUS_DEBUG -g -Xcompiler "/EHsc /nologo /Od /Zi /MTd " -o "x64\Debug\matrixMul.obj" "C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test\matrixMul.cu"
1> matrixMul.cu
1> tmpxft_00000b58_00000000-0_matrixMul.cudafe1.gpu
1> tmpxft_00000b58_00000000-5_matrixMul.cudafe2.gpu
1> matrixMul.cu
1> tmpxft_00000b58_00000000-0_matrixMul.cudafe1.cpp
1> tmpxft_00000b58_00000000-11_matrixMul.ii
1> Note: including windows.h
1> Note: including math.h
1> Note: including assert.h
1> Note: including rendercheckGL.h
1> Note: including <vector>
1> Note: including <map>
1> Note: including <string>
1> Note: including GL/glew.h
1> Note: including GL/glut.h
1> Note: including lib: glut32.lib
1>
1> Deleting file "tmpxft_00000420_00000000-6_matrixMul_gold.cpp3.o".
1> Deleting file "tmpxft_00000b58_00000000-6_matrixMul.cpp3.o".
1>ManifestResourceCompile:
1> All outputs are up-to-date.
1>LINK : warning LNK4044: unrecognized option '/-ptxas-options=-dlcm=cg'; ignored
1>shrUtils64D.lib(cmd_arg_reader.obj) : warning LNK4204: 'C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test\x64\Debug\vc90.pdb' is missing debugging information for referencing module; linking object as if no debug info
1>shrUtils64D.lib(shrUtils.obj) : warning LNK4204: 'C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test\x64\Debug\vc90.pdb' is missing debugging information for referencing module; linking object as if no debug info
1>LINK : error LNK2001: unresolved external symbol mainCRTStartup
1>C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test\x64\Debug\MatrixMul_test.exe : fatal error LNK1120: 1 unresolved externals
1>
1>Build FAILED.
1>
1>Time Elapsed 00:00:08.33
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
A clever person might now suggest that I give up on this project and try adding the compiler option to the sample project that is included with NSIGHT. I tried. I copied the Matrix Multiply example folder to a new directory, adjusted the location of check.h, and hit “Build” again. I even tried changing the architecture from sm_10 to sm_20. No luck, and a similar error message. (For some reason, perhaps file permissions, --ptxas-options=-dlcm=cg didn’t appear on the nvcc.exe line until I had copied the folder to a new location.)
So, now that I have shared what didn’t work for me, can someone share with me what I was actually supposed to do? I appreciate your guidance, oh Gurus of the Aether!
Thanks,