Help me understand "-Xptxas -dlcm=cg" (take 2)

(I had originally posted this in the NSIGHT forum, but I did not receive a response. I hope that this forum will be a better fit for this question. Thanks for looking!)

The Fermi tuning guide suggests that for some problems it can be useful to turn off the L1 cache, so I am interested in trying it out just to see how it affects my code. The CUDA Programming Guide says that this is to be done at compile time using the options -Xptxas -dlcm=cg with nvcc. I am attempting to compile a program to run on a Tesla C2050 using Visual Studio 2010, NVIDIA NSIGHT 1.5, and Windows SDK 7.0. I’m afraid that I am missing a piece to this puzzle, and I am hoping a kind soul will point out my mistake.
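
As a minimal sketch of how the flag could be tried outside of Visual Studio, the small program below times a strided global-memory read. It is meant to be built twice from the command line, once with the default settings and once with -Xptxas -dlcm=cg. The file name l1_probe.cu, the kernel stridedRead, and the problem sizes are all hypothetical and are not part of the SDK sample. The sketch also assumes the code is compiled for a Fermi target (-arch=sm_20), since the L1/L2 cache choice only applies to compute capability 2.x devices.

```cuda
// l1_probe.cu (hypothetical file name, not part of the SDK sample).
// Build twice and compare the reported times, for example:
//   nvcc -arch=sm_20 -o probe_ca l1_probe.cu                    (default: global loads cached in L1 and L2)
//   nvcc -arch=sm_20 -Xptxas -dlcm=cg -o probe_cg l1_probe.cu   (global loads cached in L2 only)
#include <cstdio>
#include <cuda_runtime.h>

// Each thread loads one float; consecutive threads read addresses
// `stride` elements apart, so the loads are deliberately scattered.
__global__ void stridedRead(const float* in, float* out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[((long long)i * stride) % n];
    }
}

int main()
{
    const int n = 1 << 22;      // number of floats (assumed size, 16 MB per buffer)
    const int stride = 32;      // assumed stride, chosen only for illustration
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float* in = 0;
    float* out = 0;
    if (cudaMalloc((void**)&in, n * sizeof(float)) != cudaSuccess ||
        cudaMalloc((void**)&out, n * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(in, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so the timed launch excludes one-time startup costs.
    stridedRead<<<blocks, threads>>>(in, out, n, stride);

    cudaEventRecord(start, 0);
    stridedRead<<<blocks, threads>>>(in, out, n, stride);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("strided read of %d floats: %.3f ms\n", n, ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Whether the L2-only build comes out faster or slower depends on the access pattern, so the two timings are only a starting point for comparison and not a prediction for any particular kernel.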

The code initially compiles and runs. When I looked in the Project Properties at the CUDA C/C++ build rule (for CUDA 3.2, which I believe came with NSIGHT), I did not see an option for setting the cache. So, I added --ptxas-options=-dlcm=cg to the “Additional Options” under the “Command Line” menu. The program now appears to compile, but it fails at the link stage.

I used the matrix multiply sample from the SDK as the example code. Disclaimer: I created a new project from the existing code, so I am not sure that all of the project settings are the same as the default. For this case, the output is:

1>------ Build started: Project: MatrixMul_test, Configuration: Debug x64 ------

1>Build started 11/3/2010 1:25:31 PM.

1>InitializeBuildStatus:

1>  Creating "x64\Debug\MatrixMul_test.unsuccessfulbuild" because "AlwaysCreate" was specified.

1>AddCudaCompileDeps:

1>Skipping target "AddCudaCompileDeps" because all output files are up-to-date with respect to the input files.

1>CudaBuild:

1>  Compiling CUDA source file matrixMul.cu...

1>  

1>  C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2008 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\x86_amd64" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\shared\inc" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\C\common\inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\include"  -G0  --keep-dir "x64\Debug\" -maxrregcount=32  --machine 64 --compile --ptxas-options=-dlcm=cg  -D_NEXUS_DEBUG -g    -Xcompiler "/EHsc /nologo /Od /Zi  /MTd " -o "x64\Debug\matrixMul.obj" "C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test\matrixMul.cu" 

1>  matrixMul.cu

1>  tmpxft_00000a3c_00000000-0_matrixMul.cudafe1.gpu

1>  tmpxft_00000a3c_00000000-5_matrixMul.cudafe2.gpu

1>  matrixMul.cu

1>  tmpxft_00000a3c_00000000-0_matrixMul.cudafe1.cpp

1>  tmpxft_00000a3c_00000000-11_matrixMul.ii

1>  Note: including windows.h

1>  Note: including math.h

1>  Note: including assert.h

1>  Note: including rendercheckGL.h

1>  Note: including <vector>

1>  Note: including <map>

1>  Note: including <string>

1>  Note: including GL/glew.h

1>  Note: including GL/glut.h

1>  Note: including lib: glut32.lib

1>  

1>  Deleting file "tmpxft_00000a3c_00000000-6_matrixMul.cpp3.o".

1>ClCompile:

1>  matrixMul_gold.cpp

1>ManifestResourceCompile:

1>  Microsoft (R) Windows (R) Resource Compiler Version 6.1.7600.16385

1>  

1>  Copyright (C) Microsoft Corporation.  All rights reserved.

1>  

1>  

1>LINK : warning LNK4044: unrecognized option '/-ptxas-options=-dlcm=cg'; ignored

1>matrixMul.obj : error LNK2019: unresolved external symbol computeGold referenced in function "void __cdecl runTest(int,char * *)" (?runTest@@YAXHPEAPEAD@Z)

1>C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test\x64\Debug\MatrixMul_test.exe : fatal error LNK1120: 1 unresolved externals

1>

1>Build FAILED.

1>

1>Time Elapsed 00:00:07.19

========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

So, the --ptxas-options flag is on the nvcc command line, and the CUDA file seems to compile. However, the linker cannot find a function that is defined in the C++ file (computeGold).

Although this may be a red herring, I then decided to see what would happen if I changed the extension and item type of the matrixMul_gold.cpp file to a CUDA C file. In this case, the linker appears to find the previously unresolved external symbol from the cpp file, but it now has trouble with the linked libraries (I think):

1>------ Build started: Project: MatrixMul_test, Configuration: Debug x64 ------

1>Build started 11/3/2010 1:43:15 PM.

1>InitializeBuildStatus:

1>  Touching "x64\Debug\MatrixMul_test.unsuccessfulbuild".

1>AddCudaCompileDeps:

1>Skipping target "AddCudaCompileDeps" because all output files are up-to-date with respect to the input files.

1>CudaBuild:

1>  Compiling CUDA source file matrixMul_gold.cu...

1>  Compiling CUDA source file matrixMul.cu...

1>  

1>  C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2008 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\x86_amd64" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\shared\inc" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\C\common\inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\include"  -G0  --keep-dir "x64\Debug\" -maxrregcount=32  --machine 64 --compile --ptxas-options=-dlcm=cg  -D_NEXUS_DEBUG -g    -Xcompiler "/EHsc /nologo /Od /Zi  /MTd " -o "x64\Debug\matrixMul_gold.obj" "C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test\matrixMul_gold.cu" 

1>  matrixMul_gold.cu

1>  tmpxft_00000420_00000000-0_matrixMul_gold.cudafe1.gpu

1>  tmpxft_00000420_00000000-5_matrixMul_gold.cudafe2.gpu

1>  matrixMul_gold.cu

1>  tmpxft_00000420_00000000-0_matrixMul_gold.cudafe1.cpp

1>  tmpxft_00000420_00000000-11_matrixMul_gold.ii

1>  

1>  C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2008 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\x86_amd64" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\shared\inc" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\C\common\inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\include"  -G0  --keep-dir "x64\Debug\" -maxrregcount=32  --machine 64 --compile --ptxas-options=-dlcm=cg  -D_NEXUS_DEBUG -g    -Xcompiler "/EHsc /nologo /Od /Zi  /MTd " -o "x64\Debug\matrixMul.obj" "C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test\matrixMul.cu" 

1>  matrixMul.cu

1>  tmpxft_00000b58_00000000-0_matrixMul.cudafe1.gpu

1>  tmpxft_00000b58_00000000-5_matrixMul.cudafe2.gpu

1>  matrixMul.cu

1>  tmpxft_00000b58_00000000-0_matrixMul.cudafe1.cpp

1>  tmpxft_00000b58_00000000-11_matrixMul.ii

1>  Note: including windows.h

1>  Note: including math.h

1>  Note: including assert.h

1>  Note: including rendercheckGL.h

1>  Note: including <vector>

1>  Note: including <map>

1>  Note: including <string>

1>  Note: including GL/glew.h

1>  Note: including GL/glut.h

1>  Note: including lib: glut32.lib

1>  

1>  Deleting file "tmpxft_00000420_00000000-6_matrixMul_gold.cpp3.o".

1>  Deleting file "tmpxft_00000b58_00000000-6_matrixMul.cpp3.o".

1>ManifestResourceCompile:

1>  All outputs are up-to-date.

1>LINK : warning LNK4044: unrecognized option '/-ptxas-options=-dlcm=cg'; ignored

1>shrUtils64D.lib(cmd_arg_reader.obj) : warning LNK4204: 'C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test\x64\Debug\vc90.pdb' is missing debugging information for referencing module; linking object as if no debug info

1>shrUtils64D.lib(shrUtils.obj) : warning LNK4204: 'C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test\x64\Debug\vc90.pdb' is missing debugging information for referencing module; linking object as if no debug info

1>LINK : error LNK2001: unresolved external symbol mainCRTStartup

1>C:\Users\kotas\Documents\Visual Studio 2010\Projects\cuda_temp_test\x64\Debug\MatrixMul_test.exe : fatal error LNK1120: 1 unresolved externals

1>

1>Build FAILED.

1>

1>Time Elapsed 00:00:08.33

========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

A clever person might now suggest that I give up on this project and try adding the compiler option to the sample project that is included with NSIGHT. I tried. I copied the Matrix Multiply example folder to a new directory, adjusted the location of check.h, and hit “Build” again. I even tried changing the architecture from sm_10 to sm_20. No luck: I got a similar error message. (For some reason, perhaps file permissions, the --ptxas-options=-dlcm=cg option didn’t appear on the nvcc.exe line until I had copied the folder to a new location.)

So, now that I have shared what didn’t work for me, can someone share with me what I was actually supposed to do? I appreciate your guidance, oh Gurus of the Aether!

Thanks,
