VS2022 + CUDA

I recently downloaded VS2022 and integrated the latest version of CUDA. However, when I moved my CUDA program from VS2017 to VS2022, the GPU portion of the program became more than twice as slow. Both versions of Visual Studio are using the same version of CUDA. What could be the reasons for this issue?

One possibility would be different project settings. For example, if the VS2017 project is a Release build and the VS2022 project is a Debug build, that could explain the difference. Of course, I don’t know for sure that this is the issue; it might not be related at all. It’s just speculation.


Both are in Release mode. The CPU computation part shows no significant time difference, but there is a substantial discrepancy in the CUDA portion. Are there any compiler options that could have such a significant impact?

Can you compare the nvcc parameters shown for both versions of Visual Studio?


I found two sets of information under Properties → CUDA C/C++ → Command Line, and I’m not sure if they are what you mentioned. The first set is for VS2022, and the second set is for VS2017.

VS2022
Build started at 11:55…
1>------ Build started: Project: TEST2022, Configuration: Release x64 ------
1>Compiling CUDA source file main.cu…
1>
1>C:\Users\Administrator\source\repos\TEST2022\TEST2022>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc.exe" -gencode=arch=compute_70,code="sm_70,compute_70" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\HostX64\x64" -x cu -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" --keep-dir x64\Release -maxrregcount=0 --machine 64 --compile -cudart static -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /FS /MD " -Xcompiler "/Fdx64\Release\vc143.pdb" -o C:\Users\Administrator\source\repos\TEST2022\TEST2022\x64\Release\main.cu.obj "C:\Users\Administrator\source\repos\TEST2022\TEST2022\main.cu"
1>main.cu
1>tmpxft_00007b78_00000000-7_main.cudafe1.cpp
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
========== Build completed at 11:55 and took 06.989 seconds ==========

VS2017
1>------ Build started: Project: CUDA 12.6 Runtime1, Configuration: Release x64 ------
1>Compiling CUDA source file File.cu…
1>
1>C:\Users\Administrator\source\repos\CUDA 12.6 Runtime1\CUDA 12.6 Runtime1>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc.exe" -gencode=arch=compute_70,code="sm_70,compute_70" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64" -x cu -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" --keep-dir x64\Release -maxrregcount=0 --machine 64 --compile -cudart static -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /FS /MD " -Xcompiler "/Fdx64\Release\vc141.pdb" -o "C:\Users\Administrator\source\repos\CUDA 12.6 Runtime1\CUDA 12.6 Runtime1\x64\Release\File.cu.obj" "C:\Users\Administrator\source\repos\CUDA 12.6 Runtime1\CUDA 12.6 Runtime1\File.cu"
1>CUDACOMPILE : nvcc warning : nvcc support for Microsoft Visual Studio 2017 and earlier has been deprecated and is no longer being maintained
1>File.cu
1>tmpxft_000099e4_00000000-7_File.cudafe1.cpp
1>Done building project “CUDA 12.6 Runtime1.vcxproj”.
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

Run a compilation in each case (i.e. build or rebuild the project). Copy the complete build console output from each case and compare them. The compilation commands will include lines that launch the nvcc compiler; those are the ones to focus on.


I have updated my post above with the compilation information, but I still can’t identify the issue.

There don’t appear to be important differences in those compilation commands. What GPU are you actually running the code on?


GV100 32GB

The file name changed from File.cu to main.cu. Perhaps some other small things changed as well, affecting what you measure as the GPU portion of the program.

You could try to compare programs that are more nearly identical, and you could measure with Nsight Compute or Nsight Systems.

The same nvcc compiler is used in both cases; you could make sure by comparing the generated PTX code (see also the -keep and -keep-dir parameters of nvcc).


I also downloaded VS2019, but I’m still encountering the same issues as in VS2022. I’m not very familiar with Nsight Compute, but after a simple analysis, I found that the kernels in the VS2022 program use significantly more registers compared to those in VS2017. One kernel even uses nearly 20 more registers. I don’t understand why this happens even though both versions are using CUDA 12.6.
The same program runs slightly faster in VS2017 with CUDA 11.2 compared to CUDA 12.6, but the difference in speed is quite small. Could there be an issue with my program?
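
As a cross-check outside Nsight Compute, the register usage of a kernel can also be queried at runtime with cudaFuncGetAttributes. A minimal sketch, with a hypothetical kernel named myKernel standing in for one of the kernels discussed here:

#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, only for illustration.
__global__ void myKernel(float* out, const float* in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;
}

int main()
{
    // numRegs reports the registers per thread that the compiler assigned,
    // so the VS2017 and VS2022 builds can be compared directly.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, myKernel);
    if (err != cudaSuccess) {
        printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("registers per thread: %d\n", attr.numRegs);
    printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    return 0;
}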

[Screenshots attached: Nsight Compute results for the VS2022 build, showing FP64 instruction usage]

It seems that the VS2022 build is using FP64? I didn’t use “double” anywhere in my code.

The different number of registers is already a useful hint.

In your command line it says -maxrregcount=0 (= unlimited).

You can set a different limit there. Another way to do it is using __launch_bounds__(maxThreadsPerBlock) on your kernels, e.g. directly after __global__, to specify the maximum number of threads per block; the compiler then budgets registers so that a block of that size can still be resident.
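
For illustration only (the kernel, its block size of 256, and the register limit of 64 are assumptions, not taken from the project in this thread), both approaches look roughly like this:

// Approach 1: limit registers for the whole compilation unit via the
// nvcc flag, e.g. -maxrregcount=64 instead of 0 (= unlimited).

// Approach 2: per-kernel hint with __launch_bounds__. Telling the compiler
// the kernel will never be launched with more than 256 threads per block
// lets it budget registers accordingly.
__global__ void __launch_bounds__(256)
scaleKernel(float* out, const float* in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;
}

An optional second argument to __launch_bounds__ (the minimum number of resident blocks per multiprocessor) puts additional pressure on register usage.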

It would be interesting to see whether
a) you can change the number of registers with one of the two mentioned methods, and
b) the speed changes such that VS2017 and VS2022 reach the same speed as soon as they use the same number of registers.

What remains to be found out is where the different (initial?) register count comes from.

You updated your post with screenshots and the mention of FP64. Perhaps some constants are double constants (i.e. written with a decimal point but without an f suffix at the end).
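
For illustration (not code from this thread): a literal without the f suffix is a double, and mixing it with float operands promotes the whole expression to double precision.

__global__ void constantsKernel(float* out, const float* in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // 0.5 is a double constant: in[i] is promoted to double, the
        // multiplication is done in FP64, and the result is converted
        // back to float for the store.
        out[i] = in[i] * 0.5;

        // 0.5f is a float constant: the multiplication stays in FP32.
        // out[i] = in[i] * 0.5f;
    }
}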


I found that it might be an issue with FP64. I attached the image in my previous response, and I don’t understand why this problem occurs.

In Nsight Compute (especially if the program was compiled with line information (-lineinfo) or full debug info), you can switch to a source-code view and see which of your C++ statements were compiled into which PTX or SASS instructions. You can activate a side-by-side view.

Typical FP64 SASS instructions are DMUL, DFMA, DADD.

As you said, there are a lot of instructions like DADD and DMUL shown for my source code. But I still don’t know how to solve such an unexpected problem: all the numeric constants in my code have an f suffix, and everything is defined as float.
Moreover, this situation only occurred in one kernel; the other kernels did not show any FP64 usage.

Have you looked in the side-by-side view to see which C++ statements lead to the DADD and DMUL instructions?


I finally found the problem: my program used pow, which converted all my single-precision floating-point numbers to double-precision for calculations. After I switched to powf, I noticed the speed was still not great. Finally, I changed it to direct multiplication, and the speed only differed by 3% from VS2017. It seems that VS2017 optimized my code in many ways that I couldn’t see.
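
For reference, a sketch of the three variants described above (the exponent of 2 is just an illustrative assumption):

__global__ void powerKernel(float* out, const float* in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = in[i];

        // pow with a double exponent promotes the computation to FP64.
        // out[i] = pow(x, 2.0);

        // powf stays in FP32, but is still a comparatively expensive call.
        // out[i] = powf(x, 2.0f);

        // For small integer exponents, direct multiplication is fastest.
        out[i] = x * x;
    }
}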

Normally it should not do that by itself, as Visual Studio only calls nvcc to compile the kernels.


Although I haven’t fully understood the specific reasons why the two versions produce different compilation results from nvcc, I have at least solved the existing issues and realized the importance of learning Nsight Compute.
