VS2022 + CUDA

I recently downloaded VS2022 and integrated the latest version of CUDA. However, when I moved my CUDA program from VS2017 to VS2022, the GPU portion of the program became more than twice as slow. Both versions of Visual Studio are using the same version of CUDA. What could be the reasons for this issue?

One possibility would be different project settings. For example, if the VS2017 project is a Release build and the VS2022 project is a Debug build, that could explain the difference. Of course, I don’t know for sure that this is the issue; it might not be related at all. It’s just speculation.


Both are in Release mode. The CPU computation part shows no significant time difference, but there is a substantial discrepancy in the CUDA portion. Are there any compiler options that could have such a significant impact?

Can you compare the nvcc parameters shown for both versions of Visual Studio?


I found two sets of information under Properties → CUDA C/C++ → Command Line, and I’m not sure if they are what you mentioned. The first set is for VS2022, and the second set is for VS2017.

VS2022
Build started at 11:55…
1>------ Build started: Project: TEST2022, Configuration: Release x64 ------
1>Compiling CUDA source file main.cu…
1>
1>C:\Users\Administrator\source\repos\TEST2022\TEST2022>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc.exe" -gencode=arch=compute_70,code="sm_70,compute_70" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\HostX64\x64" -x cu -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" --keep-dir x64\Release -maxrregcount=0 --machine 64 --compile -cudart static -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /FS /MD " -Xcompiler "/Fdx64\Release\vc143.pdb" -o C:\Users\Administrator\source\repos\TEST2022\TEST2022\x64\Release\main.cu.obj "C:\Users\Administrator\source\repos\TEST2022\TEST2022\main.cu"
1>main.cu
1>tmpxft_00007b78_00000000-7_main.cudafe1.cpp
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
========== Build completed at 11:55 and took 06.989 seconds ==========

VS2017
1>------ Build started: Project: CUDA 12.6 Runtime1, Configuration: Release x64 ------
1>Compiling CUDA source file File.cu…
1>
1>C:\Users\Administrator\source\repos\CUDA 12.6 Runtime1\CUDA 12.6 Runtime1>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc.exe" -gencode=arch=compute_70,code="sm_70,compute_70" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64" -x cu -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" --keep-dir x64\Release -maxrregcount=0 --machine 64 --compile -cudart static -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /FS /MD " -Xcompiler "/Fdx64\Release\vc141.pdb" -o "C:\Users\Administrator\source\repos\CUDA 12.6 Runtime1\CUDA 12.6 Runtime1\x64\Release\File.cu.obj" "C:\Users\Administrator\source\repos\CUDA 12.6 Runtime1\CUDA 12.6 Runtime1\File.cu"
1>CUDACOMPILE : nvcc warning : nvcc support for Microsoft Visual Studio 2017 and earlier has been deprecated and is no longer being maintained
1>File.cu
1>tmpxft_000099e4_00000000-7_File.cudafe1.cpp
1>Done building project “CUDA 12.6 Runtime1.vcxproj”.
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

Run a compilation in each case (i.e. build or rebuild the project). Copy the complete build console output from each case and compare them. The compilation commands will include lines that launch the nvcc compiler; those are the ones to focus on.


I have updated my post above with the compilation information, but I still can’t identify the issue.

There don’t appear to be important differences in those compilation commands. What GPU are you actually running the code on?


GV100 32GB

The file name changed from File.cu to main.cu. Perhaps some other small things changed as well, affecting what you measure as the GPU portion of the program.

You could try to compare programs that are more nearly identical, and you could measure with Nsight Compute or Nsight Systems.

The same nvcc compiler is used in both cases; you could make sure by comparing the generated PTX code (see also the -keep and -keep-dir parameters of nvcc).


I also downloaded VS2019, but I’m still encountering the same issues as in VS2022. I’m not very familiar with Nsight Compute, but after a simple analysis, I found that the kernels in the VS2022 program use significantly more registers compared to those in VS2017. One kernel even uses nearly 20 more registers. I don’t understand why this happens even though both versions are using CUDA 12.6.
The same program runs slightly faster in VS2017 with CUDA 11.2 compared to CUDA 12.6, but the difference in speed is quite small. Could there be an issue with my program?
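
As a cross-check outside Nsight Compute, the register usage of a kernel can also be queried at runtime with cudaFuncGetAttributes. A minimal sketch, with a hypothetical kernel named myKernel standing in for one of the kernels discussed here:

#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, only for illustration.
__global__ void myKernel(float* out, const float* in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;
}

int main()
{
    // numRegs reports the registers per thread that the compiler assigned,
    // so the VS2017 and VS2022 builds can be compared directly.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, myKernel);
    if (err != cudaSuccess) {
        printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("registers per thread: %d\n", attr.numRegs);
    printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    return 0;
}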

[Screenshots attached: Nsight Compute results for the VS2022 build, showing FP64 instruction usage]

It seems that the VS2022 build is using FP64? I didn’t use “double” anywhere in my code.

The different number of registers is already a useful hint.

In your command line it says -maxrregcount=0 (= unlimited).

You can set a different limit there. Another way to do it is using __launch_bounds__(maxThreadsPerBlock) on your kernels, e.g. directly after __global__, to specify the maximum number of threads per block; the compiler then budgets registers so that a block of that size can still be resident.
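
For illustration only (the kernel, its block size of 256, and the register limit of 64 are assumptions, not taken from the project in this thread), both approaches look roughly like this:

// Approach 1: limit registers for the whole compilation unit via the
// nvcc flag, e.g. -maxrregcount=64 instead of 0 (= unlimited).

// Approach 2: per-kernel hint with __launch_bounds__. Telling the compiler
// the kernel will never be launched with more than 256 threads per block
// lets it budget registers accordingly.
__global__ void __launch_bounds__(256)
scaleKernel(float* out, const float* in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;
}

An optional second argument to __launch_bounds__ (the minimum number of resident blocks per multiprocessor) puts additional pressure on register usage.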

It would be interesting to see whether
a) you can change the number of registers with one of the two mentioned methods, and
b) the speed changes such that VS2017 and VS2022 reach the same speed as soon as they use the same number of registers.

What remains to be found out is where the different (initial?) register count comes from.

You updated your post with screenshots and the mention of FP64. Perhaps some constants are double constants (i.e. written with a decimal point but without an f suffix at the end).
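
For illustration (not code from this thread): a literal without the f suffix is a double, and mixing it with float operands promotes the whole expression to double precision.

__global__ void constantsKernel(float* out, const float* in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // 0.5 is a double constant: in[i] is promoted to double, the
        // multiplication is done in FP64, and the result is converted
        // back to float for the store.
        out[i] = in[i] * 0.5;

        // 0.5f is a float constant: the multiplication stays in FP32.
        // out[i] = in[i] * 0.5f;
    }
}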


I found that it might be an issue with FP64. I attached the image in my previous response, and I don’t understand why this problem occurs.

In Nsight Compute (especially if the program was compiled with line information (-lineinfo) or full debug info), you can switch to a source-code view and see which of your C++ statements were compiled into which PTX or SASS instructions. You can activate a side-by-side view.

Typical FP64 SASS instructions are DMUL, DFMA, DADD.

As you said, there are a lot of instructions like DADD and DMUL shown for my source code. But I still don’t know how to solve such an unexpected problem: all the numeric constants in my code have an f suffix, and everything is defined as float.
Moreover, this situation only occurred in one kernel; the other kernels did not show any FP64 usage.

Have you looked in the side-by-side view to see which C++ statements lead to the DADD and DMUL instructions?


I finally found the problem: my program used pow, which converted all my single-precision floating-point numbers to double-precision for calculations. After I switched to powf, I noticed the speed was still not great. Finally, I changed it to direct multiplication, and the speed only differed by 3% from VS2017. It seems that VS2017 optimized my code in many ways that I couldn’t see.
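
For reference, a sketch of the three variants described above (the exponent of 2 is just an illustrative assumption):

__global__ void powerKernel(float* out, const float* in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = in[i];

        // pow with a double exponent promotes the computation to FP64.
        // out[i] = pow(x, 2.0);

        // powf stays in FP32, but is still a comparatively expensive call.
        // out[i] = powf(x, 2.0f);

        // For small integer exponents, direct multiplication is fastest.
        out[i] = x * x;
    }
}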

Normally it should not do that by itself, as Visual Studio only calls nvcc to compile the kernels.


Although I haven’t fully understood the specific reasons why the two versions produce different compilation results from nvcc, I have at least solved the existing issues and realized the importance of learning Nsight Compute.
