From my reading of the documentation (https://developer.nvidia.com/nsight-compute), Nsight Compute and nvprof should be able to produce detailed profiling metrics for any TU1XX chip.
However, it does not work with my RTX 2060.
nvprof runs fine with the “summary” options (just regular tracing):
nvprof.exe 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\extras\demo_suite\vectorAdd.exe'
# or
nvprof.exe -o output.nvvp -f 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\extras\demo_suite\vectorAdd.exe'
But advanced profiling does not work:
nvprof.exe -o output.nvvp -f --analysis-metrics 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\extras\demo_suite\vectorAdd.exe'
This produces the following warning in the output and does not generate any detailed information about the executed kernels:
Toolkit\CUDA\v11.0\extras\demo_suite\vectorAdd.exe'
======== Warning: Skipping profiling on device 0 since profiling is not supported on devices with compute capability 7.5 and higher. Use NVIDIA Nsight Compute for GPU profiling and NVIDIA Nsight Systems for GPU tracing and CPU sampling. Refer https://developer.nvidia.com/tools-overview for more details.
======== Warning: The option --aggregate-mode on has no effect. The --aggregate-mode <on|off> option applies to --events and --metrics options that follow it.
======== Warning: The option --aggregate-mode off has no effect. The --aggregate-mode <on|off> option applies to --events and --metrics options that follow it.
[Vector addition of 50000 elements]
==19508== NVPROF is profiling process 19508, command: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\extras\demo_suite\vectorAdd.exe
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
==19508== Generated result file: C:\Users\Agostini\output.nvvp
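To make my reading of that warning explicit, here is a minimal Python sketch of the rule it seems to state. The function name and the constant are my own illustration and not part of any NVIDIA tool; the 7.5 cutoff is taken only from the warning text, and I am assuming it applies to kernel metrics collection rather than to plain tracing.

```python
# Hypothetical helper: illustrates the rule stated in the nvprof warning.
# The cutoff comes from the warning text, not from an NVIDIA API.
NVPROF_PROFILING_CUTOFF = (7, 5)

def metrics_tool(major: int, minor: int) -> str:
    """Return the tool the nvprof warning points to for kernel metrics."""
    if (major, minor) >= NVPROF_PROFILING_CUTOFF:
        # nvprof skips profiling on these devices and refers to Nsight Compute
        return "Nsight Compute"
    return "nvprof"

# Turing chips (TU1XX) report compute capability 7.5
print(metrics_tool(7, 5))
# A Pascal consumer card (compute capability 6.1) for comparison
print(metrics_tool(6, 1))
```

If this reading is right, no Turing card can collect metrics through nvprof with --analysis-metrics, and the real question is why Nsight Compute fails on my card.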
I have also tried dual booting into Ubuntu 20.04, and I receive the same error there. Furthermore, on Windows, “MS Visual Studio 2019 > NSIGHT > Start performance analysis…” detects the device, but when the profiling run starts the following error occurs:
Attempted to perform CUDA trace on an unsupported CUDA device. Serialized kernel trace mode has been used.
I have also tried Nsight Compute on both Windows and Ubuntu without success (it gives an error, but the message is not descriptive).
Is the RTX 2060 KO (TU104) supported by CUDA 11.0 tools?
What consumer cards from the Turing generation support detailed profiling?
Thank you in advance