Hello,
I can't use the Nsight Compute profiler (ncu) on Windows 10.
When I try to profile an executable, I get these errors:
C:\Windows\system32>ncu -o profilerOuput --target-processes all D:\CUDA\a.exe
==PROF== Connected to process 2344 (D:\CUDA\a.exe)
Execution Configuration:
N = 4, Number of Blocks = 1, Number of Threads Per Block = 1
Matrix
0.0225126 5.64585 1.94304 8.09741
5.86009 4.80873 3.51291 8.96962
8.2384 7.47605 1.75108 8.59943
7.11501 5.14535 3.04995 0.159846
Vector :
0.924029 3.65452 1.48313 1.66899
Kernel Result :
0 0 0 0
cuBLAS Result :
5.0888e+276 5.83022e+252 1.65376e+243 1.29062e-306
“ERRORS I AM GETTING”
==PROF== Disconnected from process 2344
==WARNING== No kernels were profiled.
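Before blaming the profiler, it may be worth ruling out a failed kernel launch. This is an assumption on my part, since the source of a.exe isn't shown here. ncu reports "No kernels were profiled" and Kernel Result prints 0 0 0 0, which would fit a kernel that never actually ran. Below is a minimal sketch of checking the launch with cudaGetLastError and cudaDeviceSynchronize, plus a CPU reference to compare against. `matVecKernel`, `matVecHost` and the `CUDA_CHECK` macro are hypothetical names. The matrix-vector product is only a guess at what a.exe computes, based on the Matrix/Vector output.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file/line and the CUDA error string if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical stand-in for the real kernel: y = A * x for a row-major
// n x n matrix. With 1 block of 1 thread (as in the output above), the
// single thread strides over every row.
__global__ void matVecKernel(const double* A, const double* x, double* y, int n) {
    for (int row = blockIdx.x * blockDim.x + threadIdx.x; row < n;
         row += gridDim.x * blockDim.x) {
        double sum = 0.0;
        for (int col = 0; col < n; ++col) sum += A[row * n + col] * x[col];
        y[row] = sum;
    }
}

// CPU reference used to check the kernel's output.
void matVecHost(const double* A, const double* x, double* y, int n) {
    for (int row = 0; row < n; ++row) {
        double sum = 0.0;
        for (int col = 0; col < n; ++col) sum += A[row * n + col] * x[col];
        y[row] = sum;
    }
}

int main() {
    const int n = 4;
    double hA[n * n], hx[n], hy[n], ref[n];
    for (int i = 0; i < n * n; ++i) hA[i] = i + 1;
    for (int i = 0; i < n; ++i) hx[i] = 1.0;

    double *dA, *dx, *dy;
    CUDA_CHECK(cudaMalloc(&dA, sizeof hA));
    CUDA_CHECK(cudaMalloc(&dx, sizeof hx));
    CUDA_CHECK(cudaMalloc(&dy, sizeof hy));
    CUDA_CHECK(cudaMemcpy(dA, hA, sizeof hA, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dx, hx, sizeof hx, cudaMemcpyHostToDevice));

    matVecKernel<<<1, 1>>>(dA, dx, dy, n);
    CUDA_CHECK(cudaGetLastError());      // launch errors, e.g. no kernel image for this GPU
    CUDA_CHECK(cudaDeviceSynchronize()); // errors raised while the kernel runs

    CUDA_CHECK(cudaMemcpy(hy, dy, sizeof hy, cudaMemcpyDeviceToHost));
    matVecHost(hA, hx, ref, n);
    for (int i = 0; i < n; ++i) std::printf("%g (expected %g)\n", hy[i], ref[i]);

    CUDA_CHECK(cudaFree(dA));
    CUDA_CHECK(cudaFree(dx));
    CUDA_CHECK(cudaFree(dy));
    return 0;
}
```

If the check reports "no kernel image is available for execution on the device", the binary was not compiled for this GPU's architecture. That would also explain why ncu finds no kernels to profile.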
With nvprof, the problem apparently does not happen:
C:\Windows\system32>nvprof D:\CUDA\a.exe
==19608== NVPROF is profiling process 19608, command: D:\CUDA\a.exe
Execution Configuration:
N = 4, Number of Blocks = 1, Number of Threads Per Block = 1
Matrix
0.0225126 5.64585 1.94304 8.09741
5.86009 4.80873 3.51291 8.96962
8.2384 7.47605 1.75108 8.59943
7.11501 5.14535 3.04995 0.159846
Vector :
0.924029 3.65452 1.48313 1.66899
Kernel Result :
0 0 0 0
cuBLAS Result :
1.41087e-311 1.41087e-311 1.41087e-311 1.41087e-311
==19608== Profiling application: D:\CUDA\a.exe
==19608== Profiling result:
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 51.43% 1.1520us 2 576ns 192ns 960ns [CUDA memcpy HtoD]
48.57% 1.0880us 1 1.0880us 1.0880us 1.0880us [CUDA memcpy DtoH]
API calls: 85.79% 86.283ms 4 21.571ms 3.0000us 86.271ms cudaMalloc
13.54% 13.619ms 1 13.619ms 13.619ms 13.619ms cuDevicePrimaryCtxRelease
0.32% 320.20us 3 106.73us 36.700us 202.90us cudaMemcpy
0.19% 195.40us 4 48.850us 2.7000us 164.60us cudaFree
0.11% 113.60us 1 113.60us 113.60us 113.60us cuLibraryLoadData
0.02% 23.600us 114 207ns 100ns 1.6000us cuDeviceGetAttribute
0.01% 5.7000us 3 1.9000us 300ns 5.0000us cuDeviceGetCount
0.01% 5.3000us 1 5.3000us 5.3000us 5.3000us cuLibraryUnload
0.00% 2.7000us 1 2.7000us 2.7000us 2.7000us cuModuleGetLoadingMode
0.00% 1.9000us 2 950ns 200ns 1.7000us cuDeviceGet
0.00% 1.5000us 1 1.5000us 1.5000us 1.5000us cuDeviceTotalMem
0.00% 800ns 1 800ns 800ns 800ns cuDeviceGetName
0.00% 600ns 1 600ns 600ns 600ns cuDeviceGetLuid
0.00% 600ns 1 600ns 600ns 600ns cudaLaunchKernel
0.00% 300ns 1 300ns 300ns 300ns cuDeviceGetUuid
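In the nvprof output above, GPU activities list only [CUDA memcpy HtoD] and [CUDA memcpy DtoH] and no kernel, even though the API calls include one cudaLaunchKernel. That pattern is consistent with the launch failing, though this is my guess and not confirmed. One common cause is building for a GPU architecture the device does not support. Below is a minimal sketch that prints each device's compute capability so it can be compared with the `-arch` value passed to nvcc. `archFlag` is a hypothetical helper, not part of the CUDA API.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper: turn a compute capability into the matching nvcc
// target name, e.g. (7, 5) -> "sm_75".
std::string archFlag(int major, int minor) {
    return "sm_" + std::to_string(major) + std::to_string(minor);
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // Print the device name, its compute capability, and the matching target.
        std::printf("Device %d: %s, compute capability %d.%d (nvcc -arch=%s)\n",
                    dev, prop.name, prop.major, prop.minor,
                    archFlag(prop.major, prop.minor).c_str());
    }
    return 0;
}
```

If the printed target differs from the architecture a.exe was built for, rebuilding with the matching `-arch` value would be the first thing to try before profiling again.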
Does anyone know how to get this working with Nsight Compute?
I need it running as soon as possible.