Hello, guys!
I have a problem profiling with nvprof. I've installed CUDA Toolkit 11.5.1 and successfully built the "vectorAdd" sample from the preinstalled samples:
Build started...
1>------ Build started: Project: vectorAdd, Configuration: Debug x64 ------
1>Build started 07.12.2021 15:05:07.
1>Target InitializeBuildStatus:
1> Touching "x64/Debug/vectorAdd.tlog\unsuccessfulbuild".
1>Target AddCudaCompileDeps:
1> Skipping target "AddCudaCompileDeps" because all output files are up-to-date with respect to the input files.
1>Target WriteCudaCompileTlogs:
1> Skipping target "WriteCudaCompileTlogs" because all output files are up-to-date with respect to the input files.
1>Target CudaBuild:
1> Target CudaBuildCore:
1> Compiling CUDA source file vectorAdd.cu...
1>
1> D:\Tools\CUDA\samples\v11.5\0_Simple\vectorAdd>"D:\Tools\CUDA\v11.5\bin\nvcc.exe" -gencode=arch=compute_35,code=\"sm_35,compute_35\" -gencode=arch=compute_37,code=\"sm_37,compute_37\" -gencode=arch=compute_50,code=\"sm_50,compute_50\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_60,code=\"sm_60,compute_60\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" -gencode=arch=compute_80,code=\"sm_80,compute_80\" -gencode=arch=compute_86,code=\"sm_86,compute_86\" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64" -x cu -I./ -I../../common/inc -I./ -ID:\Tools\CUDA\v11.5\/include -I../../common/inc -ID:\Tools\CUDA\v11.5\include -G --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static --threads 0 -g -DWIN32 -DWIN32 -D_MBCS -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Fdx64/Debug/vc142.pdb /FS /Zi /RTC1 /MTd " -o x64/Debug/vectorAdd.cu.obj "D:\Tools\CUDA\samples\v11.5\0_Simple\vectorAdd\vectorAdd.cu"
1> CUDACOMPILE : nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
1> vectorAdd.cu
1> vectorAdd.cu
1> vectorAdd.cu
1> vectorAdd.cu
1> vectorAdd.cu
1> vectorAdd.cu
1> vectorAdd.cu
1> vectorAdd.cu
1> vectorAdd.cu
1> vectorAdd.cu
1> vectorAdd.cu
1> vectorAdd.cu
1> tmpxft_0000360c_00000000-7_vectorAdd.compute_86.cudafe1.cpp
1> Done building target "CudaBuildCore" in project "vectorAdd_vs2019.vcxproj".
1>
1> Done building project "vectorAdd_vs2019.vcxproj".
1>Target Link:
1> vectorAdd_vs2019.vcxproj -> D:\Tools\CUDA\samples\v11.5\bin\win64\Debug\vectorAdd.exe
1>Target FinalizeBuildStatus:
1> Deleting file "x64/Debug/vectorAdd.tlog\unsuccessfulbuild".
1> Touching "x64/Debug/vectorAdd.tlog\vectorAdd.lastbuildstate".
1>
1>Build succeeded.
1>
1>CUDACOMPILE : nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
1> 1 Warning(s)
1> 0 Error(s)
1>
1>Time Elapsed 00:00:09.70
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
The application also runs successfully from the console:
D:\Tools\CUDA\samples\v11.5\bin\win64\Debug>vectorAdd.exe
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
But when I profile it with nvprof, I get this:
D:\Tools\CUDA\samples\v11.5\bin\win64\Debug>nvprof --metrics all vectorAdd.exe
[Vector addition of 50000 elements]
==18508== NVPROF is profiling process 18508, command: vectorAdd.exe
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
==18508== Some kernel(s) will be replayed on device 0 in order to collect all events/metrics.
==18508== Replaying kernel "vectorAdd(float const *, float const *, float*, int)" (54 of 54)...
1 internal events
==18508== Profiling application: vectorAdd.exe
==18508== Profiling result:
No events/metrics were profiled.
======== Error: Application returned non-zero code -1073741676
I don't understand why the application returns a non-zero exit code. How can I resolve this problem? Please help.
P.S.:
My GPU: NVIDIA GeForce GTX 1050
All tests (bandwidthTest and deviceQuery from the samples) pass.
And:
D:\Tools\CUDA\samples\v11.5\bin\win64\Debug>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:52:33_Pacific_Standard_Time_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
D:\Tools\CUDA\samples\v11.5\bin\win64\Debug>nvprof --version
nvprof: NVIDIA (R) Cuda command line profiler
Copyright (c) 2012 - 2021 NVIDIA Corporation
Release version 11.5.114 (21)
D:\Tools\CUDA\samples\v11.5\bin\win64\Debug>nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1050 (UUID: GPU-21a54983-7230-f355-5c03-1f4785f8b6e8)
D:\Tools\CUDA\samples\v11.5\bin\win64\Debug>nvidia-smi -q
==============NVSMI LOG==============
Timestamp : Tue Dec 7 15:28:07 2021
Driver Version : 497.09
CUDA Version : 11.5
Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : NVIDIA GeForce GTX 1050
Product Brand : GeForce
Product Architecture : Pascal
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : N/A
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : WDDM
Pending : WDDM
Serial Number : N/A
GPU UUID : GPU-21a54983-7230-f355-5c03-1f4785f8b6e8
Minor Number : N/A
VBIOS Version : 86.07.93.00.1f
MultiGPU Board : No
Board ID : 0x100
GPU Part Number : N/A
Module ID : 0
Inforom Version
Image Version : N/A
OEM Object : N/A
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x1C9110DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x86D4103C
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 8x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 7000 KB/s
Rx Throughput : 121000 KB/s
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 3072 MiB
Used : 80 MiB
Free : 2992 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 48 C
GPU Shutdown Temp : 102 C
GPU Slowdown Temp : 97 C
GPU Max Operating Temp : 94 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : N/A
Power Draw : N/A
Power Limit : N/A
Default Power Limit : N/A
Enforced Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 1 MHz
SM : 1 MHz
Memory : 405 MHz
Video : 544 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1911 MHz
SM : 1911 MHz
Memory : 3504 MHz
Video : 1708 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 13540
Type : C+G
Name : D:\Games\Epic Games\Launcher\Engine\Binaries\Win64\EpicWebHelper.exe
Used GPU Memory : Not available in WDDM driver model
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 14868
Type : C+G
Name : D:\Games\Epic Games\Launcher\Portal\Binaries\Win64\EpicGamesLauncher.exe
Used GPU Memory : Not available in WDDM driver model