Hi,
I’m a new user of Nsight Systems. I’ve created a Docker container to run the command-line tool, nsys, on CentOS 7. Our system has two Tesla V100 GPUs.
The container was run in the following manner:
docker run --rm --gpus=all --cap-add=SYS_ADMIN --net=host -v $(pwd):/data -w /data -it centos-gpu-tools:latest bash
The nsys status command gives the following results:
[root@syseng-2-dell-hpc gpu-burn]# nsys status -e
Timestamp counter supported: Yes
CPU Profiling Environment Check
Root privilege: enabled
Linux Kernel Paranoid Level = 2
Linux Distribution = CentOS
Linux Kernel Version = 3.10.0-1160.80.1.el7.x86_64: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Not Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): OK
See the product documentation at https://docs.nvidia.com/nsight-systems for more information,
including information on how to set the Linux Kernel Paranoid Level.
I ran nsys with a test application, gpu_burn:
[root@syseng-2-dell-hpc gpu-burn]# nsys profile -t cuda,nvtx,osrt,cublas -f true -o /data/results2 --gpu-metrics-device=all --cuda-memory-usage=true --export=sqlite ./gpu_burn 30
Burning for 30 seconds.
GPU 0: Tesla V100-PCIE-32GB (UUID: GPU-94dfee0f-03e6-52e2-bdb5-705f1c0f8b9f)
GPU 1: Tesla V100-PCIE-32GB (UUID: GPU-ccd7bd6d-e9bb-b57e-9ca0-7690deef2b6d)
Initialized device 0 with 32510 MB of memory (32052 MB available, using 28847 MB of it), using FLOATS
Results are 16777216 bytes each, thus performing 1800 iterations
Initialized device 1 with 32510 MB of memory (32052 MB available, using 28847 MB of it), using FLOATS
Results are 16777216 bytes each, thus performing 1800 iterations
16.7% proc'd: 1800 (6691 Gflop/s) - 0 (0 Gflop/s) errors: 0 - 0 temps: 28 C - 26 C
Summary at: Mon Dec 5 16:14:24 UTC 2022
33.3% proc'd: 5400 (12878 Gflop/s) - 3600 (12884 Gflop/s) errors: 0 - 0 temps: 39 C - 39 C
Summary at: Mon Dec 5 16:14:29 UTC 2022
50.0% proc'd: 9000 (12876 Gflop/s) - 7200 (12920 Gflop/s) errors: 0 - 0 temps: 42 C - 42 C
Summary at: Mon Dec 5 16:14:34 UTC 2022
66.7% proc'd: 12600 (12849 Gflop/s) - 10800 (12916 Gflop/s) errors: 0 - 0 temps: 44 C - 43 C
Summary at: Mon Dec 5 16:14:39 UTC 2022
80.0% proc'd: 14400 (12836 Gflop/s) - 16200 (12866 Gflop/s) errors: 0 - 0 temps: 46 C - 46 C
Summary at: Mon Dec 5 16:14:43 UTC 2022
96.7% proc'd: 19800 (12879 Gflop/s) - 18000 (12852 Gflop/s) errors: 0 - 0 temps: 48 C - 49 C
Summary at: Mon Dec 5 16:14:48 UTC 2022
100.0% proc'd: 19800 (12879 Gflop/s) - 19800 (12731 Gflop/s) errors: 0 - 0 temps: 48 C - 49 C
Killing processes.. Freed memory for dev 0
Uninitted cublas
Freed memory for dev 1
Uninitted cublas
done
Tested 2 GPUs:
GPU 0: OK
GPU 1: OK
Generating '/tmp/nsys-report-cdc5.qdstrm'
[1/2] [========================100%] results2.nsys-rep
[2/2] [========================100%] results2.sqlite
Generated:
/data/results2.nsys-rep
/data/results2.sqlite
The following are the tables in the sqlite3 database:
-bash-4.2$ sqlite3 results2.sqlite
SQLite version 3.7.17 2013-05-20 00:56:22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .tables
ANALYSIS_DETAILS ENUM_OPENMP_MUTEX
COMPOSITE_EVENTS ENUM_OPENMP_SYNC_REGION
ENUM_CUDA_DEV_MEM_EVENT_OPER ENUM_OPENMP_TASK_FLAG
ENUM_CUDA_FUNC_CACHE_CONFIG ENUM_OPENMP_TASK_STATUS
ENUM_CUDA_KRENEL_LAUNCH_TYPE ENUM_OPENMP_THREAD
ENUM_CUDA_MEMCPY_OPER ENUM_OPENMP_WORK
ENUM_CUDA_MEMPOOL_OPER ENUM_SAMPLING_THREAD_STATE
ENUM_CUDA_MEMPOOL_TYPE ENUM_SLI_TRANSER
ENUM_CUDA_MEM_KIND ENUM_STACK_UNWIND_METHOD
ENUM_CUDA_SHARED_MEM_LIMIT_CONFIG ENUM_VULKAN_PIPELINE_CREATION_FLAGS
ENUM_CUDA_UNIF_MEM_ACCESS_TYPE ENUM_WDDM_ENGINE_TYPE
ENUM_CUDA_UNIF_MEM_MIGRATION ENUM_WDDM_INTERRUPT_TYPE
ENUM_CUPTI_STREAM_TYPE ENUM_WDDM_PACKET_TYPE
ENUM_CUPTI_SYNC_TYPE ENUM_WDDM_PAGING_QUEUE_TYPE
ENUM_D3D12_CMD_LIST_TYPE ENUM_WDDM_VIDMM_OP_TYPE
ENUM_D3D12_HEAP_FLAGS EXPORT_META_DATA
ENUM_D3D12_HEAP_TYPE NVTX_EVENTS
ENUM_D3D12_PAGE_PROPERTY OSRT_API
ENUM_DXGI_FORMAT OSRT_CALLCHAINS
ENUM_GPU_CTX_SWITCH PROCESSES
ENUM_NSYS_EVENT_CLASS PROFILER_OVERHEAD
ENUM_NSYS_EVENT_TYPE ProcessStreams
ENUM_NVDRIVER_EVENT_ID SAMPLING_CALLCHAINS
ENUM_OPENACC_DEVICE SCHED_EVENTS
ENUM_OPENACC_EVENT_KIND StringIds
ENUM_OPENGL_DEBUG_SEVERITY TARGET_INFO_GPU
ENUM_OPENGL_DEBUG_SOURCE TARGET_INFO_SESSION_START_TIME
ENUM_OPENGL_DEBUG_TYPE TARGET_INFO_SYSTEM_ENV
ENUM_OPENMP_DISPATCH ThreadNames
ENUM_OPENMP_EVENT_KIND UnwindMethodType
sqlite>
I was expecting the following tables to be available: CUPTI_ACTIVITY_KIND_MEMCPY, CUDA_GPU_MEMORY_USAGE_EVENTS and CUPTI_ACTIVITY_KIND_KERNEL.
So the question is: how do I get these tables? I’m assuming I’ve missed something, but I’m not sure what.
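In case it helps with reproducing this, here is a minimal sketch of how the check for the three tables could be scripted instead of reading the .tables output by eye. It uses only Python's standard sqlite3 module and the standard sqlite_master catalog. The function name missing_tables and the EXPECTED_TABLES tuple are my own illustrative names, not part of nsys.

```python
import sqlite3
import sys

# Tables named above as expected but absent from the export.
EXPECTED_TABLES = (
    "CUPTI_ACTIVITY_KIND_MEMCPY",
    "CUDA_GPU_MEMORY_USAGE_EVENTS",
    "CUPTI_ACTIVITY_KIND_KERNEL",
)


def missing_tables(db_path, expected=EXPECTED_TABLES):
    """Return the names in `expected` that are not tables in the database."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in schema catalog.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    finally:
        conn.close()
    present = {name for (name,) in rows}
    # Preserve the order of `expected` in the result.
    return [table for table in expected if table not in present]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_tables.py /data/results2.sqlite
    missing = missing_tables(sys.argv[1])
    if missing:
        print("Missing tables: " + ", ".join(missing))
    else:
        print("All expected tables are present.")
```

Run against the results2.sqlite file above, this should list all three tables as missing, which matches what .tables shows.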
Any help would be greatly appreciated.
Tony