How to profile NvEglRenderer under Orin?

Hi,
According to
https://docs.nvidia.com/nsight-systems/UserGuide/index.html#opengl-trace
it should be possible to record eglSwapBuffers and show correlations between thread state and the graphics driver’s behavior.

I tried to run
DISPLAY=:0 nsys profile some_app
or
DISPLAY=:0 nsys profile --trace=opengl,opengl-annotations some_app
where some_app is any app that uses EGL, for example one of jetson_multimedia_api/samples or this simple app:
vline/main.cpp at master · wyckster/vline · GitHub,
but then “nsys stats report*.nsys-rep” shows nothing other than “OS Runtime Summary”.

How can I get a trace of EGL and OpenGL calls, measure their overhead, and trace display vsync or similar events?

Thank you

Hi,

Please try running Nsight Systems with root privileges.
Thanks.

No difference:

DISPLAY=:0 xhost + local:
rm report*
DISPLAY=:0 sudo -E nsys profile ./vline
nsys stats report*.nsys-rep
The only section is “OS Runtime Summary (osrt_sum)”;
all other sections are empty. There is no EGL or GL section at all.
or
rm report*
DISPLAY=:0 sudo -E nsys profile --trace=opengl,opengl-annotations ./vline
nsys stats report*.nsys-rep
Nothing

Hi,

Are you able to run it on the device directly?

With the DISPLAY=:0 sudo ... command, the environment variable is set for the $USER account, but the profiler runs as root.

Thanks.

First I run
DISPLAY=:0 xhost + local:
this allows root to access X
Then I can either run
DISPLAY=:0 sudo -E nsys profile some_app
or
sudo su
DISPLAY=:0 nsys profile some_app
The result is the same (copies below). Note that it prints many lines like “SKIPPED: report1.sqlite does not contain CUDA trace data.” or “SKIPPED: report1.sqlite does not contain OpenMP event data.”, but it never mentions EGL or GL. It appears that EGL and GL tracing are completely compiled out of nsys on Orin.
This is the full output from nsys stats report1.nsys-rep:

nsys stats report1.nsys-rep
Generating SQLite file report1.sqlite from report1.nsys-rep
Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/nvtx_sum.py]...
SKIPPED: report1.sqlite does not contain NV Tools Extension (NVTX) data.

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/osrt_sum.py]...

 ** OS Runtime Summary (osrt_sum):

 Time (%)  Total Time (ns)  Num Calls  Avg (ns)   Med (ns)   Min (ns)   Max (ns)   StdDev (ns)       Name
 --------  ---------------  ---------  ---------  ---------  --------  ----------  -----------  ---------------
     74.7      144,922,336      3,155   45,934.2    9,248.0       992   2,083,040    117,442.7  ioctl
     21.3       41,361,056        504   82,065.6    2,080.0       992  16,705,856    989,000.1  poll
      1.6        3,045,920        358    8,508.2    8,384.0     3,808      13,568      1,346.7  writev
      0.9        1,705,408        414    4,119.3    1,760.0       992     268,128     17,542.9  recvmsg
      0.5          928,768          3  309,589.3  311,008.0   306,656     311,104      2,540.8  usleep
      0.3          548,224        101    5,428.0    2,016.0     1,120      72,032     11,474.8  recv
      0.1          287,264        253    1,135.4    1,120.0       992       1,792        125.0  dup
      0.1          208,704         24    8,696.0    7,200.0     2,720      22,944      5,559.6  open
      0.1          205,888         24    8,578.7    6,640.0     2,880      22,720      5,051.5  mmap
      0.1          161,184         24    6,716.0    5,168.0     1,568      25,056      5,389.9  fopen
      0.1          158,784         12   13,232.0   10,688.0     7,360      33,952      7,412.9  socketpair
      0.1          112,192         14    8,013.7    7,904.0     3,104      12,768      2,821.3  sendmsg
      0.0           75,840         10    7,584.0    7,904.0     1,024      14,240      4,542.5  fread
      0.0           62,720         10    6,272.0    3,984.0     1,536      26,784      7,482.5  read
      0.0           50,144          4   12,536.0   12,112.0     1,536      24,384     10,571.6  connect
      0.0           49,216          4   12,304.0   12,624.0     6,080      17,888      4,960.6  socket
      0.0           44,000          3   14,666.7   13,280.0    12,512      18,208      3,090.8  fopen64
      0.0           34,400         11    3,127.3    2,880.0     2,016       5,824      1,055.8  fclose
      0.0           34,272          2   17,136.0   17,136.0     1,280      32,992     22,423.8  fgets
      0.0           34,272          2   17,136.0   17,136.0    11,424      22,848      8,078.0  munmap
      0.0           29,760          1   29,760.0   29,760.0    29,760      29,760          0.0  open64
      0.0           18,208         13    1,400.6    1,248.0     1,024       2,144        387.1  fcntl
      0.0           17,568          4    4,392.0    4,160.0     2,080       7,168      2,094.4  send
      0.0           12,384          1   12,384.0   12,384.0    12,384      12,384          0.0  posix_fallocate
      0.0            9,312          2    4,656.0    4,656.0     2,336       6,976      3,281.0  lockf
      0.0            7,040          1    7,040.0    7,040.0     7,040       7,040          0.0  ftruncate
      0.0            5,824          2    2,912.0    2,912.0     2,752       3,072        226.3  fstat64
      0.0            2,688          1    2,688.0    2,688.0     2,688       2,688          0.0  write

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/cuda_api_sum.py]...
SKIPPED: report1.sqlite does not contain CUDA trace data.

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/cuda_gpu_kern_sum.py]...
SKIPPED: report1.sqlite does not contain CUDA kernel data.

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/cuda_gpu_mem_time_sum.py]...
SKIPPED: report1.sqlite does not contain GPU memory data.

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/cuda_gpu_mem_size_sum.py]...
SKIPPED: report1.sqlite does not contain GPU memory data.

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/openmp_sum.py]...
SKIPPED: report1.sqlite does not contain OpenMP event data.

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/opengl_khr_range_sum.py]...
SKIPPED: report1.sqlite does not contain KHR Extension (KHR_DEBUG) data.

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/opengl_khr_gpu_range_sum.py]...
SKIPPED: report1.sqlite does not contain GPU KHR Extension (KHR_DEBUG) data.

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/vulkan_marker_sum.py]...
SKIPPED: report1.sqlite does not contain Vulkan Debug Extension (Vulkan Debug Util) data.

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/vulkan_gpu_marker_sum.py]...
SKIPPED: report1.sqlite does not contain GPU Vulkan Debug Extension (GPU Vulkan Debug markers) data.

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/dx11_pix_sum.py]...
SKIPPED: report1.sqlite does not contain DX11 CPU debug markers.

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/dx12_gpu_marker_sum.py]...
SKIPPED: report1.sqlite does not contain DX12 GPU debug markers.

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/dx12_pix_sum.py]...
SKIPPED: report1.sqlite does not contain DX12 CPU debug markers.

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/wddm_queue_sum.py]...
SKIPPED: report1.sqlite does not contain WDDM context data.

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/um_sum.py]...
SKIPPED: report1.sqlite does not contain CUDA memory transfers data.

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/um_total_sum.py]...
SKIPPED: report1.sqlite does not contain CUDA memory transfers data.

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/um_cpu_page_faults_sum.py]...
SKIPPED: report1.sqlite does not contain CUDA Unified Memory CPU page faults data.

Processing [report1.sqlite] with [/opt/nvidia/nsight-systems/2024.2.2/host-linux-armv8/reports/openacc_sum.py]...
SKIPPED: report1.sqlite does not contain OpenACC event data.
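
Since nsys stats only runs its built-in report scripts, a "SKIPPED" line does not by itself show what the capture contains. One way to check is to open the generated report1.sqlite directly and list its tables. The following is a minimal sketch, not an official nsys tool; the keyword filter ("OPENGL", "EGL", "KHR") is only a guess at how graphics-related tables might be named, and the helper names are hypothetical.

```python
import sqlite3


def list_tables(conn):
    """Return {table_name: row_count} for every table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    return {n: conn.execute(f'SELECT COUNT(*) FROM "{n}"').fetchone()[0]
            for n in names}


def graphics_tables(conn, keywords=("OPENGL", "EGL", "KHR")):
    """Filter list_tables() down to names containing one of the keywords.

    The keywords are an assumption about the table naming, not a documented list.
    """
    return {name: count for name, count in list_tables(conn).items()
            if any(k in name.upper() for k in keywords)}


# Usage (read-only, so a mistyped path does not create an empty file):
#   conn = sqlite3.connect("file:report1.sqlite?mode=ro", uri=True)
#   for name, count in graphics_tables(conn).items():
#       print(name, count)
```

If this prints no graphics-related table, the trace was probably not collected; if it prints a table with rows, the data is in the file and only the stats reports are missing it.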


Hi,

Could you check the document below and enable the trace accordingly?
For example, nsys profile --trace=cuda,nvtx,nvmedia,opengl ...

https://docs.nvidia.com/nsight-systems/UserGuide/index.html

Thanks.

User Guide — nsight-systems 2024.6 documentation
does not appear to have any instructions for enabling EGL/GL profiling other than using “nsys profile --trace …” with various parameters like those mentioned above.

However, I found that “nsys profile” is not the problem. It does record APIs such as glDrawArrays and eglSwapBuffers, and nsys-ui does show them, but “nsys stats” does not. nsys-ui is nowhere near as useful to me as nsys stats, so I need a way to run nsys stats and generate a report that includes glDrawArrays and eglSwapBuffers.
I ran “nsys stats --help-reports” and then tried all of those reports with “nsys stats -r …”, but none of them appears to print glDrawArrays or eglSwapBuffers.

Is there a way to make a report with glDrawArrays, eglSwapBuffers, and other EGL/GL functions?
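
For reference, the kind of per-function summary meant here can be approximated by querying the exported SQLite file directly. The sketch below assumes the GL calls are stored in a table named OPENGL_API with start, end, and nameId columns, and that nameId refers to a StringIds table with id and value columns. Those names are assumptions about the export schema and should be checked against the actual file; the function name api_summary is hypothetical.

```python
import sqlite3


def api_summary(conn, table="OPENGL_API"):
    """Per-function call count and timing (ns) for one API trace table.

    Assumes the table has start/end/nameId columns and that nameId
    refers to StringIds(id, value). Raises ValueError otherwise.
    """
    cols = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    if not {"start", "end", "nameId"} <= cols:
        raise ValueError(f"{table} is missing or lacks start/end/nameId columns")
    query = f'''
        SELECT s.value                  AS name,
               COUNT(*)                 AS calls,
               SUM(t."end" - t.start)   AS total_ns,
               AVG(t."end" - t.start)   AS avg_ns,
               MAX(t."end" - t.start)   AS max_ns
        FROM "{table}" AS t
        JOIN StringIds AS s ON s.id = t.nameId
        GROUP BY s.value
        ORDER BY total_ns DESC
    '''
    return conn.execute(query).fetchall()


# Usage:
#   conn = sqlite3.connect("file:report1.sqlite?mode=ro", uri=True)
#   for name, calls, total_ns, avg_ns, max_ns in api_summary(conn):
#       print(f"{name:30s} {calls:8d} {total_ns:14d} {avg_ns:12.1f} {max_ns:12d}")
```

The column check is there so that a wrong table name fails with a clear message instead of an SQL error.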

Hi,

You can find more help information with the command below:

$ nsys --help stats

The available reports are:

The following built-in reports are available:

  cuda_api_gpu_sum[:nvtx-name][:base|:mangled] -- CUDA Summary (API/Kernels/MemOps)
  cuda_api_sum -- CUDA API Summary
  cuda_api_trace -- CUDA API Trace
  cuda_gpu_kern_gb_sum[:nvtx-name][:base|:mangled] -- CUDA GPU Kernel/Grid/Block Summary
  cuda_gpu_kern_sum[:nvtx-name][:base|:mangled] -- CUDA GPU Kernel Summary
  cuda_gpu_mem_size_sum -- CUDA GPU MemOps Summary (by Size)
  cuda_gpu_mem_time_sum -- CUDA GPU MemOps Summary (by Time)
  cuda_gpu_sum[:nvtx-name][:base|:mangled] -- CUDA GPU Summary (Kernels/MemOps)
  cuda_gpu_trace[:nvtx-name][:base|:mangled] -- CUDA GPU Trace
  cuda_kern_exec_sum[:nvtx-name][:base|:mangled] -- CUDA Kernel Launch & Exec Time Summary
  cuda_kern_exec_trace[:nvtx-name][:base|:mangled] -- CUDA Kernel Launch & Exec Time Trace
  dx11_pix_sum -- DX11 PIX Range Summary
  dx12_gpu_marker_sum -- DX12 GPU Command List PIX Ranges Summary
  dx12_pix_sum -- DX12 PIX Range Summary
  mpi_event_sum -- MPI Event Summary
  mpi_event_trace -- MPI Event Trace
  network_congestion[:ticks_threshold=<ticks_per_ms>] -- Network Devices Congestion
  nvtx_gpu_proj_sum -- NVTX GPU Projection Summary
  nvtx_gpu_proj_trace -- NVTX GPU Projection Trace
  nvtx_kern_sum[:base|:mangled] -- NVTX Range Kernel Summary
  nvtx_pushpop_sum -- NVTX Push/Pop Range Summary
  nvtx_pushpop_trace -- NVTX Push/Pop Range Trace
  nvtx_startend_sum -- NVTX Start/End Range Summary
  nvtx_sum -- NVTX Range Summary
  nvvideo_api_sum -- NvVideo API Summary
  openacc_sum -- OpenACC Summary
  opengl_khr_gpu_range_sum -- OpenGL KHR_debug GPU Range Summary
  opengl_khr_range_sum -- OpenGL KHR_debug Range Summary
  openmp_sum -- OpenMP Summary
  osrt_sum -- OS Runtime Summary
  syscall_sum -- Syscall Summary
  um_cpu_page_faults_sum -- Unified Memory CPU Page Faults Summary
  um_sum[:rows=<limit>] -- Unified Memory Analysis Summary
  um_total_sum -- Unified Memory Totals Summary
  vulkan_api_sum -- Vulkan API Summary
  vulkan_api_trace -- Vulkan API Trace
  vulkan_gpu_marker_sum -- Vulkan GPU Range Summary
  vulkan_marker_sum -- Vulkan Range Summary
  wddm_queue_sum -- WDDM Queue Utilization Summary

Thanks.
