How to read tegrastats GPU utilisation values

How should we be reading the gr3d_freq values provided in tegrastats (or utilities that sit on top of tegrastats like jtop)?

For example, tegrastats will report values like this:

RAM 2224/3964MB (lfb 52x4MB) SWAP 85/4030MB (cached 6MB) CPU [39%@1479,41%@1479,28%@1479,31%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@34C CPU@38.5C PMIC@100C GPU@33.5C AO@42C thermal@36.25C POM_5V_IN 4912/4912 POM_5V_GPU 554/554 POM_5V_CPU 1344/1344
RAM 2224/3964MB (lfb 52x4MB) SWAP 85/4030MB (cached 6MB) CPU [39%@1479,44%@1479,39%@1479,33%@1479] EMC_FREQ 0% GR3D_FREQ 81% PLL@34C CPU@38.5C PMIC@100C GPU@34.5C AO@44C thermal@36.25C POM_5V_IN 6120/5516 POM_5V_GPU 1934/1244 POM_5V_CPU 985/1164
RAM 2224/3964MB (lfb 52x4MB) SWAP 85/4030MB (cached 6MB) CPU [32%@1479,32%@1479,36%@1479,30%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@34C CPU@38C PMIC@100C GPU@33.5C AO@42C thermal@36.25C POM_5V_IN 4491/5174 POM_5V_GPU 317/935 POM_5V_CPU 993/1107
RAM 2224/3964MB (lfb 52x4MB) SWAP 85/4030MB (cached 6MB) CPU [41%@1479,41%@1479,38%@1479,32%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@34C CPU@38.5C PMIC@100C GPU@33.5C AO@42C thermal@35.75C POM_5V_IN 4960/5120 POM_5V_GPU 317/780 POM_5V_CPU 1228/1137
RAM 2224/3964MB (lfb 52x4MB) SWAP 85/4030MB (cached 6MB) CPU [37%@1479,39%@1479,35%@1479,29%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@34C CPU@38.5C PMIC@100C GPU@34C AO@42.5C thermal@36.5C POM_5V_IN 5181/5132 POM_5V_GPU 553/735 POM_5V_CPU 1105/1131

So above we have GR3D_FREQ values of 0%, 81%, 0%, 0%, 0%.

This is while running the standard resnet10 model on 3 RTSP streams. Plotted with jtop, the data looks like this (not the same time window):

So what is this telling me? How do we interpret this data? Because the GPU spikes up to 99% every 2 to 8 seconds, does that mean it is running at 99% utilisation? If we added more streams, would the GPU inference time then grow beyond what can be processed in the time of one frame?

Or should we be time-averaging these GR3D_FREQ values to determine the utilisation? Considering that the chart has large sections of time where the GPU is at 0%, the average would be about 40-50%.

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) nano/nx
• DeepStream Version 5.0GA

Hi,
tegrastats shows the runtime status at the moment it samples. You can set a small interval and average the results:

$ sudo tegrastats --interval 100

Or you can try NVIDIA Visual Profiler and nvprof listed in

Are you saying that the “average” over time is the GPU utilisation…?

So given that my example graph spent a lot of time at 0%, the GPU was hardly working at all?

Hi,
It looks like you did not run tegrastats with sudo, so the GPU frequency is absent. It should be printed like this:

RAM 1410/3956MB (lfb 133x4MB) SWAP 402/1978MB (cached 9MB) IRAM 0/252kB(lfb 252kB) CPU [1%@1479,0%@1479,0%@1479,0%@1479] EMC_FREQ 1%@1600 GR3D_FREQ 0%@921 APE 25 PLL@22.5C CPU@27.5C PMIC@100C GPU@23.5C AO@30C thermal@25.25C POM_5V_IN 2266/2266 POM_5V_GPU 123/123 POM_5V_CPU 288/288

That's fine. It was just an example for this post, so I quickly tried to get a screenshot.

The question still remains: how do we interpret the numbers? Do we average them over 10 seconds to get the GPU utilisation? What level of utilisation would tell me I need to increase the inference interval parameter?

Hi,
Since the GPU uses dynamic frequency scaling, I suggest you check the loading percentage together with the frequency.

The information shown is a snapshot from the moment tegrastats fetches data, so you may set a short interval and accumulate/average the loading. With an interval of 100 ms (--interval 100), the rough loading over 1 second is (loading_1 + loading_2 + … + loading_10)/10. Please run sudo jetson_clocks first so that the GPU frequency is fixed at the maximum clock.
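The averaging described above can be sketched in Python as follows. This is only a minimal illustration, not an NVIDIA tool: the helper names `gr3d_load` and `windowed_average` are hypothetical, and the regular expression assumes the GR3D_FREQ field looks like the samples quoted in this thread (with or without the `@frequency` suffix).

```python
import re
from typing import Iterable, List, Optional

# Matches both "GR3D_FREQ 81%" and "GR3D_FREQ 28%@230"
# (the "@frequency" part is optional in the samples in this thread).
GR3D_RE = re.compile(r"GR3D_FREQ (\d+)%(?:@(\d+))?")


def gr3d_load(line: str) -> Optional[int]:
    """Return the GR3D_FREQ load percentage from one tegrastats line, or None."""
    match = GR3D_RE.search(line)
    return int(match.group(1)) if match else None


def windowed_average(loads: Iterable[int], window: int = 10) -> List[float]:
    """Average consecutive, non-overlapping windows of `window` samples.

    With a 100 ms sampling interval, window=10 gives one value per second.
    A trailing partial window is dropped.
    """
    averages: List[float] = []
    buffer: List[int] = []
    for value in loads:
        buffer.append(value)
        if len(buffer) == window:
            averages.append(sum(buffer) / window)
            buffer = []
    return averages
```

For the five samples quoted at the start of the thread (0, 81, 0, 0, 0), `windowed_average([0, 81, 0, 0, 0], window=5)` gives a single average of 16.2%.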

If you want, you can change the sampling frequency with jtop as well, by running:

jtop -r 100
which is equivalent to
sudo tegrastats --interval 100


Thanks @DaneLLL and @rbonghi,

That's helpful info. So the best way to determine the GPU utilisation (once you are already running jetson_clocks) is to fetch data from tegrastats (or jtop) every 100 ms, for example, and average the values over each second.

So the percentage I get from that is the average GPU loading.

So even if a lot of the tegrastats data fetches return 99%, an occasional 0% pulls the average way down.

Am I to assume that until this average over each second is above 90% the GPU is not really fully loaded?

I guess the other way to tell would be the latency, or the time the pgie takes to do inference: if it is getting too big as a fraction of 1/fps, then the GPU is working hard and you need to take measures like increasing the pgie interval.
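To make the "fraction of 1/fps" idea concrete, here is a small sketch. The function name `frame_budget_fraction` and the example numbers are hypothetical, and how the pgie inference time is measured is left open; nothing here comes from DeepStream itself.

```python
def frame_budget_fraction(inference_ms: float, fps: float) -> float:
    """Return inference time as a fraction of the frame period (1/fps).

    Values approaching or exceeding 1.0 suggest inference can no longer
    keep up with the stream rate at the current interval setting.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    frame_period_ms = 1000.0 / fps
    return inference_ms / frame_period_ms
```

As an illustration with made-up numbers, 20 ms of inference per frame at 25 fps uses half of the 40 ms frame period: `frame_budget_fraction(20.0, 25.0)` returns 0.5.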

@DaneLLL - I do not understand what you mean by this comment: “Since GPU runs in dynamic frequency scaling, suggest you check loading percentage with frequency.”

Hi,

Without executing sudo jetson_clocks, the GPU is at 76 MHz in the idle state:

RAM 1418/3956MB (lfb 133x4MB) SWAP 397/1978MB (cached 10MB) IRAM 0/252kB(lfb 252kB) CPU [3%@102,9%@204,0%@204,2%@204] EMC_FREQ 1%@1600 GR3D_FREQ 0%@76 APE 25 PLL@22.5C CPU@27C PMIC@100C GPU@23.5C AO@30.5C thermal@25.25C POM_5V_IN 2022/2022 POM_5V_GPU 41/41 POM_5V_CPU 165/165

While running an application, you will see the frequency vary:

RAM 1428/3956MB (lfb 133x4MB) SWAP 397/1978MB (cached 10MB) IRAM 0/252kB(lfb 252kB) CPU [61%@1479,34%@1479,20%@1479,23%@1479] EMC_FREQ 8%@1600 GR3D_FREQ 28%@230 APE 25 PLL@24.5C CPU@30C PMIC@100C GPU@25C AO@32.5C thermal@27.5C POM_5V_IN 3557/2735 POM_5V_GPU 163/73 POM_5V_CPU 1226/647
RAM 1428/3956MB (lfb 133x4MB) SWAP 397/1978MB (cached 10MB) IRAM 0/252kB(lfb 252kB) CPU [59%@1479,29%@1479,35%@1479,12%@1479] EMC_FREQ 9%@1600 GR3D_FREQ 30%@230 APE 25 PLL@24.5C CPU@29.5C PMIC@100C GPU@25C AO@32C thermal@27.5C POM_5V_IN 3557/2760 POM_5V_GPU 163/76 POM_5V_CPU 1185/663
RAM 1428/3956MB (lfb 133x4MB) SWAP 397/1978MB (cached 10MB) IRAM 0/252kB(lfb 252kB) CPU [49%@1479,44%@1479,18%@1479,28%@1479] EMC_FREQ 10%@1600 GR3D_FREQ 41%@153 APE 25 PLL@24.5C CPU@29.5C PMIC@100C GPU@25C AO@32.5C thermal@28C POM_5V_IN 3639/2786 POM_5V_GPU 204/80 POM_5V_CPU 1226/680
RAM 1428/3956MB (lfb 133x4MB) SWAP 397/1978MB (cached 10MB) IRAM 0/252kB(lfb 252kB) CPU [31%@1479,15%@1479,47%@1479,42%@1479] EMC_FREQ 11%@1600 GR3D_FREQ 46%@153 APE 25 PLL@24.5C CPU@29.5C PMIC@100C GPU@25C AO@32C thermal@27C POM_5V_IN 3598/2809 POM_5V_GPU 204/83 POM_5V_CPU 1185/694

30%@230 means 30% loading at 230MHz and 41%@153 means 41% loading at 153MHz.
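One way to read the load together with the frequency is to rescale each sample to the maximum clock. The sketch below is a rough heuristic rather than anything tegrastats reports: it assumes GPU work scales linearly with clock speed, it takes 921 MHz as the maximum clock only because that is the value in the sample line earlier in this thread, and the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches the "load%@frequency" form, e.g. "GR3D_FREQ 30%@230".
GR3D_RE = re.compile(r"GR3D_FREQ (\d+)%@(\d+)")


def gr3d_load_and_freq(line: str) -> Optional[Tuple[int, int]]:
    """Return (load percent, frequency in MHz) from a tegrastats line, or None."""
    match = GR3D_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def load_at_max_clock(load_pct: float, freq_mhz: float,
                      max_freq_mhz: float = 921.0) -> float:
    """Rescale a load percentage to what it would be at the maximum clock.

    Assumption: work done scales linearly with clock frequency.
    The 921 MHz default is taken from the sample output in this thread.
    """
    return load_pct * freq_mhz / max_freq_mhz
```

Under these assumptions, 30%@230 corresponds to roughly 7.5% of the GPU's capacity at 921 MHz, and 41%@153 to roughly 6.8%. With jetson_clocks active the frequency is pinned at maximum, so the reported percentage can be read directly.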

Right, thanks for that. I'm always running under jetson_clocks at full speed.