Performance differences between 16G and 32G AGX Xavier

Why is there so much more GPU usage on a 32GB Xavier compared to a 16GB one?

We’ve developed an application that grabs frames from 4 Sony cameras through a Leopard Imaging board. The application was developed on a 16GB Xavier running JP 4.3, where everything works fine. However, when we install the same software on a 32GB Xavier, the solution’s GPU usage goes up and the system needs much more power (from 47.5 to over 60). The relevant tegrastats outputs are attached to this message. A representative excerpt from each log:

  • stats16G.log (53.5 KB)
    RAM 6660/15823MB (lfb 2168x4MB) EMC_FREQ 78%@2133 GPU 8945/8945 CPU 2577/2577 SOC 13948/13948 VDDRQ 3485/3485 SYS5V 3796/3796
  • stats32G.log (53.7 KB)
    RAM 6712/31927MB (lfb 6183x4MB) EMC_FREQ 82%@2133 GPU 9621/9621 CPU 3457/3457 SOC 14426/14426 VDDRQ 4205/4205 SYS5V 3947/3947

The attached stats have the full output.
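The two excerpts above can be compared field by field to see which readings differ most. The following is a minimal Python sketch, assuming the `NAME current/average` pairs in tegrastats output are per-rail power readings in milliwatts (an assumption worth checking against the tegrastats documentation). The helper names `parse_rails` and `rail_deltas` are hypothetical, and summing the rails is only a rough proxy for total power.

```python
import re

# Matches "NAME cur/avg" pairs such as "GPU 8945/8945".
# Assumption: these are power rail readings (current/average, in mW).
# The negative lookahead skips "RAM 6660/15823MB", which is memory, not power.
RAIL_RE = re.compile(r"\b([A-Z][A-Z0-9_]*) (\d+)/(\d+)(?!\w)")


def parse_rails(line):
    """Return {rail_name: current_reading} for each NAME cur/avg pair."""
    return {name: int(cur) for name, cur, _avg in RAIL_RE.findall(line)}


def rail_deltas(base, other):
    """Return other - base for every rail present in both readings."""
    return {name: other[name] - base[name] for name in base if name in other}


# The two excerpt lines quoted in this post.
LINE_16G = ("RAM 6660/15823MB (lfb 2168x4MB) EMC_FREQ 78%@2133 GPU 8945/8945 "
            "CPU 2577/2577 SOC 13948/13948 VDDRQ 3485/3485 SYS5V 3796/3796")
LINE_32G = ("RAM 6712/31927MB (lfb 6183x4MB) EMC_FREQ 82%@2133 GPU 9621/9621 "
            "CPU 3457/3457 SOC 14426/14426 VDDRQ 4205/4205 SYS5V 3947/3947")

deltas = rail_deltas(parse_rails(LINE_16G), parse_rails(LINE_32G))
total = sum(deltas.values())

for name, delta in deltas.items():
    print(f"{name:6s} +{delta:5d}  ({100 * delta / total:.0f}% of the increase)")
print(f"total  +{total:5d}")
```

For this single pair of samples, the GPU field accounts for 676 of a combined increase of 2905, so the CPU, SOC and VDDRQ fields rise as well. Running the same comparison over every line of the full logs would show whether that holds over time.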

This consistent difference of about 800-1000 points in the GPU field, together with the similarly increased power usage, worries us. In addition, we run into issues with JP 4.3 and 4.4 on 32GB systems: where the 16GB system runs for long periods of time without issue, the 32GB variant crashes and reboots after running for 5-23 minutes.

So to summarize we have a couple of questions:

  • Why does the 32GB version use so much more GPU?
  • Does the increased GPU usage account for all the increased power usage, or are there more factors in play?
  • What can cause the crashes/reboots that we encounter on a 32GB Xavier, but not on a 16GB version?

Hi,
The GPU load is reported in the GR3D_FREQ field, for example:

GR3D_FREQ 61%@1377

We don’t see much difference between the attached 16G and 32G logs. It may help to use the CUDA tools:
https://developer.nvidia.com/embedded/develop/tools
NVIDIA Visual Profiler and nvprof

Please give them a try and see if there are more clues.

Hi,

If the GPU load is signified by what seems to be a frequency, what does the GPU field mean exactly?

I extracted the GR3D_FREQ field from these (admittedly short) logs for plotting, and got:


So from this graph it seems obvious that the GPU usage on the 32G version is different. Since I already told you that the exact same software runs on both machines, profiling the application seems pointless to me.

I can produce longer measurements with higher report frequencies, and I can produce graphs with running means, if any of that would help.
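For reference, the extraction and the running mean can be sketched as follows. This is a minimal Python example, assuming each tegrastats line carries a field of the form `GR3D_FREQ 61%@1377` as quoted above. The helper names and the sample lines other than that quoted field are hypothetical.

```python
import re
from collections import deque

# Captures the load percentage from "GR3D_FREQ 61%@1377" or "GR3D_FREQ 61%".
GR3D_RE = re.compile(r"GR3D_FREQ (\d+)%")


def gr3d_load(lines):
    """Yield the GR3D_FREQ load percentage of every line that contains one."""
    for line in lines:
        match = GR3D_RE.search(line)
        if match:
            yield int(match.group(1))


def running_mean(values, window):
    """Yield the mean of the last `window` values (fewer at the start)."""
    buf = deque(maxlen=window)
    total = 0
    for value in values:
        if len(buf) == window:
            total -= buf[0]  # this value is about to be pushed out of the window
        buf.append(value)
        total += value
        yield total / len(buf)


# Hypothetical sample lines; only the GR3D_FREQ field format comes from the thread.
sample = [
    "RAM 6660/15823MB EMC_FREQ 78%@2133 GR3D_FREQ 61%@1377 GPU 8945/8945",
    "RAM 6661/15823MB EMC_FREQ 78%@2133 GR3D_FREQ 59%@1377 GPU 8940/8940",
    "a line without the field",
    "RAM 6662/15823MB EMC_FREQ 79%@2133 GR3D_FREQ 66%@1377 GPU 8950/8950",
]

loads = list(gr3d_load(sample))
smoothed = list(running_mean(loads, window=2))
print(loads)
print(smoothed)
```

To process the attached logs, pass an open file object (for example `open("stats16G.log")`) to `gr3d_load` instead of `sample`, and plot the smoothed series for both devices on the same axes.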

Thanks.

Hi,
Do you see better performance when using the Xavier 32G? If the GPU load is a bit higher, performance should be slightly better.

I would have to figure out a way to reliably measure whether performance has in fact increased. What I do see on the 32G devices is performance degrading over time, followed by parts of the device or software giving up. There seems to be a relation, but I can’t put my finger on it yet.
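One possible way to make this measurable is to log a timestamp for every processed frame and compare the frame rate per fixed time window over the run; a downward trend would show the degradation. Below is a minimal Python sketch, assuming the application can record per-frame timestamps in seconds. The function name and the window length are hypothetical.

```python
def fps_per_window(timestamps, window_s=60.0):
    """Return frames per second for each consecutive window of `window_s` seconds.

    `timestamps` are per-frame times in seconds, in ascending order.
    The last window is usually only partly filled, so its value reads low.
    """
    if not timestamps:
        return []
    start = timestamps[0]
    counts = {}
    for t in timestamps:
        index = int((t - start) // window_s)
        counts[index] = counts.get(index, 0) + 1
    # Windows without any frame are reported as 0.0 rather than skipped.
    return [counts.get(i, 0) / window_s for i in range(max(counts) + 1)]


# Hypothetical timestamps: four frames in the first 2 s, two in the next 2 s.
rates = fps_per_window([0.0, 0.5, 1.0, 1.5, 2.0, 3.0], window_s=2.0)
print(rates)
```

Collecting these rates on both the 16G and the 32G device, next to the tegrastats log, would make it possible to line up a drop in frame rate with the moment the load or power readings change.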

Hi,
We have a tool called Jetson Power Estimator.
Please give it a try and share the result. There are also reference samples:

/usr/src/nvidia/graphics_demos

Since we don’t have your application, if either reference sample can be run to replicate the issue, please let us know and we can try to replicate it.

We’ve identified the root cause of our issues. Here’s a statement from our CEO on it.

We have found the cause of the problem. Before we share it with you, we need to have a meeting with NVIDIA. They introduced something that kills performance and generates more power usage and, as a result, more heat. Our report here, describing our problems and the differences between the 16GB and 32GB Xaviers, should have immediately alarmed people within NVIDIA who know what they made. They are obviously not active on this forum, which is a shame. Due to this “feature” by NVIDIA, which unexpectedly showed up unfavourably in the 32GB Xaviers, we had to undertake a massive operation with our customers, which cost a lot of money and actually led to customers no longer wanting to work with us.

We do understand NVIDIA is not liable, but Xaviers are not free of charge and should come with better support. As soon as we have been in contact with NVIDIA and have received a satisfactory answer, we will update this thread.

Hi,
We would like to have some more information. Could you please share in which system software components you found the issue, so that we can have the teams/experts start an initial investigation? If it is fine with you, please share the information. Thanks.