There are GPU performance regressions in the 32.4.2 and newer userspace GPU drivers (shown here with a Vulkan benchmark suite).
I have tested the 32.3.1, 32.4.2, 32.4.4, and 32.5.1 package releases. I started with a 32.3.1 JetPack release and upgraded incrementally by pinning the packages and changing nvidia-l4t-apt-source.list when necessary, benchmarking each version along the way.
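For anyone who wants to repeat the upgrade path, here is a minimal sketch of the source-list step. It assumes the list file carries the release as its own field (for example r32.3) on each deb line; the function name switch_l4t_release is hypothetical and this is not necessarily the exact procedure I followed.

```shell
#!/bin/sh
# Rewrite the release field (e.g. r32.3 -> r32.4) on every "deb" line of an
# apt source list. Assumption: the release appears as a separate
# space-delimited field of the form rNN.N.
# Usage: switch_l4t_release <list-file> <new-release>
switch_l4t_release() {
    list_file=$1
    new_release=$2
    # Only touch lines that start with "deb "; replace the first rNN.N field.
    sed -i -E "/^deb /s/ r[0-9]+\.[0-9]+ / ${new_release} /" "$list_file"
}
```

Run it as root against the real list file, then run apt update before upgrading. Packages that should stay at their current version can be held beforehand with apt-mark hold.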
For ease of presentation and reproduction, I used an open benchmark suite to generate my data: RippeR37/GL_vs_VK on GitHub (Comparison of OpenGL and Vulkan API in terms of performance).
Attached you will find my source download and compiled binary. I have edited the source slightly so that the GPU is the bottleneck on the TX1. Extract the attached source to your user's home folder and run the vk_bench_table.sh shell script (found in the bin folder) to generate tables of data for benchmarks 1 and 3 across the full frequency range of the GPU.
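The idea of sweeping the GPU frequency range can be sketched roughly as below. This is not the contents of vk_bench_table.sh; it is a simplified illustration that assumes the GPU exposes a standard Linux devfreq directory (usually found somewhere under /sys/class/devfreq/) with available_frequencies listed in ascending order. The function name sweep_gpu_freqs is hypothetical.

```shell
#!/bin/sh
# Run a command once per GPU frequency advertised by a devfreq node.
# The current frequency is appended as the last argument of the command.
# Usage: sweep_gpu_freqs <devfreq-dir> <command> [args...]
sweep_gpu_freqs() {
    devfreq_dir=$1
    shift
    for freq in $(cat "$devfreq_dir/available_frequencies"); do
        # Assumption: frequencies are ascending, so raise the ceiling first
        # and then the floor; this keeps min_freq <= max_freq at every step.
        echo "$freq" > "$devfreq_dir/max_freq"
        echo "$freq" > "$devfreq_dir/min_freq"
        "$@" "$freq"
    done
}
```

Called as root with the devfreq directory and a benchmark command, this produces one run per frequency step, with the GPU locked to that step for the duration of the run.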
The script pins the CPU to the maximum allowed frequency and sets the RAM frequency to 1600MHz.
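For reference, pinning the CPU clock can be done through the generic Linux cpufreq sysfs files. The sketch below is an assumption about one way to do it, not a copy of what the attached script does; the function name pin_cpu_max is hypothetical. Setting the RAM (EMC) frequency is platform-specific and is left out here.

```shell
#!/bin/sh
# Pin every CPU under a cpufreq sysfs root to its hardware maximum frequency
# by setting both the upper and lower scaling limits to cpuinfo_max_freq.
# Usage: pin_cpu_max [cpu-sysfs-root]   (defaults to /sys/devices/system/cpu)
pin_cpu_max() {
    cpu_root=${1:-/sys/devices/system/cpu}
    for policy in "$cpu_root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs that do not expose cpufreq information.
        [ -r "$policy/cpuinfo_max_freq" ] || continue
        max=$(cat "$policy/cpuinfo_max_freq")
        # Raise the ceiling before the floor so min never exceeds max.
        echo "$max" > "$policy/scaling_max_freq"
        echo "$max" > "$policy/scaling_min_freq"
    done
}
```

This needs root. With min and max set to the same value, the governor has no room to scale down between benchmark runs.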
https://drive.google.com/file/d/1N4orvxpx34JRRJXgWCgeiQ3jY8YKEcBc/view?usp=sharing
Below is a description of the two benchmarks taken from the readme:
Test #1 - static scene
This test revolves around a single static scene with a variable number of rendered objects whose quality can be chosen (each is a sphere with a specific amount of vertices).
Number of objects, number of vertices and update work are customizable, which makes it possible to emulate different amounts of CPU and GPU work (this gives us an opportunity to test CPU-bound and GPU-bound scenarios).
Test #3 - shadow mapping
In this test we render a “checkerboard” floor with differently-colored cubes, and above that we render one high-res sphere and many cubes in different positions. We render it in two passes: a depth pass from the light's PoV to acquire the shadowmap, and then the real render pass, which simply renders the scene, shadowing the necessary fragments.