glReadPixels performance on JP5.1.2

Hi,

I am porting our Xavier NX application from JP4.6 to JP5.1.2,
migrating from nvbuf_utils to nvbuf_surface according to the guide, and it is going very well.

I am using the exact same code for NvEglRenderer.cpp on JP4.6 and JP5.1.2, apart from the obvious buffer mapping functions that have changed.
I get the exact same output from the GPU on both JP4.6 and JP5.1.2, which is very good. My code does some GPU work, and then I use glReadPixels to get the result back from the GPU.
Here is the major difference:
On JP4.6 glReadPixels takes about 5 ms to complete
On JP5.1.2 glReadPixels takes about 11 ms to complete
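
A minimal sketch of how per-call figures like these could be measured with a wall-clock timer. This is an illustration only, not the code from the application: `average_ms` and the callable it takes are hypothetical names, and in a real test the callable would wrap the glReadPixels call. Because glReadPixels into client memory has to wait for pending GPU work on the framebuffer, calling glFinish() before the timed region is one way to keep rendering time out of the readback measurement.

```cpp
#include <chrono>
#include <functional>

// Average wall-clock time of `op` in milliseconds over `iterations` runs.
// `op` is a placeholder: in a GL test it would wrap the glReadPixels call
// (assumption: any pending rendering was flushed with glFinish() beforehand).
double average_ms(const std::function<void()>& op, int iterations) {
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by clock changes
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        op();
    }
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / iterations;
}
```

Running the same harness with the same iteration count on both JetPack releases would give directly comparable averages.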

I have verified the following:

  • nvpmodel on both platforms is set to MODE_15W_6CORE
  • verified max freq using jetson_clocks --show
    root@ubuntu:/etc# jetson_clocks --show
    SOC family:tegra194 Machine:NVIDIA Jetson Xavier NX Developer Kit
    Online CPUs: 0-5
    cpu0: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
    cpu1: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
    cpu2: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
    cpu3: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
    cpu4: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
    cpu5: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
    GPU MinFreq=1109250000 MaxFreq=1109250000 CurrentFreq=1109250000
    EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=1
    DLA0_CORE: Online=1 MinFreq=0 MaxFreq=1100800000 CurrentFreq=1100800000
    DLA0_FALCON: Online=1 MinFreq=0 MaxFreq=640000000 CurrentFreq=640000000
    DLA1_CORE: Online=1 MinFreq=0 MaxFreq=1100800000 CurrentFreq=1100800000
    DLA1_FALCON: Online=1 MinFreq=0 MaxFreq=640000000 CurrentFreq=640000000
    PVA0_VPS0: Online=1 MinFreq=0 MaxFreq=819200000 CurrentFreq=819200000
    PVA0_VPS1: Online=1 MinFreq=0 MaxFreq=819200000 CurrentFreq=819200000
    PVA0_AXI: Online=1 MinFreq=0 MaxFreq=601600000 CurrentFreq=601600000
    PVA1_VPS0: Online=1 MinFreq=0 MaxFreq=819200000 CurrentFreq=819200000
    PVA1_VPS1: Online=1 MinFreq=0 MaxFreq=819200000 CurrentFreq=819200000
    PVA1_AXI: Online=1 MinFreq=0 MaxFreq=601600000 CurrentFreq=601600000
    CVNAS MinFreq=0 MaxFreq=576000000 CurrentFreq=576000000
    FAN Dynamic Speed control=active hwmon4_pwm1=0
    NV Power Mode: MODE_15W_6CORE

My application uses glReadPixels twice per frame and must process 50 frames per second. glReadPixels currently blocks me from achieving 50 frames per second, since the 2 glReadPixels calls take about 22 ms, which is more than the 20 ms available per frame.
On JP4.6 the 2 glReadPixels calls take about 11 ms to complete.

My current workaround was to use MAXN with nvpmodel, boosting the CPUs to 1.9 GHz.
This helped me achieve 16 ms for the 2 glReadPixels calls, but I still don't see an obvious reason for the performance degradation.
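
The frame-budget arithmetic behind these numbers can be sketched as follows. The function names are hypothetical and the inputs are simply the readback totals quoted in this post (11 ms, 22 ms, and 16 ms at a 50 fps target).

```cpp
// Per-frame time budget in milliseconds for a target frame rate.
constexpr double frame_budget_ms(double fps) {
    return 1000.0 / fps;
}

// Time left in each frame after spending `readback_ms` on glReadPixels.
// A negative result means the readbacks alone exceed the frame budget.
constexpr double headroom_ms(double fps, double readback_ms) {
    return frame_budget_ms(fps) - readback_ms;
}

// Figures quoted in the post, checked at compile time:
static_assert(headroom_ms(50.0, 11.0) == 9.0, "JP4.6, MODE_15W_6CORE");
static_assert(headroom_ms(50.0, 22.0) == -2.0, "JP5.1.2, MODE_15W_6CORE");
static_assert(headroom_ms(50.0, 16.0) == 4.0, "JP5.1.2, MAXN");
```

So at 50 fps the JP5.1.2 readbacks overrun the frame by about 2 ms in MODE_15W_6CORE, while MAXN leaves about 4 ms for all other per-frame work, compared with about 9 ms on JP4.6.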

I would also like to add that, in order to rule out memory and CPU issues on my platforms, I have tested CPU and memory performance using my own code and also sysbench on both 4.6 and 5.1.2.
Memory performance and CPU performance are about the same using the MODE_15W_6CORE power configuration on both JP4.6 and JP5.1.2.

My questions are:

  1. Do you know of such an issue with glReadPixels, and do you have any idea how to fix it?
  2. Is it OK to use the MAXN power configuration for the device, keeping in mind that we have a very good cooling solution in our hardware? (On JP4.6 using MODE_15W_6CORE, with the 6 CPUs at 400% and the GPU maxed out doing AI, we only reach about 45-50 degrees Celsius on the CPUs and GPU.)

Thanks
Amir

Hi,
Please share a test sample so that we can run it on 4.6.4 and 5.1.2 to compare the performance of glReadPixels() on the Xavier NX developer kit.

Hi,

Thanks for the reply.
I will prepare the glReadPixels test.
Meanwhile, what about my second question: is MAXN safe to use?

Hi,
Yes, if you don't frequently trigger overcurrent when running the use case, it looks fine to use the mode directly. In MAXN mode the GPU and CPU run at maximum capability, so overcurrent can be triggered in certain use cases. If that happens, you would need to customize a power mode based on the use case.
