Lower Performance of NvTransform in JetPack 5 compared to NvBufferTransform in JetPack 4.6

romilaggarwal611 · October 20, 2023, 2:51am

Hi,
I recently encountered some performance issues while migrating from JetPack 4.6.x to JetPack 5, specifically related to the NvTransform function. In my use case, I need to convert a JPEG image captured from a sensor to an RGB image. I utilize the NvTransform function to convert a 3120x3120 resolution YUV image (obtained after using NvJPEGDecoder::decodeToFd) into BGRA format. On JetPack 4.6, this operation took approximately 13ms, but on JetPack 5, it now takes over 23ms.

Parameters in JetPack 4.6

                NvBufferCreateParams output_params;
                output_params.layout = NvBufferLayout_Pitch;
                output_params.payloadType = NvBufferPayload_SurfArray;
                output_params.colorFormat = NvBufferColorFormat_ARGB32;
                output_params.nvbuf_tag = NvBufferTag_VIDEO_CONVERT;

Parameters in JetPack 5

                NvBufSurf::NvCommonAllocateParams output_params;
                output_params.layout = NVBUF_LAYOUT_PITCH;
                output_params.memType = NVBUF_MEM_SURFACE_ARRAY;
                output_params.colorFormat = NVBUF_COLOR_FORMAT_BGRA;
                output_params.memtag = NvBufSurfaceTag_VIDEO_CONVERT;

In an effort to resolve this issue, I tried a few experiments:

Directly use the YUV image
I attempted to use mmap+memcpy or NvBufSurface2Raw to copy the data from the DMA BUF returned by decodeToFd (and use the YUV image instead of BGR). Surprisingly, this operation took approximately 37ms, significantly longer than the time it takes to copy the data returned from NvTransform, which is roughly 7ms. This was unexpected, considering that YUV has 2 bytes per pixel while BGRA format uses 4 bytes per pixel.
I speculated that this might be due to the memType and layout settings for the Transformer. To test this theory, I made modifications to the NvJPEGDecoder class so that the NvBufSurface used in decodeToFd is allocated with the same parameters. I added the following function and called it during the initialization procedure:

int NvJPEGDecoder::allocate_buf()
{
    NvBufSurfaceAllocateParams params = {{0}};
    params.params.width = 3120;
    params.params.height = 3120;
    params.params.layout = NVBUF_LAYOUT_PITCH;
    params.params.memType = NVBUF_MEM_SURFACE_ARRAY;
    params.params.colorFormat = NVBUF_COLOR_FORMAT_YUV420;
    params.memtag = NvBufSurfaceTag_VIDEO_CONVERT;
    
    return NvBufSurfaceAllocate(&surface, 1, &params);
}

(other changes like adding NvBufSurface *surface = 0; and the above function to the header file, and modifying cinfo.pVendor_buf = (unsigned char*)surface; in the decodeToFd function were also handled)
However, even after this change, the time to copy over the data after decodeToFd remained the same.
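For reference, the raw sizes behind that comparison can be worked out with a small helper. This is only a sketch: `PixelLayout`, `frame_bytes` and `throughput_mb_per_s` are hypothetical names, and the sizes assume tightly packed frames with even dimensions, whereas real pitch-linear buffers are usually padded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types for illustration; not part of the Jetson Multimedia API.
enum class PixelLayout { Yuv420, Yuv422, Bgra };

// Bytes in a tightly packed frame. Pitch padding is ignored, so treat the
// result as a lower bound on the real buffer size. Assumes even dimensions.
constexpr std::uint64_t frame_bytes(std::uint32_t width, std::uint32_t height,
                                    PixelLayout layout)
{
    const std::uint64_t pixels = static_cast<std::uint64_t>(width) * height;
    switch (layout) {
    case PixelLayout::Yuv420: return pixels * 3 / 2;  // 1.5 bytes per pixel
    case PixelLayout::Yuv422: return pixels * 2;      // 2 bytes per pixel
    case PixelLayout::Bgra:   return pixels * 4;      // 4 bytes per pixel
    }
    return 0;
}

// Effective copy throughput in MB/s, with 1 MB = 10^6 bytes.
inline double throughput_mb_per_s(std::uint64_t bytes, double milliseconds)
{
    assert(milliseconds > 0.0);
    return (static_cast<double>(bytes) / 1e6) / (milliseconds / 1e3);
}
```

For a 3120x3120 frame this gives about 14.6 MB for YUV420 (the format used in the allocation above; 2 bytes per pixel would correspond to YUV422, about 19.5 MB) and about 38.9 MB for BGRA. With the timings quoted above, that is roughly 395 MB/s for the 37ms YUV420 copy against roughly 5560 MB/s for the 7ms BGRA copy, so the decoder buffer is being read about 14 times slower per byte.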

Clock Frequency Adjustment:
I also explored the option of modifying the clock frequency for VIC, as mentioned in the docs. I used CLI commands to set the VIC clock frequency to its maximum available value, which reduced the computation time of NvTransform from 23ms to 4-5ms.
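The same thing can be sketched in C++ by writing to the device's devfreq sysfs attributes directly. This is an assumption-heavy sketch, not an NVIDIA API: `pin_devfreq_to_max` and its helpers are hypothetical names, it relies only on the standard Linux devfreq files `available_frequencies`, `min_freq` and `max_freq`, and the caller has to supply the VIC devfreq directory, which differs between boards and releases and is not given in this thread.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Read the whitespace-separated frequency list (in Hz) exposed by a devfreq node.
inline std::vector<unsigned long long> read_available_frequencies(const fs::path& devfreq_dir)
{
    std::ifstream in(devfreq_dir / "available_frequencies");
    std::vector<unsigned long long> freqs;
    unsigned long long f = 0;
    while (in >> f) {
        freqs.push_back(f);
    }
    return freqs;
}

// Write a single value to a sysfs-style attribute file; returns false on failure.
inline bool write_attr(const fs::path& file, const std::string& value)
{
    std::ofstream out(file);
    if (!out) {
        return false;
    }
    out << value << '\n';
    out.flush();
    return static_cast<bool>(out);
}

// Pin the device to its highest advertised frequency by setting max_freq and
// then min_freq to that value. Returns the pinned frequency, or nullopt if the
// frequency list could not be read or a write failed (for example, no root).
inline std::optional<unsigned long long> pin_devfreq_to_max(const fs::path& devfreq_dir)
{
    const auto freqs = read_available_frequencies(devfreq_dir);
    if (freqs.empty()) {
        return std::nullopt;
    }
    const unsigned long long top = *std::max_element(freqs.begin(), freqs.end());
    const std::string value = std::to_string(top);
    if (!write_attr(devfreq_dir / "max_freq", value)) {
        return std::nullopt;
    }
    if (!write_attr(devfreq_dir / "min_freq", value)) {
        return std::nullopt;
    }
    return top;
}
```

Writing these files still needs root, so this does not remove the sudo requirement; it only moves the step into a privileged helper or service. On Jetson there may also be extra steps that this sketch does not cover, such as keeping the engine powered or switching the governor.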

So I have a few questions:

Is there a way to improve the performance of NvTransform?
Why does copying the data returned from decodeToFd take longer, and is there a way to reduce that time?
Is it safe to permanently increase the clock frequency of VIC (to max_frequency or close to it, say 90%), and if so, is there a C++ API that I could use to do so?

Please note that I am using the 06_jpeg_decode sample provided in jetson_multimedia_api.

I’ll greatly appreciate any help you could provide. Thanks in advance!

DaneLLL · October 20, 2023, 3:17am

Hi,
Please try this script to enable VIC at maximum clock. You can run sudo tegrastats to confirm that the clock of the VIC engine is fixed at maximum. This should let NvBufSurfTransform() run at maximum throughput.
VPI - Vision Programming Interface: Performance Benchmark

romilaggarwal611 · October 20, 2023, 11:36am

Hi, I think you forgot to attach the script.
That said, I was able to set the max clock frequency for VIC and improve the performance. My question was whether this is safe to do permanently, and whether there is another solution, since this one requires sudo access.

DaneLLL · October 20, 2023, 11:48am

Hi,
For enabling it at booting, please refer to
Camera's frame rate unstable - #24 by DaneLLL

romilaggarwal611 · October 23, 2023, 9:09am

Thanks for the reply.
Could you please confirm whether it is safe to set the max frequency for VIC (permanently, or for long durations on the order of days)?
Also, I would really appreciate it if you could answer the second question.

DaneLLL · October 23, 2023, 10:24am

Hi,
It is fine to run VIC at maximum clock. Dynamic frequency scaling is there to save power. If your use case uses the engine heavily, it is better to run it at max clock to get maximum throughput.

For performance comparison, please run Xavier NX in 20W mode, execute sudo jetson_clocks, and enable VIC at maximum clock, then check whether you still observe a deviation in the identical environment.
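To make the two setups comparable, it helps to time the call the same way on both releases. Below is a minimal sketch of such a harness; `TimingStats` and `time_calls` are hypothetical names, and the callable stands in for whatever is being measured (for example the transform call, which is not shown here).

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Summary of per-call wall-clock times in milliseconds.
struct TimingStats {
    double min_ms;
    double median_ms;
    double max_ms;
};

// Run fn() `warmup` times untimed, then `iterations` times timed, and report
// min, median and max. Returns all zeros if `iterations` is 0.
template <typename Fn>
TimingStats time_calls(Fn&& fn, std::size_t warmup, std::size_t iterations)
{
    using clock = std::chrono::steady_clock;

    for (std::size_t i = 0; i < warmup; ++i) {
        fn();
    }

    std::vector<double> samples;
    samples.reserve(iterations);
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto t0 = clock::now();
        fn();
        const auto t1 = clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    if (samples.empty()) {
        return TimingStats{0.0, 0.0, 0.0};
    }
    std::sort(samples.begin(), samples.end());
    return TimingStats{samples.front(), samples[samples.size() / 2], samples.back()};
}
```

Reporting the median over many iterations, after a few warm-up calls, reduces the effect of the first calls running while the clock is still ramping up, which matters when dynamic frequency scaling is enabled.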

system · November 8, 2023, 5:53am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.