Hi,
I recently encountered some performance issues while migrating from JetPack 4.6.x to JetPack 5, specifically related to the NvTransform function. In my use case, I need to convert a JPEG image captured from a sensor to an RGB image. I utilize the NvTransform function to convert a 3120x3120 resolution YUV image (obtained after using NvJPEGDecoder::decodeToFd) into BGRA format. On JetPack 4.6, this operation took approximately 13ms, but on JetPack 5, it now takes over 23ms.
Parameters in JetPack 4.6:
NvBufferCreateParams output_params;
output_params.layout = NvBufferLayout_Pitch;
output_params.payloadType = NvBufferPayload_SurfArray;
output_params.colorFormat = NvBufferColorFormat_ARGB32;
output_params.nvbuf_tag = NvBufferTag_VIDEO_CONVERT;
Parameters in JetPack 5:
NvBufSurf::NvCommonAllocateParams output_params;
output_params.layout = NVBUF_LAYOUT_PITCH;
output_params.memType = NVBUF_MEM_SURFACE_ARRAY;
output_params.colorFormat = NVBUF_COLOR_FORMAT_BGRA;
output_params.memtag = NvBufSurfaceTag_VIDEO_CONVERT;
In an effort to resolve this issue, I tried a few experiments:
- Using the YUV Image Directly:
I attempted to use mmap+memcpy or NvBufSurface2Raw to copy the data out of the DMA buffer returned by decodeToFd (so that I could use the YUV image instead of BGRA). Surprisingly, this copy took approximately 37ms, significantly longer than the roughly 7ms it takes to copy the data returned from NvTransform. This was unexpected, considering that the YUV image uses at most 2 bytes per pixel (1.5 for YUV420), while BGRA uses 4 bytes per pixel.
I speculated that this might be due to the memType and layout settings for the Transformer. To test this theory, I made modifications to the NvJPEGDecoder class so that the NvBufSurface used in decodeToFd is allocated with the same parameters. I added the following function and called it during the initialization procedure:
int NvJPEGDecoder::allocate_buf()
{
    NvBufSurfaceAllocateParams params = {{0}};
    params.params.width = 3120;
    params.params.height = 3120;
    params.params.layout = NVBUF_LAYOUT_PITCH;
    params.params.memType = NVBUF_MEM_SURFACE_ARRAY;
    params.params.colorFormat = NVBUF_COLOR_FORMAT_YUV420;
    params.memtag = NvBufSurfaceTag_VIDEO_CONVERT;
    return NvBufSurfaceAllocate(&surface, 1, &params);
}
(The other required changes were also made: declaring NvBufSurface *surface = 0; and the above function in the header file, and setting cinfo.pVendor_buf = (unsigned char*)surface; in the decodeToFd function.)
However, even after this change, the time to copy over the data after decodeToFd remained the same.
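To put the two copy timings in perspective, here is a rough back-of-the-envelope sketch in C++ that turns them into effective throughput. The helper names frame_bytes and throughput_mb_per_s are made up for illustration and are not part of the Multimedia API. It takes the 2 bytes per pixel figure quoted above for the YUV frame (for YUV420, 1.5 would apply instead).

```cpp
#include <cstdint>

// Size in bytes of one frame of the given dimensions.
// bytes_per_pixel is an assumption supplied by the caller
// (4.0 for BGRA, 2.0 or 1.5 for the YUV variants discussed above).
constexpr double frame_bytes(std::uint32_t width, std::uint32_t height,
                             double bytes_per_pixel) {
    return static_cast<double>(width) * height * bytes_per_pixel;
}

// Effective copy throughput in MB/s, with 1 MB taken as 1e6 bytes.
constexpr double throughput_mb_per_s(double bytes, double milliseconds) {
    return (bytes / 1e6) / (milliseconds / 1e3);
}

// Example use with the timings from this post:
//   throughput_mb_per_s(frame_bytes(3120, 3120, 2.0), 37.0)  // decodeToFd output
//   throughput_mb_per_s(frame_bytes(3120, 3120, 4.0), 7.0)   // NvTransform output
```

With these numbers, the copy from the decodeToFd buffer runs at roughly a tenth of the throughput of the copy from the NvTransform output, even though it moves about half as many bytes.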
- Clock Frequency Adjustment:
I also explored the option of modifying the clock frequency for VIC, as mentioned in the docs. I used CLI commands to set the VIC clock frequency to its maximum available value, which reduced the computation time of NvTransform from 23ms to 4-5ms.
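For context on the C++ API question below, this is a minimal sketch of how the same thing might be done from C++ by writing the standard Linux devfreq sysfs attributes (governor, max_freq, userspace/set_freq). It assumes the CLI steps amount to echoing values into those files. The devfreq directory of the VIC is board-specific, so it is passed in as a parameter rather than hard-coded. The helper names write_attr and pin_devfreq are hypothetical, and the process would need root privileges.

```cpp
#include <fstream>
#include <string>

// Write `value` to a sysfs-style attribute file.
// Returns true only if the file could be opened and the write succeeded.
bool write_attr(const std::string& path, const std::string& value) {
    std::ofstream out(path);
    if (!out) return false;
    out << value;
    out.flush();
    return out.good();
}

// Pin a devfreq device to a fixed frequency (in Hz).
// devfreq_dir is an assumption: it must point at the device's devfreq
// directory, i.e. the one containing `governor` and `max_freq`.
bool pin_devfreq(const std::string& devfreq_dir, unsigned long freq_hz) {
    const std::string freq = std::to_string(freq_hz);
    return write_attr(devfreq_dir + "/governor", "userspace")
        && write_attr(devfreq_dir + "/max_freq", freq)
        && write_attr(devfreq_dir + "/userspace/set_freq", freq);
}
```

The frequency passed in should be one of the values listed in the available_frequencies file of the same directory. Whether running this way permanently is safe is exactly what I am unsure about.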
So I have a few questions:
- Is there a way to improve the performance of NvTransform?
- Why does copying the data returned from decodeToFd take so much longer, and is there a way to reduce that time?
- Is it safe to permanently increase the VIC clock frequency (to max_frequency or close to it, say 90%)? If so, is there a C++ API that I could use to do this?
Please note that I am using the 06_jpeg_decode sample provided in jetson_multimedia_api.
I’d greatly appreciate any help you could provide. Thanks in advance!