NVJPEG decode is slow

Decoding this FHD JPEG with jetson_multimedia_api/samples/06_jpeg_decode/jpeg_decode, I get the following timings on my Jetson Orin Nano:

----------- Element = jpegdec -----------
Total units processed = 300
Average latency(usec) = 87793
Minimum latency(usec) = 53701
Maximum latency(usec) = 93974

Running two streams in parallel gives slightly slower results:

----------- Element = jpegdec -----------
Total units processed = 300
Average latency(usec) = 89832
Minimum latency(usec) = 65503
Maximum latency(usec) = 93542

And the same for ten streams:

----------- Element = jpegdec -----------
Total units processed = 3000
Average latency(usec) = 90901
Minimum latency(usec) = 66499
Maximum latency(usec) = 96181

Even when stress-testing it with 100 repeats, the decoder clock stays at 115 MHz, which is much lower than the 499.2 MHz quoted in the reference.

The solution in a different thread was to upgrade JetPack, but I am already running 5.1.3, and my performance is more than six times worse.

The NV power mode is set to 15 W, jetson_clocks is running, and no other significant processes are running. Is there anything else I am missing?
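
For anyone reproducing this, the power mode and clock state mentioned above can be double-checked before profiling. The following is only a sketch: `show_power_state` is a hypothetical helper name, and it assumes the stock `nvpmodel` and `jetson_clocks` tools that ship with JetPack are on the PATH. On a machine without them it just reports that they are missing.

```shell
# Hypothetical helper: print the active power mode and the current clock
# settings, skipping any tool that is not installed.
show_power_state() {
    for tool in nvpmodel jetson_clocks; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: not found (is this a Jetson?)"
            continue
        fi
        case "$tool" in
            nvpmodel)      sudo nvpmodel -q ;;            # query the active power mode
            jetson_clocks) sudo jetson_clocks --show ;;   # show current clock settings
        esac
    done
}
```

Calling `show_power_state` right before and right after a decode run makes it easier to tell whether the settings changed between runs.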

# One stream
./jpeg_decode num_files 1 ~/Definitions_of_TV_standards.jpg --perf

# Two
./jpeg_decode num_files 2 ~/Definitions_of_TV_standards.jpg /dev/null  ~/Definitions_of_TV_standards.jpg /dev/null --perf

# Ten
./jpeg_decode num_files 10 ~/Definitions_of_TV_standards.jpg /dev/null  ~/Definitions_of_TV_standards.jpg /dev/null  ~/Definitions_of_TV_standards.jpg /dev/null  ~/Definitions_of_TV_standards.jpg /dev/null  ~/Definitions_of_TV_standards.jpg /dev/null  ~/Definitions_of_TV_standards.jpg /dev/null  ~/Definitions_of_TV_standards.jpg /dev/null  ~/Definitions_of_TV_standards.jpg /dev/null  ~/Definitions_of_TV_standards.jpg /dev/null  ~/Definitions_of_TV_standards.jpg /dev/null  --perf
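
The long argument lists above can also be generated instead of typed out. This is a minimal sketch, not part of the sample: `stream_args` is a hypothetical helper that prints the same `<input> /dev/null` pair once per stream, and the usage line relies on shell word splitting, so it assumes the input path contains no spaces.

```shell
# Hypothetical helper: print "<input>" and "/dev/null" once per stream,
# one argument per line.
stream_args() {
    n="$1"
    input="$2"
    i=0
    while [ "$i" -lt "$n" ]; do
        printf '%s\n%s\n' "$input" /dev/null
        i=$((i + 1))
    done
}

# Usage (same command shape as the ten-stream run above):
#   ./jpeg_decode num_files 10 $(stream_args 10 "$HOME/Definitions_of_TV_standards.jpg") --perf
```

This keeps the stream count and the number of file pairs in sync when trying other stream counts.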

We ran it on an Orin Nano developer kit with JetPack 5.1.2 and got this result:

----------- Element = jpegdec -----------
Total units processed = 300
Average latency(usec) = 10210
Minimum latency(usec) = 10162
Maximum latency(usec) = 18262

We will try 5.1.3 and update. One question: are you running it on an Orin Nano developer kit with a monitor connected?

It is a devkit, yes. I get the same results with and without the monitor connected, but X is running in both cases. All other hardware engines are off except for the Security Engine; NVJPG and NVJPG1 are also off once the process ends.

Hi david.mh,

Here is our test result with JetPack 5.1.3 on Orin Nano:

----------- Element = jpegdec -----------
Total units processed = 300
Average latency(usec) = 56564
Minimum latency(usec) = 30705
Maximum latency(usec) = 58190

What could be the difference? Same code, same system.
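
To compare the averages reported so far, it can help to turn them into a slowdown factor and a rough frame rate. This is a sketch with hypothetical helper names (`slowdown`, `fps_from_usec`), and the frame rate assumes frames are decoded strictly one after another, which may not hold for multi-stream runs.

```shell
# Hypothetical helpers for comparing the "Average latency(usec)" values.
# LC_ALL=C keeps awk's decimal separator a dot.

# Ratio of two latencies, e.g. slowdown 87793 10210
slowdown() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Frames per second if each frame takes the given latency in microseconds
# (assumes strictly serial decoding).
fps_from_usec() {
    LC_ALL=C awk -v us="$1" 'BEGIN { printf "%.2f\n", 1000000 / us }'
}

# Usage with the averages from this thread:
#   slowdown 87793 10210    # my 5.1.3 run vs. the 5.1.2 result
#   slowdown 56564 10210    # the other 5.1.3 run vs. the 5.1.2 result
#   fps_from_usec 87793
```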

Are you using a developer kit with the default JetPack 5.1.3 image? We would like to check whether there is any deviation in the test environment.

Yes, a freshly installed image.

Could it be related to this?

Please apply this:
NvJPEGDecoder generates the same output if called twice with different input buffer - #7 by DaneLLL

Then profile with a JPEG file generated like this:

gst-launch-1.0 videotestsrc num-buffers=1 ! video/x-raw,width=1920,height=1080 ! jpegenc ! filesink location=test1.jpg

It seems like Definitions_of_TV_standards.jpg is more complex and decoding it takes more time than JPEGs compressed from YUV420. Please try other JPEG files.
