Why is nvjpegdec much slower than jpegdec?

Please provide complete information as applicable to your setup.

• Hardware Platform (A100)
• DeepStream 7.1
• NVIDIA GPU Driver Version: 560.28.03, CUDA Version: 12.6
• questions

I tried using nvjpegdec to decode JPEG images for inference, but I found that the decoding speed with nvjpegdec is much slower than with jpegdec (although jpegdec uses more CPU resources). Is this reasonable, or is there something wrong with my implementation?

CUDA_VISIBLE_DEVICES=4 USE_NEW_NVSTREAMMUX=yes gst-launch-1.0 \
  filesrc location='3840_2160.jpeg.1000' ! nvjpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_0 \
  nvstreammux name=m batch-size=16 config-file-path='./stream_mux_config.txt' ! nvinfer config-file-path='dstest1_pgie_config.txt' ! fakesink enable-last-sample=False \
  filesrc location='3840_2160.jpeg.1000' ! nvjpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_1 \
  filesrc location='3840_2160.jpeg.1000' ! nvjpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_2 \
  filesrc location='3840_2160.jpeg.1000' ! nvjpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_3 \
  filesrc location='3840_2160.jpeg.1000' ! nvjpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_4 \
  filesrc location='3840_2160.jpeg.1000' ! nvjpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_5 \
  filesrc location='3840_2160.jpeg.1000' ! nvjpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_6 \
  filesrc location='3840_2160.jpeg.1000' ! nvjpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_7 \
  filesrc location='3840_2160.jpeg.1000' ! nvjpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_8 \
  filesrc location='3840_2160.jpeg.1000' ! nvjpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_9 \
  filesrc location='3840_2160.jpeg.1000' ! nvjpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_10 \
  filesrc location='3840_2160.jpeg.1000' ! nvjpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_11 \
  filesrc location='3840_2160.jpeg.1000' ! nvjpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_12 \
  filesrc location='3840_2160.jpeg.1000' ! nvjpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_13 \
  filesrc location='3840_2160.jpeg.1000' ! nvjpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_14 \
  filesrc location='3840_2160.jpeg.1000' ! nvjpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_15

Execution ended after 0:04:45.499731357
The top command shows about 90% CPU usage.
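
The command above repeats one decode branch 16 times and differs from the jpegdec run only in the decoder element, so it can be generated rather than typed out. The following is a minimal sketch, not part of the original post: `build_pipeline` is a hypothetical helper name, the element names, caps string, and config file paths are copied from the command above, and the nvstreammux segment is emitted first instead of after the first branch, which gst-launch-1.0 should accept because pads are referenced by element name. The environment variables (CUDA_VISIBLE_DEVICES, USE_NEW_NVSTREAMMUX) are left for the caller to set.

```python
import shlex

# Caps string taken from the command in the post; the parentheses need shell quoting.
CAPS = "video/x-raw(memory:NVMM),format=RGBA"


def build_pipeline(decoder: str, location: str, num_sources: int = 16) -> str:
    """Return a gst-launch-1.0 command line with one decode branch per source."""
    # Muxer + inference + sink segment, shared by all branches.
    mux = (
        f"nvstreammux name=m batch-size={num_sources} "
        "config-file-path=./stream_mux_config.txt ! "
        "nvinfer config-file-path=dstest1_pgie_config.txt ! "
        "fakesink enable-last-sample=False"
    )
    # One filesrc -> decoder -> nvvideoconvert branch per muxer sink pad.
    branches = [
        f"filesrc location={shlex.quote(location)} ! {decoder} ! nvvideoconvert ! "
        f"{shlex.quote(CAPS)} ! m.sink_{i}"
        for i in range(num_sources)
    ]
    return "gst-launch-1.0 " + " ".join([mux] + branches)


if __name__ == "__main__":
    # Swap "nvjpegdec" for "jpegdec" to get the CPU-decoder variant.
    print(build_pipeline("nvjpegdec", "3840_2160.jpeg.1000"))
```

Generating both variants from one function makes it easier to be sure the two runs differ only in the decoder.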

CUDA_VISIBLE_DEVICES=4 USE_NEW_NVSTREAMMUX=yes gst-launch-1.0 \
  filesrc location='3840_2160.jpeg.1000' ! jpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_0 \
  nvstreammux name=m batch-size=16 config-file-path='./stream_mux_config.txt' ! nvinfer config-file-path='dstest1_pgie_config.txt' ! fakesink enable-last-sample=False \
  filesrc location='3840_2160.jpeg.1000' ! jpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_1 \
  filesrc location='3840_2160.jpeg.1000' ! jpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_2 \
  filesrc location='3840_2160.jpeg.1000' ! jpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_3 \
  filesrc location='3840_2160.jpeg.1000' ! jpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_4 \
  filesrc location='3840_2160.jpeg.1000' ! jpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_5 \
  filesrc location='3840_2160.jpeg.1000' ! jpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_6 \
  filesrc location='3840_2160.jpeg.1000' ! jpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_7 \
  filesrc location='3840_2160.jpeg.1000' ! jpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_8 \
  filesrc location='3840_2160.jpeg.1000' ! jpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_9 \
  filesrc location='3840_2160.jpeg.1000' ! jpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_10 \
  filesrc location='3840_2160.jpeg.1000' ! jpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_11 \
  filesrc location='3840_2160.jpeg.1000' ! jpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_12 \
  filesrc location='3840_2160.jpeg.1000' ! jpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_13 \
  filesrc location='3840_2160.jpeg.1000' ! jpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_14 \
  filesrc location='3840_2160.jpeg.1000' ! jpegdec ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! m.sink_15

Execution ended after 0:00:53.613020919
The top command shows about 1300% CPU usage.
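
To put the two "Execution ended after" times on a common scale, they can be converted to an aggregate frame rate. This is a rough sketch under two assumptions that are not measured in the post: each of the 16 input files holds 1000 frames, and every frame passes through the whole pipeline. The helper names `parse_elapsed` and `throughput` are made up for illustration.

```python
def parse_elapsed(text: str) -> float:
    """Convert gst-launch's H:MM:SS.fraction elapsed time to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def throughput(elapsed: str, streams: int = 16, frames_per_stream: int = 1000) -> float:
    """Aggregate frames per second across all streams (assumed frame counts)."""
    return streams * frames_per_stream / parse_elapsed(elapsed)


# Elapsed times copied from the two runs in the post.
runs = [("nvjpegdec", "0:04:45.499731357"), ("jpegdec", "0:00:53.613020919")]
for name, elapsed in runs:
    print(f"{name}: {throughput(elapsed):.1f} frames/s")
```

Under those assumptions the nvjpegdec run comes out at roughly 56 frames/s and the jpegdec run at roughly 298 frames/s for the whole pipeline, a gap of about 5.3x. These figures include nvvideoconvert, nvstreammux, and nvinfer, so they are not decode-only numbers.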

To simulate a continuous stream of input images, I concatenated a single 3840x2160 JPEG image 1000 times and saved the result as 3840_2160.jpeg.1000.
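
For anyone who wants to reproduce the input, a file like this can be produced by writing the same JPEG bytes back to back. The sketch below is an assumption about how the file was built, not the poster's actual method; `repeat_file` is a hypothetical helper name.

```python
from pathlib import Path


def repeat_file(src: str, dst: str, count: int = 1000) -> int:
    """Write `count` back-to-back copies of `src` into `dst`; return bytes written."""
    data = Path(src).read_bytes()
    with open(dst, "wb") as out:
        for _ in range(count):
            out.write(data)
    return len(data) * count
```

For example, `repeat_file("3840_2160.jpeg", "3840_2160.jpeg.1000")` would create the input used in the commands, where the source file name is assumed.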

The purpose of using the hardware decoder is to offload computationally intensive work from the CPUs. The hardware decoder can also share hardware video buffers directly with the GPU for inference, encoding, and other GPU operations, without converting CPU buffers to GPU buffers. Why do you think a single GPU should be faster than 13 CPU cores at decoding? What is the purpose of comparing decoding speed between the GPU and the CPU?
