About GPU-accelerated image encode on T4

Hi all,

I’m currently working on encoding some result images from DeepStream and sending them to other applications. I’m trying to use the GPU (in my case, a T4) to accelerate encoding and reduce CPU cost, and after some searching I found nvJPEG.

But after experimenting with nvJPEG, I found that it seems to use the CUDA cores to do the encoding. Is there a way to use the NVENC chip inside the T4 to do the encoding instead?
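For context, here is roughly the call sequence I’m using, trimmed to a sketch (error checking omitted; the device buffer here is just a stand-in for the real DeepStream output surface):

```cpp
#include <cuda_runtime.h>
#include <nvjpeg.h>
#include <vector>
#include <cstdio>

int main() {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    nvjpegHandle_t handle;
    nvjpegEncoderState_t state;
    nvjpegEncoderParams_t params;
    nvjpegCreateSimple(&handle);
    nvjpegEncoderStateCreate(handle, &state, stream);
    nvjpegEncoderParamsCreate(handle, &params, stream);
    nvjpegEncoderParamsSetQuality(params, 85, stream);
    nvjpegEncoderParamsSetSamplingFactors(params, NVJPEG_CSS_420, stream);

    const int w = 1920, h = 1080;
    // stand-in for the interleaved RGB frame already resident in GPU memory
    unsigned char* d_rgb = nullptr;
    cudaMalloc(&d_rgb, (size_t)w * h * 3);

    nvjpegImage_t img = {};
    img.channel[0] = d_rgb;
    img.pitch[0]   = w * 3;   // interleaved input: one plane, pitch = width * 3

    nvjpegEncodeImage(handle, state, params, &img,
                      NVJPEG_INPUT_RGBI, w, h, stream);

    // query the bitstream size, then fetch the compressed bytes
    size_t length = 0;
    nvjpegEncodeRetrieveBitstream(handle, state, nullptr, &length, stream);
    std::vector<unsigned char> jpeg(length);
    nvjpegEncodeRetrieveBitstream(handle, state, jpeg.data(), &length, stream);
    cudaStreamSynchronize(stream);

    printf("encoded %zu bytes\n", length);

    cudaFree(d_rgb);
    nvjpegEncoderParamsDestroy(params);
    nvjpegEncoderStateDestroy(state);
    nvjpegDestroy(handle);
    cudaStreamDestroy(stream);
    return 0;
}
```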

Thanks!

NVENC/NVDEC hardware is for encoding/decoding video, not still images. I’m not aware of any way to use the motion-video hardware to encode still images; certainly NVIDIA doesn’t provide any libraries/APIs to do that. If there were, and it were useful, there would have been no reason to introduce NVJPEG as a separate hardware engine (in Ampere).

Hi Robert,

Thanks for your quick reply! It’s a pity that we can’t do JPEG encoding with the NVENC hardware. Are there any plans to implement this?

I have one more question about nvJPEG. As I mentioned earlier, what I’m trying to do is use nvJPEG for JPEG encoding after I receive results from DeepStream inference, but the encoding time increases significantly.

When I use nvJPEG standalone, the encoding time is around 10 ms per image, but when I cascade it after DeepStream inference, it increases to 200 ms (GPU utilization is around 90%).

Any thoughts on how to solve this?

Thanks!

It’s either a measurement error or something else is going on at the same time as the encoding.
The profiler (Nsight Systems) is your friend.
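One classic source of measurement error: nvJPEG work is only *queued* on the CUDA stream by the encode call, so timing it with a wall clock and no synchronization measures launch latency, not encoding. A sketch using CUDA events (here `encode_one` is a placeholder for your actual nvjpegEncodeImage call):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// placeholder for the actual nvjpegEncodeImage call issued on `stream`
void encode_one(cudaStream_t stream) { /* ... */ }

int main() {
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, stream);
    encode_one(stream);            // work is only queued here
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);    // wait until the queued work has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("encode took %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    return 0;
}
```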

I think it’s quite unlikely. In any event, I’m not allowed to make definitive forward-looking statements.

Our test setup:
- Image: 1920×1080, RGB
- Time measured: cudaMemcpy, nvjpegEncode, GPU memory download
- Method: repeat 1000 times and average

GPU dedicated to encoding:
- 1 nvjpegEncode process: 19 ms, GPU-Util 25%
- 10 nvjpegEncode processes: 50.28 ms, GPU-Util 89%
- 20 nvjpegEncode processes: 104 ms, GPU-Util 99%

GPU shared with DeepStream (22 streams, 12 fps, GPU-Util >80%):
- 1 nvjpegEncode process: 32784 ms / 1000 = 32.784 ms
- 2 nvjpegEncode processes: (49203 + 49292) ms / 2000 = 49.25 ms
- 10 nvjpegEncode processes: 102 ms

We found that the nvjpegEncode time increases along with GPU utilization.

We want to add another device to do the JPEG encoding. What would you recommend?

Hi Robert,

Thanks for your reply! We did some further testing and my colleague posted the results above. As you can see, there’s a huge impact on nvJPEG while DeepStream is running.

We plan to do some further testing with the profiler. In the meantime, is there any professional encoding device/chip we can use?

You might be able to achieve what you want using multiple GPUs. I’m not aware of any professional standalone encoding device/chip, but I imagine they exist.
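If you do go the multi-GPU route, the idea would be to bind the nvJPEG handle and its stream to a second device so DeepStream and the encoder no longer compete for the same SMs. A sketch under those assumptions (frames produced on the inference GPU would still have to be moved over, e.g. with cudaMemcpyPeer):

```cpp
#include <cuda_runtime.h>
#include <nvjpeg.h>

// Bind all encoding state to GPU 1 while DeepStream runs on GPU 0.
void init_encoder_on_gpu1(nvjpegHandle_t* handle, cudaStream_t* stream) {
    cudaSetDevice(1);              // subsequent allocations/streams target GPU 1
    cudaStreamCreate(stream);
    nvjpegCreateSimple(handle);    // handle allocates on the current device
    // frames from GPU 0 must be copied over first, e.g. via cudaMemcpyPeer()
}
```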