Convert to JPEG from YUV420 CUDA device memory

I have a YUV420 frame in three CUDA buffers (one per plane), allocated with nppiMalloc_8u_C1 after converting from another image format with the NPP library (NVIDIA 2D Image And Signal Performance Primitives (NPP): Color and Sampling Conversion). I want to convert that frame to JPEG in memory, without saving the JPEG to a file. This would be easy with the NvJPEG library; however, as per "nvjpeg.h is missing on TX2", NvJPEG is not available on the TX2. I could use the NvJpegEncoder class, but it seems to be geared towards converting one DMA buffer to another DMA buffer. I think that would be very inefficient, since I already have the image in CUDA memory and do not otherwise need to copy the data into a DMA buffer. Is there some other way to do this? Am I missing something?
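For readers unfamiliar with the layout described above, here is a minimal sketch of the plane geometry of a planar 8-bit YUV420 frame. The names PlaneDims, Yuv420Layout, yuv420_layout and yuv420_payload_bytes are hypothetical helpers, not part of NPP, and the sketch ignores the row pitch that a pitched allocator may add to each plane.

```cpp
#include <cassert>
#include <cstddef>

// Dimensions of one 8-bit plane, in pixels (hypothetical helper type).
struct PlaneDims {
    int width;
    int height;
};

// Geometry of a planar YUV420 frame: a full-resolution Y plane, and U and V
// planes subsampled by 2 in both directions (rounded up for odd sizes).
struct Yuv420Layout {
    PlaneDims y;
    PlaneDims u;
    PlaneDims v;
};

inline Yuv420Layout yuv420_layout(int width, int height) {
    assert(width > 0 && height > 0);
    const PlaneDims chroma{(width + 1) / 2, (height + 1) / 2};
    return Yuv420Layout{{width, height}, chroma, chroma};
}

// Payload bytes across all three planes, ignoring any row padding (pitch).
inline std::size_t yuv420_payload_bytes(int width, int height) {
    const Yuv420Layout l = yuv420_layout(width, height);
    return static_cast<std::size_t>(l.y.width) * l.y.height +
           static_cast<std::size_t>(l.u.width) * l.u.height +
           static_cast<std::size_t>(l.v.width) * l.v.height;
}
```

For a 1920x1080 frame this gives a 1920x1080 Y plane and two 960x540 chroma planes, so each chroma plane holds a quarter of the luma samples.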

Sorry for the late reply.
Your proposal looks like an optimal solution:

npp buffer(packed RGB) -> NvBuffer in RGBA -> NvBufferTransform() -> NvBuffer in NV12 or YUV420 -> NvJPEGEncoder

Please refer to the samples in /usr/src/jetson_multimedia. To get a CUDA pointer, check cuda_postprocess() in 12_camera_v4l2_cuda. For JPEG encoding, check 05_jpeg_encode.

Thank you. Can I treat the NvBuffer as CUDA memory and call nppi functions on it directly or do I have to copy out of the NvBuffer and into memory allocated by cudaMalloc?

An NvBuffer can be treated as CUDA memory, but it looks like RGBA is not supported in NPP functions, so you may need one more buffer for re-sampling RGB to RGBA.
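To make the extra RGB-to-RGBA step concrete, here is a CPU reference sketch of what that re-sampling does to the pixel data. It is not the NPP or NvBuffer API; rgb_to_rgba and rgb_to_rgba_packed are hypothetical names, and on the TX2 the same per-pixel logic would run on the device (for example in a small CUDA kernel). The pitch parameters are included because pitched buffers usually have padded rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand packed 8-bit RGB rows into packed RGBA rows, filling in alpha.
// src_pitch and dst_pitch are row strides in bytes; they may be larger
// than width * 3 and width * 4 when rows are padded.
inline void rgb_to_rgba(const std::uint8_t* src, std::size_t src_pitch,
                        std::uint8_t* dst, std::size_t dst_pitch,
                        int width, int height, std::uint8_t alpha = 255) {
    assert(src != nullptr && dst != nullptr);
    assert(width > 0 && height > 0);
    assert(src_pitch >= static_cast<std::size_t>(width) * 3);
    assert(dst_pitch >= static_cast<std::size_t>(width) * 4);
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* s = src + static_cast<std::size_t>(y) * src_pitch;
        std::uint8_t* d = dst + static_cast<std::size_t>(y) * dst_pitch;
        for (int x = 0; x < width; ++x) {
            d[4 * x + 0] = s[3 * x + 0];  // R
            d[4 * x + 1] = s[3 * x + 1];  // G
            d[4 * x + 2] = s[3 * x + 2];  // B
            d[4 * x + 3] = alpha;         // A
        }
    }
}

// Convenience wrapper for tightly packed images (no row padding).
inline std::vector<std::uint8_t> rgb_to_rgba_packed(
        const std::vector<std::uint8_t>& rgb, int width, int height) {
    assert(rgb.size() == static_cast<std::size_t>(width) * height * 3);
    std::vector<std::uint8_t> rgba(static_cast<std::size_t>(width) * height * 4);
    rgb_to_rgba(rgb.data(), static_cast<std::size_t>(width) * 3,
                rgba.data(), static_cast<std::size_t>(width) * 4,
                width, height);
    return rgba;
}
```

The output buffer is one third larger than the input, which is why the reply mentions needing one more buffer rather than converting in place.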

Are both NPP and NvBuffer CUDA accelerated?


Yes. NPP is CUDA accelerated. An NvBuffer is a hardware DMA buffer that can be accessed directly by the hardware engines (GPU, NVENC, NVDEC, VIC, …).

Is NvJPEGEncoder CUDA accelerated, hardware accelerated, or does it just use the CPU? Also, for color conversions that are supported by both NvBufferTransform and NPP, which library will produce the fastest results on the TX2?

NvJPEGEncoder uses an independent JPEG codec engine. It uses neither the GPU nor the CPU.

For color conversion, NvBufferTransform() uses the hardware converter (VIC), while NPP functions use the GPU.
If your use case needs the GPU for heavily loaded tasks, we suggest calling NvBufferTransform().

Thank you. I am using this information to write a TX2 image conversion library that will do compile-time shortest-path analysis for all possible image conversions, using a combination of NPP, NvBuffer, and NvJPEGEncoder/NvJPEGDecoder. Is it safe to treat NvBuffer planes as device memory and use them as input and output of NPP functions? I mainly need that for conversions to and from packed color spaces.
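A rough sketch of what such a compile-time shortest-path analysis could look like, using a constexpr Floyd-Warshall pass (C++14 or later). Everything here is an assumption for illustration: the format list, the set of direct conversions, and above all the edge weights, which are made-up placeholders and not measurements of NPP, VIC, or the JPEG engine on a TX2.

```cpp
#include <cassert>

// Nodes of the conversion graph. This format list is illustrative only.
enum Format { RGB_PACKED, RGBA_PACKED, YUV420_PLANAR, NV12, JPEG, FORMAT_COUNT };

// Engine assumed to perform a direct conversion step.
enum Engine { NONE, GPU_NPP, VIC_NVBUFFER, JPEG_ENGINE };

constexpr int kUnreachable = 1 << 20;

struct Table {
    int cost[FORMAT_COUNT][FORMAT_COUNT];       // cheapest total cost
    int next[FORMAT_COUNT][FORMAT_COUNT];       // next hop, or -1 if no route
    Engine engine[FORMAT_COUNT][FORMAT_COUNT];  // engine for direct edges only
};

// Register a direct conversion in both directions (an assumption; real
// conversions are not necessarily symmetric).
constexpr void add_pair(Table& t, Format a, Format b, int cost, Engine e) {
    t.cost[a][b] = cost; t.next[a][b] = b; t.engine[a][b] = e;
    t.cost[b][a] = cost; t.next[b][a] = a; t.engine[b][a] = e;
}

constexpr Table build_table() {
    Table t{};
    for (int i = 0; i < FORMAT_COUNT; ++i) {
        for (int j = 0; j < FORMAT_COUNT; ++j) {
            t.cost[i][j] = (i == j) ? 0 : kUnreachable;
            t.next[i][j] = (i == j) ? j : -1;
            t.engine[i][j] = NONE;
        }
    }
    // Placeholder weights, NOT benchmark results.
    add_pair(t, RGB_PACKED, RGBA_PACKED, 2, GPU_NPP);
    add_pair(t, RGB_PACKED, YUV420_PLANAR, 4, GPU_NPP);
    add_pair(t, RGBA_PACKED, YUV420_PLANAR, 1, VIC_NVBUFFER);
    add_pair(t, RGBA_PACKED, NV12, 1, VIC_NVBUFFER);
    add_pair(t, YUV420_PLANAR, NV12, 1, VIC_NVBUFFER);
    add_pair(t, YUV420_PLANAR, JPEG, 1, JPEG_ENGINE);
    // Floyd-Warshall: relax every pair (i, j) through each intermediate k.
    for (int k = 0; k < FORMAT_COUNT; ++k) {
        for (int i = 0; i < FORMAT_COUNT; ++i) {
            for (int j = 0; j < FORMAT_COUNT; ++j) {
                if (t.cost[i][k] + t.cost[k][j] < t.cost[i][j]) {
                    t.cost[i][j] = t.cost[i][k] + t.cost[k][j];
                    t.next[i][j] = t.next[i][k];
                }
            }
        }
    }
    return t;
}

// Number of conversion steps on the cheapest route, or -1 if none exists.
constexpr int hop_count(const Table& t, Format from, Format to) {
    if (t.next[from][to] < 0) return -1;
    int hops = 0;
    for (int at = from; at != to; at = t.next[at][to]) ++hops;
    return hops;
}

// The whole table is computed by the compiler.
constexpr Table kTable = build_table();
static_assert(kTable.cost[RGB_PACKED][JPEG] == 4, "cheapest RGB -> JPEG cost");
```

With these placeholder weights, the cheapest RGB to JPEG route goes through RGBA and YUV420, which mirrors the pipeline suggested earlier in the thread. At run time a caller would follow kTable.next from the source format to the target and dispatch each hop to the engine recorded in kTable.engine. Real weights would have to come from profiling on the target board.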