AGX Orin VPI with TRT inference


I couldn’t find anything in the VPI docs or from searching the forums.

VPI's docs say that other backends, such as PVA or VIC, can run some image-processing and CV algorithms, leaving the GPU free for other work such as neural-network inference.

I’d like to be able to run TRT inference with an engine file on a VPI image that’s been transferred to the GPU. I want to keep it as a VPI image so that I can use the PVA backend in conjunction with inference on the GPU. Is this possible? Are there any examples? Since the VPI image is in CUDA memory, can I make an inference call as described here: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation


Please find the VPI online documentation below:

There is an example of using VPI with PyTorch.
The workflow for VPI+TensorRT should be similar.



Thanks for sharing. From my understanding, using VPI would only eliminate the copying of images from CPU to GPU and then copying results back to CPU. So I wouldn’t have to move data using cudaMemcpyAsync()?

I’d be able to run inference directly using context->enqueueV3(stream)? I’d still need to set up the execution context and deserialize the TRT engine file, since that’s separate from VPI. Is that correct?


VPI supports CUDA buffer data mapping.
You can feed the GPU buffer to TensorRT without CPU ↔ GPU memcpy.

Yes, the TensorRT execution context and engine deserialization are still needed.
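As a sketch of that flow, the VPI image's CUDA buffer can be mapped and its device pointer handed to TensorRT directly (no CPU ↔ GPU memcpy). This assumes VPI 2.x and TensorRT 8.5+, and the tensor name "input" is an illustrative assumption:

```cpp
// Sketch: run TensorRT inference on a VPI image already in CUDA memory.
// Assumes VPI 2.x (vpiImageLockData) and TensorRT 8.5+ (enqueueV3).
#include <vpi/Image.h>
#include <NvInfer.h>
#include <cuda_runtime.h>

void inferOnVpiImage(VPIImage vpiImg, nvinfer1::IExecutionContext *context,
                     void *outputGpuBuf, cudaStream_t stream)
{
    // Map the VPI image's CUDA buffer; no host<->device copy happens here.
    VPIImageData data;
    vpiImageLockData(vpiImg, VPI_LOCK_READ,
                     VPI_IMAGE_BUFFER_CUDA_PITCH_LINEAR, &data);

    void *gpuPtr = data.buffer.pitch.planes[0].data;

    // "input"/"output" are assumed tensor names -- check with
    // engine->getIOTensorName(). Note: TensorRT expects a dense tensor, so if
    // pitchBytes includes row padding, a repack/format-conversion step is
    // needed before inference.
    context->setTensorAddress("input", gpuPtr);
    context->setTensorAddress("output", outputGpuBuf);
    context->enqueueV3(stream);
    cudaStreamSynchronize(stream);

    vpiImageUnlock(vpiImg);
}
```

The engine itself is still deserialized beforehand with createInferRuntime() and deserializeCudaEngine(), as in any TensorRT application.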

Got it. I’ll try getting TensorRT to work with VPI and post again if I run into issues. Thanks!


One question that might be related. I looked at this post Using VPI with VIC backend in Deepstream pipeline on AGX Xavier - #7 by blubthefish

My question is: when might I need to use vpiImageCreateCUDAMemWrapper()? Would it function the same as vpiImageCreateWrapperOpenCVMat() after I move the image to the GPU? Would it be better to wrap the image into CUDA memory to make it more suitable for TRT inference?

Also, I need to normalize my input image before passing it into TRT inference. I’m guessing it would be better to do this in OpenCV natively before wrapping it into a VPI image?


vpiImageCreateCUDAMemWrapper() was integrated into vpiImageCreateWrapper() in JetPack 5.

The wrappers do the same thing; they differ only in the type of input buffer. The difference matters only on the VPI side, since TensorRT always takes a GPU buffer as input.

With the OpenCV wrapper, you feed VPI a CPU buffer and let VPI handle the memcpy/synchronization for you.
With the CUDA wrapper, you create the CUDA buffer yourself, which gives you more control over the buffer.
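A sketch of the second option, wrapping a self-allocated CUDA buffer into a VPIImage, assuming VPI 2.x on JetPack 5 and a single-plane U8 image (format and sizes are illustrative):

```cpp
// Sketch: wrap a user-allocated CUDA buffer as a VPIImage (VPI 2.x).
#include <vpi/Image.h>
#include <cuda_runtime.h>

VPIImage wrapCudaBuffer(int width, int height)
{
    // Allocate the CUDA buffer ourselves, so we keep full control over it.
    size_t pitch = 0;
    void *devPtr = nullptr;
    cudaMallocPitch(&devPtr, &pitch, width, height); // 1 byte/pixel (U8)

    // Describe the buffer to VPI.
    VPIImageData data = {};
    data.bufferType                    = VPI_IMAGE_BUFFER_CUDA_PITCH_LINEAR;
    data.buffer.pitch.format           = VPI_IMAGE_FORMAT_U8;
    data.buffer.pitch.numPlanes        = 1;
    data.buffer.pitch.planes[0].width      = width;
    data.buffer.pitch.planes[0].height     = height;
    data.buffer.pitch.planes[0].pitchBytes = static_cast<int32_t>(pitch);
    data.buffer.pitch.planes[0].data       = devPtr;

    // NULL wrapper params should select defaults.
    VPIImage img = nullptr;
    vpiImageCreateWrapper(&data, nullptr, 0, &img);
    return img; // caller later calls vpiImageDestroy() and cudaFree(devPtr)
}
```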

Normalization can be done with OpenCV (a CPU operator), or you can add it as a TensorRT layer (a GPU operator).

