Using VPI in GStreamer

Hi,

Thanks for your testing.

Our experiment is running with IMX274.
Not sure if this makes a difference.

We are checking this internally.
Will share more information with you later.

Hi,

Were you able to check internally whether:

  • the code above uses NVMM in the background?
  • the latency coincides with your own 1080p test?
  • the latency for 2160p is close to expected? The VPI - Vision Programming Interface: Remap benchmark doesn’t list 3840x2160, but if “Streams: 4” is selected (not sure what “Interval” means there), it shows 15.17 ms.
  • if there were two streams (two cameras), would they share the same 0.35-0.46 Gpix/s (NV12 linear undistortion)?
  • can it be used from two different processes?

Thanks.

Hi,

Thanks for your patience.
We are still checking this issue internally.

One more question: do you have another camera, such as an IMX274?
If yes, could you share the result of testing with the IMX274?
This would help align the environment between us.

Thanks.

Hi,

Sorry, we don’t have access to an IMX274. Which adapter/carrier board is it connected to, and is the driver native to JetPack or publicly available? The IMX477 and IMX274 are close in resolution; after the ISP resizes both to the same resolution, would there be any residual difference (as seen by VIC) between the two?

Thanks.

Thanks for your feedback.

We are still checking this issue internally.
Hope to share more information with you soon.

Hi,

Sorry for the late update. Here are the answers:

>code above uses NVMM in the background?
It wraps images from EGL.

>latency coincides with your own 1080p test?
No, it looks like the latency at lines 329 and 361 is much larger in your use case.
But this might be related to the image size.

>latency for 2160p is close to expected
Yes.

>does this mean that if there were two streams (two cameras), it is sharing the same 0.35-0.46 Gpix/s (NV12 linear undistortion)
If the resources are not fully occupied, the tasks from two streams can run concurrently.

>can it be used from two different processes?
No, you are required to use a single process with multiple threads to run GPU jobs concurrently.
GPU sharing between processes is time-sliced.

Thanks.

Hi,

If VPI wraps an image that is already in NVMM without copying, that still doesn’t explain why wrapping 1080p takes a similar amount of time as the actual calculation. Based on my results, wrapping (and copying) takes ~2 ms while the calculation is nominally 3.087 ms.

The difference in latency at lines 329 and 361 is odd, as the filter is downstream from the ISP resizing, which has already happened beforehand. Your results are from Oct 13; mine are based on code from Nov 11. Could you rerun that code? Another difference: the IMX274 image is still resized, albeit from a slightly lower resolution. Would you describe which ingest card and driver it is connected to?

Is the estimate of a cumulative 0.35-0.46 Gpix/s still correct?

I’m a little confused by the GPU comment, since we are using the VIC backend. In other words, if there are 2 cameras, 2 GStreamer pipelines, and 2 undistort filters in the same process, then since there are static members in the filter code (shared between the 2 instances), I don’t think that would work; they would have to be separated into 2 processes. If in one process, since this runs on GStreamer’s internal thread(s), how would it migrate onto multiple user threads? If VIC can operate on only one image at a time, does VPI manage cross-thread and cross-process blocking?

Thanks.

Hi,

There is no memory copy in vpiImageSetWrapper().

Under the max clock rate, your Nov 12 result takes around 0.77 ms for wrapping at both resolutions.
This matches our result and should be within expectations.

There is another cuCtxSynchronize() before line 361.
There might be some jobs submitted to the context that cause the difference.

Does the “0.35-0.46 Gpix/s” come from the Orin camera spec or the IMX camera spec?

For the concurrency comment, sorry, that applies to the CUDA backend.
For VIC, yes, you will need to create separate nvbuffersessions to let VIC run concurrently.
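As a pseudocode-style sketch of that suggestion (assuming the Jetson Multimedia API’s nvbuf_utils interface; NvBufferSessionCreate() and the session field of NvBufferTransformParams are from that header, and exact details may differ across JetPack releases), each thread would own its own session so its VIC transforms are queued independently:

```c
/* Sketch: one NvBufferSession per thread so VIC jobs from the two
 * camera pipelines do not serialize on a shared default session.
 * Error handling omitted. */
#include "nvbuf_utils.h"

void per_camera_thread(int src_dmabuf_fd, int dst_dmabuf_fd)
{
    /* Each thread creates its own session... */
    NvBufferSession session = NvBufferSessionCreate();

    NvBufferTransformParams params = {0};
    params.transform_filter = NvBufferTransform_Filter_Smart;
    params.session = session;  /* ...and tags its transforms with it. */

    /* VIC-backed transform between two dmabuf-backed NvBuffers. */
    NvBufferTransform(src_dmabuf_fd, dst_dmabuf_fd, &params);

    NvBufferSessionDestroy(session);
}
```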

To figure out where the latency comes from, would you mind trying the following experiments?

    1. Profile the pipeline with Nsight Systems to get more details about the latency.
    2. Test it with a video input to see if the latency at line 361 is reduced.

Thanks.

Hi,

The 0.35-0.46 Gpix/s comes from measuring image resolution vs. latency (or images/s). Is this the expected maximum, whether it’s used from one stream or shared across multiple streams? There is a Gpix/s spec for the ISP, but I don’t see one for VIC.

Is there a code example for nvbuffersession in a GStreamer filter? I still don’t understand what would happen if each GStreamer pipeline (with its own calls to VPI/VIC) were its own separate process. This is legal for the GPU (outside VPI); what happens with VIC?

  1. Did the profiling with nsys, and it doesn’t report any accelerator activity (other than GPU/DLA), even with the flag turned on.
  2. Asked this before, to eliminate ingest differences: what is your CSI card and driver setup for the IMX274?

Thanks.