Please provide complete information as applicable to your setup.
• Hardware Platform: dGPU
• DeepStream 6.4

**Is it possible to use the VPI 3.0 `vpiImageCreateWrapper` on x86 dGPU devices to keep the image data from NvBufSurface in GPU memory?**
As an example, I am taking the gst-dsexample gst-plugin and modifying it so that it creates a PIP (Picture-in-Picture) in the bottom-right 20% of the screen, showing an enhanced close-up of the cropped bounding-box region. I successfully did this with OpenCV, but that approach incurs a performance hit because I have to copy the image data from GPU memory to the CPU.
In the `gst_pip_transform_ip` function, I have made the following changes so far:
```cpp
/**
 * Called when the element receives an input buffer from the upstream element.
 */
static GstFlowReturn gst_pip_transform_ip(GstBaseTransform *btrans, GstBuffer *inbuf) {
  GstPip *pip = GST_PIP(btrans);
  VPIStatus status;

  // Get batch metadata
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta(inbuf);
  if (!batch_meta) {
    GST_ERROR_OBJECT(pip, "No batch metadata found");
    return GST_FLOW_ERROR;
  }

  // Find the best object
  NvDsObjectMeta *obj_meta = find_central_object_with_highest_confidence(pip, batch_meta);
  if (!obj_meta)
    return GST_FLOW_OK;

  // Map the input buffer
  GstMapInfo in_map_info;
  if (!gst_buffer_map(inbuf, &in_map_info, GST_MAP_READ)) {
    GST_ERROR_OBJECT(pip, "Failed to map input buffer");
    return GST_FLOW_ERROR;
  }
  NvBufSurface *surface = (NvBufSurface *)in_map_info.data;

  // Calculate PIP dimensions (bottom-right 20% of the frame)
  int32_t pipWidth = surface->surfaceList[0].width * 0.2;
  int32_t pipHeight = surface->surfaceList[0].height * 0.2;
  int32_t pipPosX = surface->surfaceList[0].width - pipWidth;
  int32_t pipPosY = surface->surfaceList[0].height - pipHeight;

  // Create a VPI image wrapping the buffer
  // TODO: Wrapping the image as an NvBuffer works on Tegra platforms ONLY!
  NvBufSurfaceMapParams nvbuf_surface_map_params;
  int result = NvBufSurfaceGetMapParams(surface, 0, &nvbuf_surface_map_params);
  assert(result == 0);
  assert(nvbuf_surface_map_params.fd != 0);

  VPIImageData image_data;
  image_data.bufferType = VPI_IMAGE_BUFFER_NVBUFFER;
  image_data.buffer.fd = nvbuf_surface_map_params.fd;
  status = vpiImageCreateWrapper(&image_data, nullptr, pip->backend, &pip->input_image);
  if (status != VPI_SUCCESS)
    goto error;

  // Synchronize the stream
  status = vpiStreamSync(pip->vpi_stream);
  if (status != VPI_SUCCESS)
    goto error;

  gst_buffer_unmap(inbuf, &in_map_info);
  return GST_FLOW_OK;

error:
  gst_buffer_unmap(inbuf, &in_map_info);
  return GST_FLOW_ERROR;
}
```
Stepping through with a debugger, the above works on my Jetson device, but it fails on my x86 dGPU system. Looking at the VPI documentation for `vpiImageCreateWrapper`, it appears I cannot use `VPI_IMAGE_BUFFER_NVBUFFER` on dGPU:
> The following buffer types are supported:
> • VPI_IMAGE_BUFFER_HOST_PITCH_LINEAR
> • VPI_IMAGE_BUFFER_CUDA_PITCH_LINEAR
> • VPI_IMAGE_BUFFER_EGLIMAGE (on Tegra platforms only)
> • VPI_IMAGE_BUFFER_NVBUFFER (on Tegra platforms only)
It looks like I will need to detect whether the system is using a dGPU and then use either `VPI_IMAGE_BUFFER_HOST_PITCH_LINEAR` or `VPI_IMAGE_BUFFER_CUDA_PITCH_LINEAR` instead. Does anyone have examples of how to properly do this?
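For reference, here is the direction I'm considering for the dGPU path. This is an untested sketch, not working code: `wrap_surface_dgpu` is a hypothetical helper name, and I'm assuming a single-plane RGBA NvBufSurface allocated with `NVBUF_MEM_CUDA_DEVICE` (or `NVBUF_MEM_CUDA_UNIFIED`), so that `dataPtr` is a pointer the CUDA backend can consume. An NV12 surface would instead need two planes filled in from `planeParams`.

```cpp
// Untested sketch: wrap an NvBufSurface as a CUDA pitch-linear VPI image on dGPU.
// Assumes surface->memType is NVBUF_MEM_CUDA_DEVICE/UNIFIED and the color
// format is RGBA; adjust format/pixelType/numPlanes for other layouts.
static VPIStatus wrap_surface_dgpu(NvBufSurface *surface, uint64_t backend,
                                   VPIImage *out_image) {
    NvBufSurfaceParams *p = &surface->surfaceList[0];

    VPIImageData image_data;
    memset(&image_data, 0, sizeof(image_data));
    image_data.bufferType = VPI_IMAGE_BUFFER_CUDA_PITCH_LINEAR;
    image_data.buffer.pitch.format    = VPI_IMAGE_FORMAT_RGBA8;  // assumption: RGBA surface
    image_data.buffer.pitch.numPlanes = 1;
    image_data.buffer.pitch.planes[0].pixelType  = VPI_PIXEL_TYPE_4U8;
    image_data.buffer.pitch.planes[0].width      = p->width;
    image_data.buffer.pitch.planes[0].height     = p->height;
    image_data.buffer.pitch.planes[0].pitchBytes = p->planeParams.pitch[0];
    image_data.buffer.pitch.planes[0].data       = p->dataPtr;   // CUDA device pointer on dGPU

    return vpiImageCreateWrapper(&image_data, nullptr, backend | VPI_BACKEND_CUDA, out_image);
}
```

At runtime I would presumably branch on `surface->memType` (e.g. `NVBUF_MEM_CUDA_DEVICE` → CUDA pitch-linear wrap, `NVBUF_MEM_SURFACE_ARRAY` → the existing NvBuffer/fd path on Jetson), but I'm not sure this is the intended approach.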
Thanks