Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): Jetson
• DeepStream Version: 6.2
• JetPack Version (valid for Jetson only): 5.1.2
• TensorRT Version: 8.5.1.7
Hello,
In the nvinfer plugin, I want to avoid copying the inference output to the host buffer within the NvDsInferContextImpl::queueInputBatchPreprocessed function.
The copy is an unnecessary waste of resources, because I can pass the device buffer directly to nvdspostprocess and handle it on the GPU.
NvDsInferStatus
NvDsInferContextImpl::queueInputBatchPreprocessed(NvDsInferContextBatchPreprocessedInput &batchInput)
{
    ...
    RETURN_NVINFER_ERROR(m_Postprocessor->copyBuffersToHostMemory(
                             *safeRecyleBatch, *m_PostprocessStream),
        "post cuda process failed.");
    m_ProcessBatchQueue.push(safeRecyleBatch.release());
    return NVDSINFER_SUCCESS;
}
I know there is a variable called m_disableOutputHostCopy. Is there a way to disable this copy operation by setting a plugin property, without modifying the source code?
Unfortunately, setting disable-output-host-copy=1 in the configuration file does not work.
It produces the following error: Unknown or legacy key specified 'disable-output-host-copy' for group [property].
Additionally, no function in gstnvinfer_property_parser.cpp sets this value.
NvDsInferStatus
InferPostprocessor::copyBuffersToHostMemory(NvDsInferBatch& batch, CudaStream& mainStream)
{
    assert(m_AllLayerInfo.size());
    /* Queue the copy of output contents from device to host memory after the
     * infer completion event. */
    for (size_t i = 0; i < m_AllLayerInfo.size(); i++)
    {
        NvDsInferLayerInfo& info = m_AllLayerInfo[i];
        assert(info.inferDims.numElements > 0);
        if (!info.isInput && needOutputCopyB4Processing())
        {
            RETURN_CUDA_ERR(
                cudaMemcpyAsync(batch.m_HostBuffers[info.bindingIndex]->ptr(),
                    batch.m_DeviceBuffers[info.bindingIndex],
                    getElementSize(info.dataType) * info.inferDims.numElements *
                        batch.m_BatchSize,
                    cudaMemcpyDeviceToHost, mainStream),
                "postprocessing cudaMemcpyAsync for output buffers failed");
        }
        else if (needInputCopy() && info.isInput)
        {
            RETURN_CUDA_ERR(
                cudaMemcpyAsync(batch.m_HostBuffers[info.bindingIndex]->ptr(),
                    batch.m_DeviceBuffers[info.bindingIndex],
                    getElementSize(info.dataType) * info.inferDims.numElements *
                        batch.m_BatchSize,
                    cudaMemcpyDeviceToHost, mainStream),
                "postprocessing cudaMemcpyAsync for input buffers failed");
        }
    }
    /* Record CUDA event to later synchronize for the copy to actually
     * complete. */
    if (batch.m_OutputCopyDoneEvent)
    {
        RETURN_CUDA_ERR(cudaEventRecord(*batch.m_OutputCopyDoneEvent, mainStream),
            "Failed to record batch cuda copy-complete-event");
    }
    return NVDSINFER_SUCCESS;
}
I specifically want to know how to make the needOutputCopyB4Processing() function return false without modifying the nvinfer code.
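To make the question concrete, here is a minimal standalone model of the gating in copyBuffersToHostMemory. It assumes that needOutputCopyB4Processing() is simply the negation of m_disableOutputHostCopy. That relationship is my assumption and is not shown in the code above. LayerInfo, PostprocessorModel, and countOutputHostCopies are hypothetical stand-ins, not nvinfer types.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for NvDsInferLayerInfo; only the field used here.
struct LayerInfo {
    bool isInput;
};

// Hypothetical model of the postprocessor. ASSUMPTION: the real
// needOutputCopyB4Processing() just negates m_disableOutputHostCopy.
class PostprocessorModel {
public:
    explicit PostprocessorModel(bool disableOutputHostCopy)
        : m_disableOutputHostCopy(disableOutputHostCopy) {}

    bool needOutputCopyB4Processing() const { return !m_disableOutputHostCopy; }

    // Counts the output layers that would be copied device-to-host,
    // mirroring the first branch of the loop in copyBuffersToHostMemory.
    std::size_t countOutputHostCopies(const std::vector<LayerInfo>& layers) const {
        std::size_t copies = 0;
        for (const LayerInfo& info : layers) {
            if (!info.isInput && needOutputCopyB4Processing()) {
                ++copies;
            }
        }
        return copies;
    }

private:
    bool m_disableOutputHostCopy;
};
```

Under that assumption, with one input layer and two output layers, the model reports two copies when the flag is false and none when it is true, which is the behavior I am trying to reach through configuration alone.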
How can I port just the parsing functionality? Essentially, you are suggesting that I switch from DS6.2 to DS7.0, but that is not an easy task on JetPack 5.1.2. It is unfortunate.
You don't need to upgrade to DS7.0. You can download the DS7.0 source code and port the disable-output-host-copy YAML parsing code in gst_nvinfer_parse_props_yaml to DS6.2. If you are using a txt configuration file, port the code to gstnvinfer_property_parser.cpp instead.
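As a rough illustration of what the ported txt parsing has to do, the sketch below reads a boolean key from the [property] group of a key=value configuration. It is a dependency-free stand-in, not the DS7.0 code: as far as I know the real parser in gstnvinfer_property_parser.cpp is built on GLib's key-file API, and where the parsed value is stored in the init parameters is not shown in this thread. The names trim and parsePropertyFlag are hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: strips leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) return "";
    const std::size_t end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Hypothetical helper: returns true when `key` appears in the [property]
// group with any non-empty value other than "0". Returns false when the
// key is absent or sits in a different group.
bool parsePropertyFlag(const std::string& configText, const std::string& key) {
    std::istringstream in(configText);
    std::string raw;
    bool inPropertyGroup = false;
    bool flag = false;
    while (std::getline(in, raw)) {
        const std::string line = trim(raw);
        if (line.empty() || line[0] == '#') continue;  // skip blanks and comments
        if (line[0] == '[') {                          // group header
            inPropertyGroup = (line == "[property]");
            continue;
        }
        if (!inPropertyGroup) continue;
        const std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;
        if (trim(line.substr(0, eq)) == key) {
            const std::string value = trim(line.substr(eq + 1));
            flag = !value.empty() && value != "0";
        }
    }
    return flag;
}
```

In the plugin, the parsed result would then need to reach the flag that controls the host copy. That wiring has to be taken from the DS7.0 sources rather than from this sketch.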