- In the cfg of nvinferserver 1, “output_tensor_meta: true” needs to be set for saving inference results to user meta. Please refer to /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-3d-action-recognition/config_triton_infer_primary_2d_action.txt.
OK, I have done it and it works: I can see the tensor with a probe that checks user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META.
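The probe is essentially the standard pyds tensor-meta walk from the DeepStream Python samples (a sketch; it assumes the usual pyds bindings and that NvDsInferTensorMeta.unique_id is exposed, which is what lets you tell the two models' tensors apart):

```python
def tensor_probe(pad, info, u_data):
    # Imports are kept inside the probe so the sketch stays importable
    # without a DeepStream installation.
    import pyds
    from gi.repository import Gst

    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_user = frame_meta.frame_user_meta_list
        while l_user is not None:
            user_meta = pyds.NvDsUserMeta.cast(l_user.data)
            if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
                # unique_id is the gie-unique-id of the element that attached
                # this tensor, so it distinguishes nvinferserver 1 from 2.
                print("tensor from gie", tensor_meta.unique_id,
                      "num_output_layers:", tensor_meta.num_output_layers)
            try:
                l_user = l_user.next
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
```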
- nvdspreprocess plugin provides a custom library interface for preprocessing on input streams. In the custom lib, you can get the inference results from user meta, then copy them as new tensors. Please refer to this sample for how to prepare tensors.
I think I am blocked here. I modified the CustomTensorPreparation method like this:
NvDsPreProcessStatus
CustomTensorPreparation(CustomCtx *ctx, NvDsPreProcessBatch *batch, NvDsPreProcessCustomBuf *&buf,
                        CustomTensorParams &tensorParam, NvDsPreProcessAcquirer *acquirer)
{
  printf("CustomTensorPreparation called\n");
  NvDsPreProcessStatus status = NVDSPREPROCESS_TENSOR_NOT_READY;

  // Acquire a buffer from the tensor pool
  buf = acquirer->acquire();
  void *pDst = buf->memory_ptr; // destination GPU pointer

  GstBuffer *inbuf = (GstBuffer *)batch->inbuf;
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta(inbuf);
  if (!batch_meta) {
    g_printerr("Failed to get batch_meta from GstBuffer\n");
    acquirer->release(buf); // return the unused buffer to the pool
    return NVDSPREPROCESS_TENSOR_NOT_READY;
  }

  bool tensor_found = false;
  // Iterate over the frames in the batch
  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame != nullptr; l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *)l_frame->data;
    // Iterate through the frame user metadata to find the tensor output
    for (NvDsMetaList *l_user = frame_meta->frame_user_meta_list; l_user != nullptr; l_user = l_user->next) {
      NvDsUserMeta *user_meta = (NvDsUserMeta *)l_user->data;
      if (user_meta->base_meta.meta_type == NVDSINFER_TENSOR_OUTPUT_META) {
        // Found the tensor output attached by the previous nvinferserver
        NvDsInferTensorMeta *tensor_meta = (NvDsInferTensorMeta *)user_meta->user_meta_data;
        g_print("Found tensor meta: %u output layers\n", tensor_meta->num_output_layers);
        for (uint i = 0; i < tensor_meta->num_output_layers; ++i) {
          void *src_gpu_ptr = tensor_meta->out_buf_ptrs_dev[i];
          NvDsInferDims dims = tensor_meta->output_layers_info[i].inferDims;
          size_t num_elements = 1;
          for (uint d = 0; d < dims.numDims; ++d) {
            num_elements *= dims.d[d];
          }
          size_t layer_size_bytes = num_elements * sizeof(float); // assuming FP32 output
          g_print("Copying layer %u of size %zu bytes\n", i, layer_size_bytes);
          // Copy from the previous model's output (GPU) to the current tensor buffer (GPU)
          cudaError_t err = cudaMemcpy(pDst, src_gpu_ptr, layer_size_bytes,
                                       cudaMemcpyDeviceToDevice);
          if (err != cudaSuccess) {
            g_printerr("cudaMemcpy failed: %s\n", cudaGetErrorString(err));
            acquirer->release(buf);
            return NVDSPREPROCESS_TENSOR_NOT_READY;
          }
          // Advance the destination pointer past this layer
          pDst = (char *)pDst + layer_size_bytes;
        }
        tensor_found = true;
        status = NVDSPREPROCESS_SUCCESS;
        break;
      }
    }
    if (tensor_found)
      break;
  }

  if (!tensor_found) {
    g_printerr("No NvDsInferTensorMeta found in frame metadata!\n");
    acquirer->release(buf); // don't leak the pool buffer on failure
  }
  return status;
}
And I can see in the logs:
CustomTensorPreparation called
Found tensor meta: 1 output layers
Copying layer 0 of size 3686400 bytes
which looks correct, since the output tensor of model_part1 is 64x90x160 floats (x4 bytes for FP32) = 3686400 bytes.
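As a quick sanity check of that number (a minimal sketch; the dims and the 4-byte FP32 element size come from the logs above):

```python
# Expected byte size of model_part1's output tensor: 64 x 90 x 160 FP32 elements.
from functools import reduce
from operator import mul

def tensor_nbytes(dims, elem_size=4):
    """Byte size of a tensor: product of the dims times bytes per element."""
    return reduce(mul, dims, 1) * elem_size

print(tensor_nbytes((64, 90, 160)))  # 64*90*160*4 = 3686400, matching the log
```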
Here is the config file of the nvdspreprocess plugin:
[property]
enable=1
unique-id=5
process-on-frame=1
target-unique-ids=1
network-input-order=0
#uniquely identify the metadata generated by this element
# gpu-id to be used
gpu-id=0
# if enabled maintain the aspect ratio while scaling
#maintain-aspect-ratio=1
# if enabled pad symmetrically with maintain-aspect-ratio enabled
#symmetric-padding=1
# processing width/height at which image is scaled
processing-width=160
processing-height=90
# max buffer in scaling buffer pool
scaling-buf-pool-size=1
# max buffer in tensor buffer pool
tensor-buf-pool-size=1
# tensor shape based on network-input-order
network-input-shape= 1;64;90;160
# 0=RGB, 1=BGR, 2=GRAY
network-color-format=0
# 0=FP32, 1=UINT8, 2=INT8, 3=UINT32, 4=INT32, 5=FP16
tensor-data-type=0
# tensor name same as input layer name
tensor-name=/conv3_1/Conv_output_0
# 0=NVBUF_MEM_DEFAULT 1=NVBUF_MEM_CUDA_PINNED 2=NVBUF_MEM_CUDA_DEVICE 3=NVBUF_MEM_CUDA_UNIFIED
scaling-pool-memory-type=0
# 0=NvBufSurfTransformCompute_Default 1=NvBufSurfTransformCompute_GPU 2=NvBufSurfTransformCompute_VIC
scaling-pool-compute-hw=0
# Scaling Interpolation method
# 0=NvBufSurfTransformInter_Nearest 1=NvBufSurfTransformInter_Bilinear 2=NvBufSurfTransformInter_Algo1
# 3=NvBufSurfTransformInter_Algo2 4=NvBufSurfTransformInter_Algo3 5=NvBufSurfTransformInter_Algo4
# 6=NvBufSurfTransformInter_Default
scaling-filter=0
# custom library .so path having custom functionality
custom-lib-path=/home/orkais/orkais/examples/orkais_ulg_split/nvdspreprocess_lib/libcustom2d_preprocess.so
# custom tensor preparation function name having predefined input/outputs
# check the default custom library nvdspreprocess_lib for more info
custom-tensor-preparation-function=CustomTensorPreparation
[user-configs]
# Below parameters get used when using default custom library nvdspreprocess_lib
# network scaling factor
pixel-normalization-factor=1
# mean file path in ppm format
#mean-file=
# array of offsets for each channel
#offsets=
[group-0]
src-ids=0
custom-input-transformation-function=CustomTransformation
process-on-roi=0
#process-on-all-objects=0
#roi-params-src-0=0;0;100;100
#draw-roi=0
#input-object-min-width=100
#input-object-min-height=100
- In the cfg of nvinferserver 2, input_tensor_from_meta needs to be set. With this configuration, nvinferserver will use the tensors directly instead of doing preprocessing. Please refer to /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-3d-action-recognition/config_triton_infer_primary_2d_action.txt.
Yes, I added it:
input_tensor_from_meta {
is_first_dim_batch: true
}
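For context, the plugin-level layout of my nvinferserver 2 config follows the 3D action-recognition sample (a sketch; unique_id, the model name, and the rest of infer_config are placeholders standing in for my actual settings):

```
infer_config {
  unique_id: 2
  gpu_ids: [0]
  max_batch_size: 1
  backend {
    triton {
      model_name: "model_part2"
      # remaining backend settings as in the working single-model config
    }
  }
}
input_control {
  process_mode: PROCESS_MODE_FULL_FRAME
}
output_control {
  output_tensor_meta: true
}
input_tensor_from_meta {
  is_first_dim_batch: true
}
```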
CONCLUSION
Unfortunately, when I probe nvinferserver 2, I get the output tensor of the first nvinferserver (even though output_tensor_meta: true is set in the config of nvinferserver 2).
Also, on my display I can't see the segmentation mask that should be the output of nvinferserver 2. (If I use the non-split model, I can see the segmentation mask; that pipeline works fine.)
Could you help me diagnose this further?
Thank you