Raw output tensor for nvinfer sgie cannot be accessed

Please provide complete information as applicable to your setup.

• GeForce RTX 4070 Ti Laptop GPU
• DeepStream Version 8.0
• Docker Container nvcr.io/nvidia/deepstream:8.0-triton-multiarch
• NVIDIA GPU Driver Version 575.64.03
• Issue Type BUG
• How to reproduce the issue?
Pipeline structure: source → pgie → postprocessor → preprocessor → sgie → postprocessor
With this structure the last postprocessor has no way to access the raw output tensor. The reason is that when the sgie is configured with:

output-tensor-meta=1
input-tensor-from-meta=1
process-mode=2
network-type=100

the function attach_tensor_output_meta in gstnvinfer_meta_utils.cpp does NOT attach the metadata to either the object or the frame meta, only to the ROI meta. The output can only be accessed through the user_meta_pool of the batch_meta object, but with this workaround there is no way to get the actual object an NvDsInferTensorMeta belongs to: only the correct gie uid can be matched, not the actual input object/ROI for this output.

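For context, these switches sit in the sgie's nvinfer config file. A minimal sketch of the relevant [property] group (model paths, batch size, and the unique-id values here are placeholders, not my actual config):

```ini
[property]
gpu-id=0
# placeholders -- not my actual model
onnx-file=model.onnx
model-engine-file=model.onnx_b16_gpu0_fp16.engine
gie-unique-id=2
operate-on-gie-id=1
batch-size=16
# the settings relevant to this report
process-mode=2
network-type=100
output-tensor-meta=1
input-tensor-from-meta=1
```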
The following is the output_thread of my postprocessor, which works completely fine for the pgie but not for the sgie. For the case with the preprocessor active, I only added prints to check whether any data is available as described in the docs at https://docs.nvidia.com/metropolis/deepstream/8.0/text/DS_plugin_gst-nvinfer.html#tensor-metadata, but no data is attached to NVDS_ROI_META's roi_user_meta_list (which is ACTIVELY set to NULL in the meta-utils function mentioned above), nor directly to obj_user_meta_list or frame_user_meta_list:
Without the preprocessor in the pipeline, my pipeline works as expected, except for bad detections caused by the missing preprocessing.

void InstanceSegImpl::output_thread(void) {
    GstFlowReturn flow_ret;
    GstBuffer *outBuffer = NULL;
    std::unique_lock<std::mutex> lk(m_process_lock);
    NvDsBatchMeta *batch_meta = NULL;
    int32_t frame_cnt = 0;
    while (1) {
        if (m_process_q.empty()) {
            if (m_stop) {
                break;
            }
            m_process_cv.wait(lk);
            continue;
        }

        PacketInfo packetInfo = m_process_q.front();
        m_process_q.pop();

        m_process_cv.notify_all();
        lk.unlock();

        NvBufSurface *in_surf = get_nvbuf_surface(packetInfo.inbuf);
        batch_meta = get_nvds_batch_meta(packetInfo.inbuf);
        outBuffer = packetInfo.inbuf;
        nvds_set_input_system_timestamp(outBuffer, GST_ELEMENT_NAME(m_element));
        if (m_preprocessor_support) {
            for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame != NULL; l_frame = l_frame->next) {
                NvDsFrameMeta *frame_meta = (NvDsFrameMeta *)l_frame->data;
                /* First try: Search in ROI metadata (original approach) */
                for (NvDsMetaList *l_user = frame_meta->frame_user_meta_list; l_user != NULL; l_user = l_user->next) {
                    NvDsUserMeta *roi_user_meta = (NvDsUserMeta *)l_user->data;
                    if (roi_user_meta->base_meta.meta_type == NVDS_ROI_META) {
                        /* convert to roi metadata */
                        NvDsRoiMeta *roi_meta = (NvDsRoiMeta *)roi_user_meta->user_meta_data;
                        printf("frame_usermeta roi list: %p\n", roi_meta->roi_user_meta_list);
                    } else if (roi_user_meta->base_meta.meta_type == NVDSINFER_TENSOR_OUTPUT_META) {
                        /* convert to tensor metadata */
                        NvDsInferTensorMeta *tensor_meta = (NvDsInferTensorMeta *)roi_user_meta->user_meta_data;
                        printf("frame_usermeta tensor gie uid: %d\n", tensor_meta->unique_id);
                    } else {
                        printf("roi_user_meta has type: %d\n", roi_user_meta->base_meta.meta_type);
                    }
                }
                for (NvDsMetaList *l_obj = frame_meta->obj_meta_list; l_obj != NULL; l_obj = l_obj->next) {
                    NvDsObjectMeta *obj_meta = (NvDsObjectMeta *)l_obj->data;
                    for (NvDsMetaList *l_user = obj_meta->obj_user_meta_list; l_user != NULL; l_user = l_user->next) {
                        NvDsUserMeta *user_meta = (NvDsUserMeta *)l_user->data;
                        if (user_meta->base_meta.meta_type == NVDS_ROI_META) {
                            NvDsRoiMeta *roi_meta = (NvDsRoiMeta *)user_meta->user_meta_data;
                            printf("object_usermeta roi list: %p\n", roi_meta->roi_user_meta_list);
                        } else if (user_meta->base_meta.meta_type == NVDSINFER_TENSOR_OUTPUT_META) {
                            /* convert to tensor metadata */
                            NvDsInferTensorMeta *tensor_meta = (NvDsInferTensorMeta *)user_meta->user_meta_data;
                            printf("object_usermeta tensor gie uid: %d\n", tensor_meta->unique_id);
                        } else {
                            printf("user_meta has type: %d\n", user_meta->base_meta.meta_type);
                        }
                    }
                }
            }
        } else {
            /* Iterate each frame metadata in batch */
            for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame != NULL; l_frame = l_frame->next) {
                NvDsFrameMeta *frame_meta = (NvDsFrameMeta *)l_frame->data;

                if (m_post_processor_params.process_mode == PROCESS_MODEL_FULL_FRAME) {
                    /* Iterate user metadata in frames to search PGIE's tensor metadata */
                    for (NvDsMetaList *l_user = frame_meta->frame_user_meta_list; l_user != NULL;
                         l_user = l_user->next) {
                        NvDsUserMeta *user_meta = (NvDsUserMeta *)l_user->data;
                        if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META) continue;

                        /* convert to tensor metadata */
                        NvDsInferTensorMeta *meta = (NvDsInferTensorMeta *)user_meta->user_meta_data;
                        // Process only tensor output produced by this GIE (match on meta->unique_id)
                        if (meta->unique_id == static_cast<guint>(m_gie_unique_id)) {
                            for (guint i = 0; i < meta->num_output_layers; i++) {
                                NvDsInferLayerInfo *info = &meta->output_layers_info[i];
                                info->buffer = meta->out_buf_ptrs_host[i];
                            }
                            /* Parse output tensor and fill detection results into objectList. */
                            std::vector<NvDsInferLayerInfo> outputLayersInfo(
                                meta->output_layers_info, meta->output_layers_info + meta->num_output_layers);
                            std::vector<ObjectData> output;
                            if (m_post_processor) {
                                m_post_processor->set_network_info(meta->network_info);
                                m_post_processor->parse_each_frame(meta, output);
                                m_post_processor->attach_object_meta(in_surf, frame_meta->batch_id, batch_meta,
                                                                     frame_meta, NULL, output);

                                m_post_processor->release_output(output);
                            } else {
                                GST_WARNING_OBJECT(m_element, "Post Processor not initialized for network");
                            }
                        }
                    }
                } else if (m_post_processor_params.process_mode == PROCESS_MODEL_OBJECTS) {
                    for (NvDsMetaList *l_obj = frame_meta->obj_meta_list; l_obj != NULL; l_obj = l_obj->next) {
                        NvDsObjectMeta *obj_meta = (NvDsObjectMeta *)l_obj->data;

                        /* Iterate user metadata in object to search SGIE's tensor data */
                        for (NvDsMetaList *l_user = obj_meta->obj_user_meta_list; l_user != NULL;
                             l_user = l_user->next) {
                            NvDsUserMeta *user_meta = (NvDsUserMeta *)l_user->data;
                            if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META) continue;

                            /* convert to tensor metadata */
                            NvDsInferTensorMeta *meta = (NvDsInferTensorMeta *)user_meta->user_meta_data;

                            if (meta->unique_id == static_cast<guint>(m_gie_unique_id)) {
                                for (unsigned int i = 0; i < meta->num_output_layers; i++) {
                                    NvDsInferLayerInfo *info = &meta->output_layers_info[i];
                                    info->buffer = meta->out_buf_ptrs_host[i];
                                }
                                std::vector<NvDsInferLayerInfo> outputLayersInfo(
                                    meta->output_layers_info, meta->output_layers_info + meta->num_output_layers);
                                std::vector<ObjectData> output;
                                if (m_post_processor) {
                                    m_post_processor->set_network_info(meta->network_info);
                                    m_post_processor->parse_each_frame(meta, output);
                                    /* Generate classifier metadata and attach to obj_meta */
                                    m_post_processor->attach_object_meta(in_surf, frame_meta->batch_id, batch_meta,
                                                                         frame_meta, obj_meta, output);
                                    m_post_processor->release_output(output);
                                } else {
                                    GST_WARNING_OBJECT(m_element, "Post Processor not initialized for network");
                                }
                            }
                        }
                    }
                }
            }
        }

        nvds_set_output_system_timestamp(outBuffer, GST_ELEMENT_NAME(m_element));
        flow_ret = gst_pad_push(GST_BASE_TRANSFORM_SRC_PAD(m_element), outBuffer);
        GST_DEBUG_OBJECT(m_element,
                         "CustomLib: %s in_surf=%p, Pushing Frame %d to downstream... Frame %d flow_ret = %d"
                         " TS=%" GST_TIME_FORMAT " \n",
                         __func__, in_surf, packetInfo.frame_num, frame_cnt++, flow_ret,
                         GST_TIME_ARGS(GST_BUFFER_PTS(outBuffer)));

        lk.lock();
        continue;
    }
    m_output_thread_stopped = true;
    lk.unlock();
    return;
}

After looking at "Not able to access raw tensor output as metadata" - #10 by theelitepro0224,

I also compared the source code of DeepStream 8.0 and 7.1. I think the newest version is not copying the data correctly; the 7.1 version correctly attaches everything to the NvDsUserMeta's user_meta_data.

DeepStream 8.0:

/* Attaches the raw tensor output to the GstBuffer as metadata. */
void attach_tensor_output_meta(GstNvInfer *nvinfer,
                               GstMiniObject *tensor_out_object,
                               GstNvInferBatch *batch,
                               NvDsInferContextBatchOutput *batch_output) {
  NvDsBatchMeta *batch_meta =
      (nvinfer->process_full_frame || nvinfer->input_tensor_from_meta)
          ? batch->frames[0].frame_meta->base_meta.batch_meta
          : batch->frames[0].obj_meta->base_meta.batch_meta;

  /* Create and attach NvDsInferTensorMeta for each frame/object. Also
   * increment the refcount of GstNvInferTensorOutputObject. */
  for (size_t j = 0; j < batch->frames.size(); j++) {
    GstNvInferFrame &frame = batch->frames[j];

    /* Processing on ROIs (not frames or objects) skip attaching tensor output
     * to frames or objects. */
    NvDsInferTensorMeta *meta = new NvDsInferTensorMeta;
    meta->unique_id = nvinfer->unique_id;
    meta->num_output_layers = nvinfer->output_layers_info->size();
    meta->output_layers_info = (NvDsInferLayerInfo *)g_memdup2(
        nvinfer->output_layers_info->data(),
        meta->num_output_layers * sizeof(NvDsInferLayerInfo));
    meta->out_buf_ptrs_host = new void *[meta->num_output_layers];
    meta->out_buf_ptrs_dev = new void *[meta->num_output_layers];
    meta->gpu_id = nvinfer->gpu_id;
    meta->priv_data = gst_mini_object_ref(tensor_out_object);
    meta->network_info = nvinfer->network_info;
    meta->maintain_aspect_ratio = nvinfer->maintain_aspect_ratio;
    meta->symmetric_padding = nvinfer->symmetric_padding;

    for (unsigned int i = 0; i < meta->num_output_layers; i++) {
      NvDsInferLayerInfo &info = meta->output_layers_info[i];
      meta->out_buf_ptrs_dev[i] =
          (uint8_t *)batch_output->outputDeviceBuffers[i] +
          info.inferDims.numElements * get_element_size(info.dataType) * j;
      meta->out_buf_ptrs_host[i] =
          (uint8_t *)batch_output->hostBuffers[info.bindingIndex] +
          info.inferDims.numElements * get_element_size(info.dataType) * j;
    }

    NvDsUserMeta *user_meta = nvds_acquire_user_meta_from_pool(batch_meta);
    user_meta->user_meta_data = meta;
    user_meta->base_meta.meta_type = (NvDsMetaType)NVDSINFER_TENSOR_OUTPUT_META;
    user_meta->base_meta.release_func = release_tensor_output_meta;
    user_meta->base_meta.copy_func = copy_tensor_output_meta;
    user_meta->base_meta.batch_meta = batch_meta;

    if (nvinfer->input_tensor_from_meta) {
      /* Attach tensor meta as part of ROI Meta */
      nvds_add_user_meta_to_roi(frame.roi_meta, user_meta);

      /* Create a new user meta for ROI meta at frame level */
      NvDsUserMeta *roi_user_meta =
          nvds_acquire_user_meta_from_pool(batch_meta);
      NvDsRoiMeta *frame_roi_meta = new NvDsRoiMeta;
      // Copy ROI data
      // WHY DO WE NEED TO COPY THE ROI DATA LIKE THIS ???
      frame_roi_meta->roi = frame.roi_meta->roi;
      memcpy(frame_roi_meta->roi_polygon, frame.roi_meta->roi_polygon,
             sizeof(guint) * DS_MAX_POLYGON_POINTS * 2);
      frame_roi_meta->converted_buffer = frame.roi_meta->converted_buffer;
      frame_roi_meta->frame_meta = frame.roi_meta->frame_meta;
      frame_roi_meta->scale_ratio_x = frame.roi_meta->scale_ratio_x;
      frame_roi_meta->scale_ratio_y = frame.roi_meta->scale_ratio_y;
      frame_roi_meta->offset_left = frame.roi_meta->offset_left;
      frame_roi_meta->offset_top = frame.roi_meta->offset_top;
      frame_roi_meta->object_meta = nullptr;
      frame_roi_meta->classifier_meta_list = nullptr;
      frame_roi_meta->roi_user_meta_list = nullptr;
      // This should be replaced as in DeepStream 7.1:
      // roi_user_meta->user_meta_data = frame.roi_meta;
      roi_user_meta->user_meta_data = frame_roi_meta;
      roi_user_meta->base_meta.meta_type = (NvDsMetaType)NVDS_ROI_META;
      roi_user_meta->base_meta.release_func = release_user_meta_at_frame_level;
      roi_user_meta->base_meta.copy_func = copy_roi_meta;
      roi_user_meta->base_meta.batch_meta = batch_meta;

      /* Add ROI meta to frame meta */
      nvds_add_user_meta_to_frame(frame.frame_meta, roi_user_meta);

      /* If object is ROI itself, add ROI meta to object meta */
      if (frame.obj_meta) {
        // This should be replaced as in DeepStream 7.1:
        // obj_roi_user_meta->user_meta_data = frame.roi_meta;
        NvDsUserMeta *obj_roi_user_meta =
            nvds_acquire_user_meta_from_pool(batch_meta);
        NvDsRoiMeta *obj_roi_meta = new NvDsRoiMeta;
        // Copy ROI data
        obj_roi_meta->roi = frame.roi_meta->roi;
        memcpy(obj_roi_meta->roi_polygon, frame.roi_meta->roi_polygon,
               sizeof(guint) * DS_MAX_POLYGON_POINTS * 2);
        obj_roi_meta->converted_buffer = frame.roi_meta->converted_buffer;
        obj_roi_meta->frame_meta = frame.roi_meta->frame_meta;
        obj_roi_meta->scale_ratio_x = frame.roi_meta->scale_ratio_x;
        obj_roi_meta->scale_ratio_y = frame.roi_meta->scale_ratio_y;
        obj_roi_meta->offset_left = frame.roi_meta->offset_left;
        obj_roi_meta->offset_top = frame.roi_meta->offset_top;
        obj_roi_meta->object_meta = nullptr;
        obj_roi_meta->classifier_meta_list = nullptr;
        obj_roi_meta->roi_user_meta_list = nullptr;
        obj_roi_user_meta->user_meta_data = obj_roi_meta;
        obj_roi_user_meta->base_meta.meta_type = (NvDsMetaType)NVDS_ROI_META;
        obj_roi_user_meta->base_meta.release_func =
            release_user_meta_at_frame_level;
        obj_roi_user_meta->base_meta.copy_func = nullptr;
        obj_roi_user_meta->base_meta.batch_meta = batch_meta;
        nvds_add_user_meta_to_obj(frame.obj_meta, obj_roi_user_meta);
      }
    } else if (nvinfer->process_full_frame) {
      nvds_add_user_meta_to_frame(frame.frame_meta, user_meta);
    } else {
      nvds_add_user_meta_to_obj(frame.obj_meta, user_meta);
    }
  }
}

DeepStream 7.1:

/* Attaches the raw tensor output to the GstBuffer as metadata. */
void
attach_tensor_output_meta (GstNvInfer *nvinfer, GstMiniObject * tensor_out_object,
    GstNvInferBatch *batch, NvDsInferContextBatchOutput *batch_output)
{
  NvDsBatchMeta *batch_meta = (nvinfer->process_full_frame
      || nvinfer->input_tensor_from_meta) ? batch->frames[0].
      frame_meta->base_meta.batch_meta : batch->frames[0].obj_meta->base_meta.
      batch_meta;

  /* Create and attach NvDsInferTensorMeta for each frame/object. Also
   * increment the refcount of GstNvInferTensorOutputObject. */
  for (size_t j = 0; j < batch->frames.size (); j++) {
    GstNvInferFrame & frame = batch->frames[j];

    /* Processing on ROIs (not frames or objects) skip attaching tensor output
     * to frames or objects. */
    NvDsInferTensorMeta *meta = new NvDsInferTensorMeta;
    meta->unique_id = nvinfer->unique_id;
    meta->num_output_layers = nvinfer->output_layers_info->size ();
    meta->output_layers_info = (NvDsInferLayerInfo*)g_memdup2(nvinfer->output_layers_info->data (),
     meta->num_output_layers * sizeof (NvDsInferLayerInfo)) ;
    meta->out_buf_ptrs_host = new void *[meta->num_output_layers];
    meta->out_buf_ptrs_dev = new void *[meta->num_output_layers];
    meta->gpu_id = nvinfer->gpu_id;
    meta->priv_data = gst_mini_object_ref (tensor_out_object);
    meta->network_info = nvinfer->network_info;
    meta->maintain_aspect_ratio = nvinfer->maintain_aspect_ratio;
    meta->symmetric_padding = nvinfer->symmetric_padding;

    for (unsigned int i = 0; i < meta->num_output_layers; i++) {
      NvDsInferLayerInfo & info = meta->output_layers_info[i];
      meta->out_buf_ptrs_dev[i] =
          (uint8_t *) batch_output->outputDeviceBuffers[i] +
          info.inferDims.numElements * get_element_size (info.dataType) * j;
      meta->out_buf_ptrs_host[i] =
          (uint8_t *) batch_output->hostBuffers[info.bindingIndex] +
          info.inferDims.numElements * get_element_size (info.dataType) * j;
    }

    NvDsUserMeta *user_meta = nvds_acquire_user_meta_from_pool (batch_meta);
    user_meta->user_meta_data = meta;
    user_meta->base_meta.meta_type =
        (NvDsMetaType) NVDSINFER_TENSOR_OUTPUT_META;
    user_meta->base_meta.release_func = release_tensor_output_meta;
    user_meta->base_meta.copy_func = copy_tensor_output_meta;
    user_meta->base_meta.batch_meta = batch_meta;

    if (nvinfer->input_tensor_from_meta) {
      /* Attach tensor meta as part of ROI Meta */
      nvds_add_user_meta_to_roi (frame.roi_meta, user_meta);
      /* Attach ROI Meta to Frame Meta */
      NvDsUserMeta *user_meta = nvds_acquire_user_meta_from_pool (batch_meta);
      user_meta->user_meta_data = frame.roi_meta;
      user_meta->base_meta.meta_type =
        (NvDsMetaType) NVDS_ROI_META;
      user_meta->base_meta.release_func = release_user_meta_at_frame_level;
      user_meta->base_meta.copy_func = NULL;
      user_meta->base_meta.batch_meta = batch_meta;
      nvds_add_user_meta_to_frame (frame.frame_meta, user_meta);
      /* if object is roi itself */
      if (frame.obj_meta) {
        nvds_add_user_meta_to_obj (frame.obj_meta, user_meta);
      }
    } else if (nvinfer->process_full_frame) {
      nvds_add_user_meta_to_frame (frame.frame_meta, user_meta);
    } else {
      nvds_add_user_meta_to_obj (frame.obj_meta, user_meta);
    }
  }
}

After adapting this function accordingly, the release_tensor_output_meta function is called too often and results in a segfault.

After many hours of testing and annoyance I arrived at this solution; luckily, the roi_meta entries of GstNvDsPreProcessBatchMeta contain frame_meta and object_meta:

for (NvDsMetaList *l_batch_user_meta = batch_meta->batch_user_meta_list; l_batch_user_meta != NULL;
     l_batch_user_meta = l_batch_user_meta->next) {
    NvDsUserMeta *batch_user_meta = (NvDsUserMeta *)l_batch_user_meta->data;
    if (batch_user_meta->base_meta.meta_type != NVDS_PREPROCESS_BATCH_META) continue;
    GstNvDsPreProcessBatchMeta *preprocess_batch_meta = (GstNvDsPreProcessBatchMeta *)batch_user_meta->user_meta_data;
    for (auto &roi_meta : preprocess_batch_meta->roi_vector) {
        for (NvDsMetaList *l_roi_user = roi_meta.roi_user_meta_list; l_roi_user != NULL; l_roi_user = l_roi_user->next) {
            NvDsUserMeta *roi_user_meta = (NvDsUserMeta *)l_roi_user->data;
            if (roi_user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META) continue;
            NvDsInferTensorMeta *meta = (NvDsInferTensorMeta *)roi_user_meta->user_meta_data;
            NvDsObjectMeta *obj_meta = (NvDsObjectMeta *)roi_meta.object_meta;
            NvDsFrameMeta *frame_meta = (NvDsFrameMeta *)roi_meta.frame_meta;
            if (meta->unique_id != static_cast<guint>(m_gie_unique_id)) continue;
            for (guint i = 0; i < meta->num_output_layers; i++) {
                NvDsInferLayerInfo *info = &meta->output_layers_info[i];
                info->buffer = meta->out_buf_ptrs_host[i];
            }
            /* Parse output tensor and fill detection results into objectList. */
            std::vector<NvDsInferLayerInfo> outputLayersInfo(
                meta->output_layers_info, meta->output_layers_info + meta->num_output_layers);
            std::vector<ObjectData> output;
            if (m_post_processor) {
                m_post_processor->set_network_info(meta->network_info);
                m_post_processor->parse_each_frame(meta, output);
                m_post_processor->attach_object_meta(in_surf, frame_meta->batch_id, batch_meta,
                                                     frame_meta, obj_meta, output);
                m_post_processor->release_output(output);
            } else {
                GST_WARNING_OBJECT(m_element, "Post Processor not initialized for network");
            }
        }
    }
}

Could you elaborate on the question? If there is a bug, please share the detailed reproducing steps. Thanks!

Steps to reproduce this were the following pipeline:

primary-inference → postprocessor → preprocessor → nvinfer → postprocessor

In the second postprocessor library, in the output thread, the NvDsInferTensorMeta data is not available where the documentation says it should be. There is no NVDS_ROI_META attached to any of the objects or frames.

I solved this issue by moving back to DeepStream 7.1.

I don't understand what information you are missing; I gave all the information I have. Can you specify?

Right. You can get the output tensor meta in the roi_meta of GstNvDsPreProcessBatchMeta. roi_meta contains frame_meta and object_meta; hence, you can get the actual object the NvDsInferTensorMeta belongs to.

Thanks for sharing! The doc needs to be updated.

I found that for an sgie with a large batch size, my postprocessing library calls parse_each_frame once per unit in the batch, causing huge latency. This is because the nvinfer output metadata apparently has no batch dimension according to the output LayerInfo of the meta.
Is there any way to access the fully batched output tensor of the model in the postprocessor?
