NvInferServer implementation of LSTM model

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) RTX 3060
• DeepStream Version 7.1
• JetPack Version (valid for Jetson only)
• TensorRT Version 10.5
• NVIDIA GPU Driver Version (valid for GPU only) 560.70
• Issue Type( questions, new requirements, bugs) Question
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Is it possible to implement an LSTM model as an SGIE with the nvinferserver plugin? That is, to implement a secondary GIE placed right after its corresponding secondary preprocess.

I’ve seen on the forum that some users have managed to implement SGIEs with tensor inputs instead of images. However, all the implementations I’ve come across use nvinfer rather than nvinferserver.

For example, I tried to adapt the sample deepstream_tao_apps/apps/tao_others/deepstream-pose-classification at master · NVIDIA-AI-IOT/deepstream_tao_apps, but I have not been able to make it work with nvinferserver.

Thanks in advance

About the LSTM model issue, is this topic helpful?
About “nvdspreprocess + sgie” usage, please refer to the DeepStream SDK native sample deepstream-preprocess-test; config_preprocess_sgie.txt is the nvdspreprocess config used before the SGIE. You need to add the SGIE to the pipeline and set input-tensor-meta to true for the SGIE.
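
For nvinferserver specifically, the switch is not a plain key=value property but a block in the protobuf config. A minimal sketch follows; the model name and repo path are hypothetical, and the placement of input_tensor_from_meta is my assumption, so please verify it against nvdsinferserver_plugin.proto in your DeepStream version.

# Sketch of an nvinferserver SGIE config that consumes preprocess tensor meta.
infer_config {
  unique_id: 2
  max_batch_size: 32
  backend {
    triton {
      model_name: "lstm_model"          # hypothetical
      version: -1
      model_repo {
        root: "./triton_model_repo"     # hypothetical
      }
    }
  }
}
# Take the network input from NvDsPreProcessBatchMeta instead of raw frames.
input_tensor_from_meta {
  is_first_dim_batch: true
}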

I have been checking that sample, but I am still not able to find a solution. I need nvdspreprocess to output a tensor containing the center coordinates of the bounding boxes of detected objects, accumulated while tracking them over previous frames.

My simplified pipeline is as follows:

src -> pgie -> nvtracker -> nvdspreprocess -> sgie (LSTM) -> sink

I need the sgie to receive an input tensor of size [1, 30, 2] for each detected object, representing the (x, y) coordinates of the object over the last 30 frames. The LSTM will then predict the (x, y) coordinates for the subsequent frames, resulting in an output tensor of size [1, 3, 2] for each detected object if 3 frames are predicted.
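
To make the layout concrete, this is what I expect per detected object (illustrative C arrays; the names are mine):

/* Per-object tensors, row-major FP32 (illustrative only): */
float lstm_input [1][30][2];   /* (x, y) center for each of the last 30 frames */
float lstm_output[1][3][2];    /* predicted (x, y) center for the next 3 frames */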

Additionally, I am unsure how to retrieve the output from the sgie, whether for display on the OSD or saving to a file.

Finally, in my current pipeline, I notice that the sink pads of nvdspreprocess and sgie are linked to a fakesink. Is this correct, considering that the data is in NvDsBatchMeta?

  1. About “representing the (x, y) coordinates of the object over the last 30 frames”, please refer to the deepstream-pose-classification sample mentioned above. In that sample, the first model detects persons and the second model detects 34 keypoints (x, y, z) of the body. The nvdspreprocess makes a tensor of [3 x 300 x 34] for each detected object, representing the 34 keypoint (x, y, z) coordinates of the object over the last 300 frames.
  2. About LSTM, please refer to the doc in the nvinferserver Introduction. In short, you need to use the IInferCustomProcessor interface; please refer to the sample opt\nvidia\deepstream\deepstream\sources\TritonOnnxYolo\nvdsinferserver_custom_impl_yolo\nvdsinferserver_custom_process_yolo.cpp. In inferenceDone, you can save the inference results to a variable A; in extraInputProcess, you can read the data from A and add it as extra inputs. A skeleton is sketched right after this list.
  3. What is the LSTM model used to do? Could you share the scenario?
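
Below is a minimal skeleton of such a custom processor, modeled on nvdsinferserver_custom_process_yolo.cpp. The virtual-method signatures follow infer_custom_process.h, but please verify them against your DeepStream version; the actual state copying is only indicated in comments.

#include <cstdint>
#include <mutex>
#include <vector>
#include "infer_custom_process.h"
#include "nvdsinfer.h"

using namespace nvdsinferserver;

// LSTM-style custom processor: inferenceDone() caches the outputs, and
// extraInputProcess() feeds them back as extra inputs on the next pass.
class LstmCustomProcessor : public IInferCustomProcessor {
public:
    ~LstmCustomProcessor() override = default;

    // Request CPU-accessible buffers so the state can be copied directly.
    void supportInputMemType(InferMemType &type) override { type = InferMemType::kCpu; }

    // Serialize inference so frame N's output is ready before frame N+1's input.
    bool requireInferLoop() const override { return true; }

    // Fill the extra input tensors from the state saved on the previous pass.
    NvDsInferStatus extraInputProcess(
        const std::vector<IBatchBuffer *> &primaryInputs,
        std::vector<IBatchBuffer *> &extraInputs, const IOptions *options) override
    {
        std::lock_guard<std::mutex> lock(mMutex);
        // Copy mSavedState into the extraInputs buffers here.
        return NVDSINFER_SUCCESS;
    }

    // Cache the outputs that must be fed back on the next iteration.
    NvDsInferStatus inferenceDone(
        const IBatchArray *outputs, const IOptions *inOptions) override
    {
        std::lock_guard<std::mutex> lock(mMutex);
        // Copy the relevant output tensor into mSavedState here.
        return NVDSINFER_SUCCESS;
    }

    void notifyError(NvDsInferStatus status) override {}

private:
    std::mutex mMutex;
    std::vector<float> mSavedState; // the "variable A" mentioned above
};

// Factory exported to the plugin through the custom_lib settings.
extern "C" IInferCustomProcessor *
CreateInferServerCustomProcessor(const char *config, uint32_t configLen)
{
    return new LstmCustomProcessor();
}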

Thanks for the info. I have been working on that and I am facing some errors.

I added the following probe function to the sink pad of the SGIE to check the type of the metadata received and, when the metadata is NVDS_PREPROCESS_BATCH_META, the shape of the input tensor.

// Probe function for the sgie bin: prints the user-meta types attached to
// each batch and, for preprocess metadata, the packed tensor shape.
#include <gst/gst.h>
#include "gstnvdsmeta.h"
#include "nvds_tracker_meta.h"
#include "nvdspreprocess_meta.h"

static GstPadProbeReturn
sgie_bin_probe (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
    GstBuffer *buf = (GstBuffer *) info->data;
    NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);

    if (!batch_meta) {
        g_print ("Batch meta not found for buffer %p\n", buf);
        return GST_PAD_PROBE_OK;
    }

    /* Iterate through the batch-level user metadata. */
    for (NvDsMetaList *l_user = batch_meta->batch_user_meta_list;
            l_user != NULL; l_user = l_user->next) {
        NvDsUserMeta *user_meta = (NvDsUserMeta *) l_user->data;

        /* Check for preprocess tensor metadata. */
        if (user_meta->base_meta.meta_type == NVDS_PREPROCESS_BATCH_META) {
            g_print ("User meta type is: preprocess batch meta\n");
            /* Renamed so it no longer shadows the outer batch_meta. */
            GstNvDsPreProcessBatchMeta *preproc_meta =
                (GstNvDsPreProcessBatchMeta *) user_meta->user_meta_data;

            if (preproc_meta && preproc_meta->tensor_meta &&
                    preproc_meta->tensor_meta->tensor_shape.size () == 4) {
                g_print ("Batch Meta Available\n");

                /* Print the tensor shape (tensor_shape is a std::vector<int>). */
                g_print ("Tensor Shape: [%d, %d, %d, %d]\n",
                        preproc_meta->tensor_meta->tensor_shape[0],
                        preproc_meta->tensor_meta->tensor_shape[1],
                        preproc_meta->tensor_meta->tensor_shape[2],
                        preproc_meta->tensor_meta->tensor_shape[3]);
            }
        } else if (user_meta->base_meta.meta_type == NVDS_TRACKER_PAST_FRAME_META) {
            g_print ("User meta type is: tracker past frame meta\n");
        } else {
            g_print ("User meta type is unknown: %d\n",
                    user_meta->base_meta.meta_type);
        }
    }
    return GST_PAD_PROBE_OK;
}
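
For completeness, this is how I attach the probe (sgie is my nvinferserver element; the variable names are mine):

GstPad *sgie_sink_pad = gst_element_get_static_pad (sgie, "sink");
gst_pad_add_probe (sgie_sink_pad, GST_PAD_PROBE_TYPE_BUFFER,
    sgie_bin_probe, NULL, NULL);
gst_object_unref (sgie_sink_pad);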

The metadata received are NVDS_TRACKER_PAST_FRAME_META and NVDS_PREPROCESS_BATCH_META. Since input-tensor-meta is set to 1, the SGIE works only with the preprocess metadata, not with the tracker metadata. The problem appears as soon as the first preprocess batch metadata arrives: the program breaks, and the tensor shape printed is [32, 3, 224, 224], the same as the image. This happens even when using the preprocess library exactly as it is built in the deepstream-pose-classification sample.

So, I guess the problem is that the output data from the preprocess is not being saved correctly in the GstNvDsPreProcessBatchMeta struct.

User meta type is: tracker past frame meta
User meta type is: tracker past frame meta
User meta type is: tracker past frame meta
User meta type is: preprocess batch meta
Batch Meta Available
Tensor Shape: [32, 3, 224, 224]
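
For reference, these are the fields the probe inspects (paraphrased from nvdspreprocess_meta.h; check the header in your SDK for the authoritative definitions):

// Paraphrased from nvdspreprocess_meta.h, not verbatim.
typedef struct
{
  void *raw_tensor_buffer;        // pointer to the packed tensor data
  guint64 buffer_size;            // tensor size in bytes
  std::vector<int> tensor_shape;  // e.g. {32, 3, 224, 224}
  NvDsDataType data_type;         // FP32 / FP16 / INT8 / ...
  std::string tensor_name;        // must match the model's input name
  guint gpu_id;
  void *private_data;
  guint meta_id;
} NvDsPreProcessTensorMeta;

typedef struct
{
  std::vector<guint64> target_unique_ids; // gie ids this tensor is intended for
  NvDsPreProcessTensorMeta *tensor_meta;
  std::vector<NvDsRoiMeta> roi_vector;    // one entry per packed unit/object
  void *private_data;
} GstNvDsPreProcessBatchMeta;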

Please refer to this cfg. For your model, network-input-shape should be set to 32;30;2;1, representing batch size; 30 frames; 2-dimensional coordinates.
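
A sketch of where that key sits in the nvdspreprocess config (key names follow the style of config_preprocess_sgie.txt; the tensor name and library path are hypothetical):

[property]
enable=1
# batch-size; 30 frames; 2 coordinates; 1 -> one trajectory slot per object
network-input-shape=32;30;2;1
network-input-order=2                 # 2 = custom order for non-image tensors
tensor-data-type=0                    # 0 = FP32
tensor-name=input_1                   # hypothetical: must match the LSTM input
process-on-frame=0                    # operate on tracked objects, not frames
custom-lib-path=./libnvds_preprocess_lib.so
custom-tensor-preparation-function=CustomTensorPreparation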

Thanks for your help.

I am struggling to understand how the tensor is saved by the preprocessing bin and received by the SGIE. Right now, I am using a basic LSTM model that simply outputs its input, to verify whether the model receives the input tensor correctly and to ensure no additional errors are introduced.

I attach the file used to build the preprocessing dynamic library, in case it helps.
nvds_preprocess_lib.cpp.txt (8.4 KB)

I am encountering the following error:

[docker-desktop:11467:0:11602] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
==== backtrace (tid:  11602) ====
 0 0x0000000000042520 __sigaction()  ???:0
 1 0x0000000000034c80 gstnvinferserver::InferFrame::InferFrame()  ???:0
 2 0x000000000002bf73 gstnvinferserver::GstNvInferServerImpl::processInputTensor()  ???:0
 3 0x000000000001af9d gst_nvinfer_server_get_type()  ???:0
 4 0x000000000003d124 gst_base_parse_merge_tags()  ???:0
 5 0x000000000008f7cd gst_pad_query()  ???:0
 6 0x0000000000092d69 gst_pad_get_allowed_caps()  ???:0
 7 0x000000000009318e gst_pad_push()  ???:0
 8 0x0000000000046875 gst_plugin_coreelements_register()  ???:0
 9 0x00000000000ba127 gst_tag_get_nick()  ???:0
10 0x000000000008a384 g_thread_pool_thread_proxy()  /opt/tritonserver/librdkafka/hiredis/mosquitto-2.0.15/glib/build/../glib/gthreadpool.c:350
11 0x0000000000089ac1 g_thread_proxy()  /opt/tritonserver/librdkafka/hiredis/mosquitto-2.0.15/glib/build/../glib/gthread.c:831
12 0x0000000000094ac3 pthread_condattr_setpshared()  ???:0
13 0x0000000000126850 __xmknodat()  ???:0
=================================

Thanks in advance for your time and help

The nvinferserver plugin and the low-level lib are open source. Could you add logs in GstNvInferServerImpl::processInputTensor to check which code line causes the crash? The path is /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvinferserver/.

Thank you, @fanzh. I did that and found the problem.

The issue was that nvinferserver was expecting a tensor with the batch size specified in the preprocess config file. However, since the number of pedestrians in each frame doesn’t always match the specified batch size, this caused an error.

Now, nvinferserver checks the actual batch size of the tensor before performing inference.
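
For anyone hitting the same crash, the idea of the change is sketched below. This is a hypothetical reconstruction rather than the actual patch to the plugin source; the helper name is mine.

#include <algorithm>
#include "nvdspreprocess_meta.h"

// Derive the batch size actually filled by nvdspreprocess instead of
// trusting the configured maximum (tensor_shape[0]).
static guint
effective_batch_size (const GstNvDsPreProcessBatchMeta *preproc_meta)
{
    guint filled     = (guint) preproc_meta->roi_vector.size ();
    guint configured = (guint) preproc_meta->tensor_meta->tensor_shape[0];
    return std::min (filled, configured);
}

// GstNvInferServerImpl::processInputTensor() then skips inference (instead
// of building InferFrame objects) whenever this returns 0.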

Regards,
David