Trouble debugging custom tensor preprocessing for sgie step

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
• DeepStream Version: 7.1
• TensorRT Version: 10.13
• NVIDIA GPU Driver Version (valid for GPU only): Ada Lovelace
• Issue Type( questions, new requirements, bugs)
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

I am trying to build a deepstream app to process a specific spatio-temporal model we have built in our lab. The model takes in two input tensors:

  1. Custom tensor sequence of length 32 exactly as is used in deepstream-3d-action-recognition for n_sources of RTSP streams [n_sources,3,32,256,256]
  2. A custom tensor built from the output of a pgie PeopleNet model (resnet34_peoplenet_int8.onnx), in the format [total_num_bbox_for_the_batch, [source_id_it_belongs_to, x1, y1, x2, y2]], where x1, y1, x2, y2 are the bboxes for each person coming from pgie1 (PeopleNet object detection). Note that these are to be taken only for the middle/key frame of the sequence, i.e. frame 16/32, although I have not even attempted to retrieve this frame yet.
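To make the layout of the second tensor concrete, here is a minimal sketch of the packing step on its own, outside of DeepStream. The `Box` struct and the `pack_rois` helper are hypothetical names used only for illustration; in a real preprocess library the values would come from `NvDsObjectMeta::rect_params` and the frame's source id. The sketch also caps the number of rows so the write can never run past the tensor buffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one detection (not a DeepStream type).
struct Box {
    int source_id;
    float left, top, width, height;
};

// Pack detections into a flat row-major [n, 5] float buffer, one row per box:
// [source_id, x1, y1, x2, y2]. At most max_rows rows are written.
// Returns the number of rows actually written.
std::size_t pack_rois(const std::vector<Box>& boxes, float* dst, std::size_t max_rows) {
    std::size_t n = 0;
    for (const Box& b : boxes) {
        if (n == max_rows) break;  // never write past the tensor buffer
        float* row = dst + n * 5;
        row[0] = static_cast<float>(b.source_id);
        row[1] = b.left;
        row[2] = b.top;
        row[3] = b.left + b.width;   // x2
        row[4] = b.top + b.height;   // y2
        ++n;
    }
    return n;
}
```

The returned row count is what would be reported as the first dimension of the tensor for that batch.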

Here is a screenshot of the input and output tensor dimensions in netron.app for further details.

I have gotten the first tensor to work with a version of the model without the bboxes (similar to the recognition net, I assume, but trained by us). I cannot seem to build the custom tensor in nvdspreprocess_lib without the library failing to load, no matter what I try. Here is the code I have tried so far in my nvdspreprocess_lib.cpp:

    NvDsPreProcessStatus
    PrepareBboxTensor(CustomCtx *ctx, NvDsPreProcessBatch *batch, NvDsPreProcessCustomBuf *&buf,
                      CustomTensorParams &tensorParam, NvDsPreProcessAcquirer *acquirer)
    {
        NvDsPreProcessStatus status = NVDSPREPROCESS_TENSOR_NOT_READY;

        // Acquire buffer from tensor pool
        buf = acquirer->acquire();
        float *pDst = (float*)buf->memory_ptr;
        int units = batch->units.size();
        int total_objects = 0;

        // Iterate over all frames in the batch
        guint batch_id = 0;
        for (int i = 0; i < units; ++i) {
            GstBuffer *inbuf = (GstBuffer *)batch->inbuf;
            NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta(inbuf);
            if (!batch_meta) continue;

            for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame != NULL; l_frame = l_frame->next, batch_id++) {
                NvDsFrameMeta *frame_meta = (NvDsFrameMeta *)(l_frame->data);

                for (NvDsMetaList *l_obj = frame_meta->obj_meta_list; l_obj != NULL; l_obj = l_obj->next) {
                    NvDsObjectMeta *obj_meta = (NvDsObjectMeta *)l_obj->data;

                    // Only PGIE objects (optionally filter by class_id if needed)
                    float x1 = obj_meta->rect_params.left;
                    float y1 = obj_meta->rect_params.top;
                    float x2 = obj_meta->rect_params.left + obj_meta->rect_params.width;
                    float y2 = obj_meta->rect_params.top + obj_meta->rect_params.height;

                    // Write [batch_id, x1, y1, x2, y2] to tensor
                    pDst[0] = (float)batch_id;
                    pDst[1] = x1;
                    pDst[2] = y1;
                    pDst[3] = x2;
                    pDst[4] = y2;
                    pDst += 5;
                    total_objects++;
                }
            }
        }

        status = NVDSPREPROCESS_SUCCESS;
        return status;
    }

    CustomCtx* initLib()
    {
        CustomCtx* ctx = new CustomCtx();

        // Set tensor parameters for [n_objects, 5] (batch_id, x1, y1, x2, y2)
        ctx->tensor_num_dims = 2;
        ctx->tensor_shape[0] = _MAX_OBJECT_NUM_; // Maximum number of objects per batch (define as needed)
        ctx->tensor_shape[1] = 5;                // [batch_id, x1, y1, x2, y2]
        ctx->tensor_dtype = NVDSPREPROCESS_DT_FLOAT32;

        // Calculate total elements and bytes
        ctx->tensor_num_elements = ctx->tensor_shape[0] * ctx->tensor_shape[1];
        ctx->tensor_bytes = ctx->tensor_num_elements * sizeof(float);
        ctx->cpu_tensor = new float[ctx->tensor_num_elements];
        memset(ctx->cpu_tensor, 0, ctx->tensor_bytes);

        return ctx;
    }

Here is the corresponding config file:

config_preprocess_stdet.txt (2.0 KB)

I tried to rebuild this into the deepstream-pose-classification app as this is the most similar to mine, just to run the pipeline as follows:

gst_bin_add_many(GST_BIN(pipeline), pgie, tracker, preprocess1, nvtile, nvvidconv, nvosd, sink, nvdslogger, NULL);

The intended final pipeline will be:

gst_bin_add_many(GST_BIN(pipeline), preprocess0, pgie, tracker, preprocess1, sgie, nvmsgconv, nvmsgbroker, fakesink, NULL);

Here preprocess0 should construct the custom tensor for the sequence (exactly as in deepstream-3d-action-recognition) and preprocess1 the second bbox tensor, which is where it is failing.

I have attached the config files for reference. Every time I run this configuration, I receive the error:

Running…
ERROR from element preprocess-plugin: Could not open custom library

Error details: gstnvdspreprocess.cpp(524): gst_nvdspreprocess_start (): /GstPipeline:deepstream_pose_classfication_app/GstNvDsPreProcess:preprocess-plugin
Returned, stopping playback

Could anyone point me in the right direction of how to build the custom tensor for this? I have tried looking at all of the examples, but I don’t want to modify the ROI itself, I just want the bboxes from the centre image and their batch ids in one tensor.

  1. If this model only needs the [n_sources,3,32,256,256] tensor, please refer to the sample deepstream-3d-action-recognition. The poseclassificationnet model used by deepstream-pose-classification is different from your model: it requires a sequence of 34 pose keypoints. In config_preprocess_stdet.txt, please set tensor-name to input_tensor, because rois is the name of the bboxes layer.
  2. Regarding “ERROR from element preprocess-plugin: Could not open custom library”: nvdspreprocess is open source. This error means the plugin failed to dlopen the .so library. Please make sure the custom-lib-path is correct. You can add logging in initLib() to check whether the function runs correctly.

The model needs two tensors, both of them (‘input_tensor’ and ‘rois’), as I specified in the bullet points above.

I already have a first model (custom model) working based on deepstream-3d-action-recognition, as I already stated, but its accuracy is much improved with the second tensor, which is why I used a MODIFIED version of deepstream-pose-classification but created my OWN custom tensor functions (without the 34 keypoints). Those functions are in my post above, but they don’t run at all, which is why I was asking for help. Yes, I checked the custom-lib-path; it is correct.

Because I couldn’t get my custom version of the preprocess plugin to load, I have changed the code to build the second ROI tensor in the probe function instead. This worked, but I now get this error:

0:00:39.935994745 752450 0x622ffb8e6200 INFO nvinfer gstnvinfer.cpp:684:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2138> [UID = 1]: serialize cuda engine to file: /opt/nvidia/FSI_Engine/deepstream/models/stdet/stdet_256_256.onnx_b4_gpu0_fp32.engine successfully
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:327 [FullDims Engine Info]: layers num: 3
0 INPUT kFLOAT input_tensor 3x32x256x256 min: 1x3x32x256x256 opt: 4x3x32x256x256 Max: 4x3x32x256x256
1 INPUT kFLOAT rois 5 min: 1x5 opt: 4x5 Max: 4x5
2 OUTPUT kFLOAT cls_score 9 min: 0 opt: 0 Max: 0

0:00:40.095212164 752450 0x622ffb8e6200 WARN nvinfer gstnvinfer.cpp:681:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initNonImageInputLayers() <nvdsinfer_context_impl.cpp:1611> [UID = 1]: More than one input layers but custom initialization function not implemented
0:00:40.095244246 752450 0x622ffb8e6200 ERROR nvinfer gstnvinfer.cpp:678:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1385> [UID = 1]: Failed to initialize non-image input layers
0:00:40.105791852 752450 0x622ffb8e6200 WARN nvinfer gstnvinfer.cpp:914:gst_nvinfer_start: error: Failed to create NvDsInferContext instance
0:00:40.105938505 752450 0x622ffb8e6200 WARN nvinfer gstnvinfer.cpp:914:gst_nvinfer_start: error: Config file path: ./config_infer_secondary_stdet.txt, NvDsInfer Error: NVDSINFER_CUSTOM_LIB_FAILED

I am not sure where to find information about this custom initialization function. Can you point me in the right direction? Do I need to use nvinferserver instead of nvinfer?

Thanks and best regards,

Portia Murray

If there is more than one input layer, the NvDsInferInitializeInputLayers function needs to be implemented. You can refer to the sample cfg and code.
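As a rough illustration of what such an initialization function has to do for a non-image layer, here is a self-contained sketch of a bounds-checked copy into a named input buffer. `LayerInfo` and `init_rois_layer` are hypothetical stand-ins written for this example, not the nvinfer types; the sketch also assumes the buffer pointer is writable host memory and that the layer is called "rois" as in the engine log above.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an input-layer descriptor (not the nvinfer struct).
struct LayerInfo {
    const char* name;
    void* buffer;              // assumed to be writable host memory
    std::size_t num_elements;  // capacity of the buffer, in floats
};

// Copy `rois` into the layer named "rois", leaving all other layers untouched.
// Returns false if no such layer exists or the data does not fit.
bool init_rois_layer(const std::vector<LayerInfo>& layers, const std::vector<float>& rois) {
    for (const LayerInfo& layer : layers) {
        if (std::strcmp(layer.name, "rois") != 0) continue;  // skip other inputs
        if (layer.buffer == nullptr || rois.size() > layer.num_elements) return false;
        if (!rois.empty()) {
            std::memcpy(layer.buffer, rois.data(), rois.size() * sizeof(float));
        }
        return true;
    }
    return false;
}
```

Skipping unknown layers rather than failing on them is a design choice made for this sketch; whichever behaviour is chosen, the size check before the copy is the part worth keeping.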

Hi @fanzh, thanks for your response. I implemented an NvDsInferInitializeInputLayers function as below to initialize the input layer. It runs, but it is not working properly (there is no output, although inference on the same test video should yield a result):

    bool NvDsInferInitializeInputLayers(
        std::vector<NvDsInferLayerInfo> const &inputLayersInfo,
        NvDsInferNetworkInfo const &networkInfo,
        unsigned int maxBatchSize)
    {
        std::vector<float> bbox_data = {
            // [source_id, x1, y1, x2, y2] for each object
            0, 0.4, 0.6, 0.6, 0.4,
            0, 0.2, 0.8, 0.4, 0.6
        };

        for (auto &layer : inputLayersInfo) {
            if (!strcmp(layer.layerName, "rois")) {
                // Make sure the buffer is large enough for your tensor
                memcpy(layer.buffer, bbox_data.data(), sizeof(float) * bbox_data.size());
            }
            else {
                return false;
            }
        }

        return true;
    }

I used the function to initialize the non-image layer (vector of the bboxes). But the other layer is an image layer (custom sequence of images 32 frames long of dimension [n_batch,3,32,width,height]). Is it okay that I only initialize the non-image layer in the above function? Or should I try and initialize the image layer here too?

I am also attaching the rois vector to the buffer via the probe function for the pgie (person detection model) rather than the preprocess function, as I could not get the preprocess library to load, even with almost the same code that works in the probe function. Are there any performance caveats to doing it this way?

Why do you add a constant bbox for each object in NvDsInferInitializeInputLayers? Will the objects move in the test video? Please add the sequence of images with the nvdspreprocess plugin, as written above. The nvinfer plugin is open source. NvDsInferInitializeInputLayers is only called once; hence, different image data can’t be added in NvDsInferInitializeInputLayers() for each frame.
What do you mean by “I am also attaching the rois vector to the buffer via the probe function”?

The nvdspreprocess plugin is open source; the path is /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvdspreprocess. If the issue is still “Could not open custom library”, you can simplify the code and add logs to narrow it down. For example, you can add logs at the start and end of initLib to check whether the function runs successfully. You can also add a log in gst_nvdspreprocess_start to print the error message from dlopen() with fprintf(stderr, "%s\n", dlerror());.
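The dlopen() check can also be done outside the pipeline. Below is a minimal standalone sketch, assuming a POSIX system with `<dlfcn.h>`; `try_dlopen` is a hypothetical helper name, and on older glibc versions the program may need to be linked with `-ldl`. It loads the library with `RTLD_NOW`, so failures caused by unresolved symbols are reported as well as a wrong path.

```cpp
#include <dlfcn.h>
#include <string>

// Try to load a shared library the same way a plugin loader would.
// Returns an empty string on success, otherwise the dlerror() message.
std::string try_dlopen(const char* path) {
    dlerror();  // clear any stale error state
    void* handle = dlopen(path, RTLD_NOW);  // RTLD_NOW resolves all symbols up front
    if (handle == nullptr) {
        const char* err = dlerror();
        return err ? std::string(err) : std::string("unknown dlopen error");
    }
    dlclose(handle);
    return std::string();
}
```

Calling this with the value of custom-lib-path and printing the returned string should show whether the file is missing or whether it exists but fails to load for another reason.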

@fanzh Thanks for your response. I was able to build another preprocessing module (in addition to the custom_sequence_process one, the same as is used in the deepstream-3d-action-recognition example). I got our custom model to work correctly with the pipeline:

    gst_bin_add_many(GST_BIN(pipeline), streammux, queue1, preprocess1, queue2, pgie, queue3, preprocess2, queue4, sgie, queue5, sink, NULL);

preprocess1 is the custom_sequence_process

pgie is the peoplenet model to find the bboxes of humans for preprocess2

preprocess2 is the additional bbox custom tensor I described in my first post on this page.

sgie is our second spatio-temporal custom action model that we built.

sink is fakesink.

In the sgie probe function I checked that it was working by dumping the results to a json and saving screenshots of our actions. It looks fine. For our deployment we want to send the results to our Azure IoT hub using msgconv and msgbroker. We were able to do this with the normal deepstream-3d-action-recognition model and the configs for our msgconv and msgbroker without issue but with the new pipeline it throws an error immediately when linking. The new one is:

gst_bin_add_many(GST_BIN(pipeline), queue1, preprocess, queue2, pgie, queue3, preprocess1, queue4, sgie, tee, queue5, queue6, msgconv, msgbroker, sink, NULL);

But we get the error “Elements could not be linked. Exiting.” exactly when we run this line. I’ve tried a number of different orderings of the pipeline, but nothing I have tried so far works. Can you recommend the correct order here?

Thanks,

Portia Murray

Linking in the order the elements were added should be fine. Could you share a complete log? I am wondering which elements failed to link. Alternatively, you can link them one by one: for example, try “preprocess->pgie->sink”, then try “preprocess->pgie->preprocess1->sgie->sink”.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.