Frame pointer

Which pointer contains the frame data right before it is inferred on (after preprocessing and transformations)? Is it fair to assume that the pointer is width x height x network width? Or is there a parameter that gives the size and explains the structure?

Do you mean you want to know where to find the frame data pointer in the code of the gst-nvinfer plugin?

Yes, I am interested in the pointer right before it goes into the engine for inference, after all the preprocessing and resizing is done and everything is ready for inference.

The code is in /opt/nvidia/deepstream/deepstream-5.0/sources/libs/nvdsinfer. It is built into libnvds_infer.so, which is used by the gst-nvinfer plugin.

In nvdsinfer_context_impl.cpp, NvDsInferStatus NvDsInferContextImpl::queueInputBatch(NvDsInferContextBatchInput &batchInput) is the entry point for pre-processing, inference, and post-processing. You can find the buffers there.

The dims of the inference image are exactly what the model (backend) requires. If you are familiar with the TensorRT APIs, you may know the meaning of the buffer format. https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_plugin.html#afff07df37d17d070bab22cc886cda459
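To make the buffer layout question concrete, here is a minimal sketch of how a planar, row-major CHW tensor is indexed. It assumes the input binding uses a linear CHW layout, which the thread does not confirm for this model. ChwDims, elementCount, and chwIndex are hypothetical names for illustration, not DeepStream or TensorRT types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: dimensions of one planar CHW frame.
// Assumption: the binding is a linear, row-major CHW tensor.
struct ChwDims {
    std::size_t channels;
    std::size_t height;
    std::size_t width;
};

// Total number of elements in one frame.
inline std::size_t elementCount(const ChwDims& d) {
    return d.channels * d.height * d.width;
}

// Linear index of element (c, y, x): channel planes are stored one after
// another, and each plane is stored row by row.
inline std::size_t chwIndex(const ChwDims& d, std::size_t c,
                            std::size_t y, std::size_t x) {
    assert(c < d.channels && y < d.height && x < d.width);
    return (c * d.height + y) * d.width + x;
}
```

For a 3 x 224 x 224 input, elementCount gives 150528 values, and the first element of the second channel plane sits at index 224 * 224 = 50176.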

In the file you mention in your post (/opt/nvidia/deepstream/deepstream-5.0/sources/libs/nvdsinfer/nvdsinfer_context_impl.cpp), I made the following changes:

After this line:

      std::vector<void*>& bindings = safeRecyleBatch->m_DeviceBuffers;

I load a NumPy array with the same size as my network input (3 x 224 x 224 = 150528 values, kFLOAT):
cnpy::NpyArray arr = cnpy::npy_load("inputimg.npy");

// Print to verify it is correct (num_vals and shape entries are size_t)
printf("ARR SIZE %zu %zu %zu %zu\n", arr.num_vals, arr.shape[0], arr.shape[1], arr.shape[2]);

// and load it to a float pointer

float *devbuf = arr.data<float>();

// I then overwrite bindings[0] (the input binding) as follows:

cudaMemcpy((float *)bindings[0],devbuf,150528,cudaMemcpyHostToDevice); 

Assuming the above is OK, I don’t get the same result as when I run inference from Python on the same engine with the same input.

I am not familiar with the TensorRT APIs, but I believe the above should give me the right results.
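One thing worth checking in the snippet above: cudaMemcpy takes its size argument in bytes, not in elements. Passing 150528 therefore copies only 150528 bytes, which is a quarter of the array when a float is 4 bytes. Below is a minimal sketch of the size computation. copyBytes is a hypothetical helper, and this may not be the only reason the results differ from the Python run.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bytes to copy when `hostElements` floats are
// written into a binding that holds `bindingElements` floats.
// It asserts that the two element counts agree before computing the size.
inline std::size_t copyBytes(std::size_t hostElements,
                             std::size_t bindingElements) {
    assert(hostElements == bindingElements);
    return hostElements * sizeof(float);
}

// Intended use with the variables from the post (not compiled here, since it
// needs the CUDA runtime and cnpy):
//
//   const std::size_t bytes = copyBytes(arr.num_vals, 3 * 224 * 224);
//   cudaMemcpy(bindings[0], devbuf, bytes, cudaMemcpyHostToDevice);
```

With 4-byte floats this comes to 602112 bytes for the 3 x 224 x 224 input, rather than the 150528 passed in the original call.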

Why not just use nvinfer as it is?

It doesn’t work as is.

What is your purpose? What kind of model are you using, and what are its input and output? What will you do with the DeepStream pipeline? Does your model infer on video or on images?

I am using the NVIDIA open-source pose model, which I have converted into an engine file: https://github.com/NVIDIA-AI-IOT/trt_pose

I have been working nonstop to get it working with DeepStream, with no success. Inference with the engine file works everywhere else, but not in DeepStream. The input is an RGB image (224x224), and the output is two bindings, in this case the paf and the cmap.

Any help is appreciated. Many others are also stuck but have given up. But I still remain optimistic.

For https://github.com/NVIDIA-AI-IOT/trt_pose, please refer to
Pose Estimation on Deepstream