Pose estimation: change input from videos to images

Please provide complete information as applicable to your setup.

• Hardware Platform: GTX 1660
• DeepStream Version: 5.0
• TensorRT Version: 7.0.0.11
• NVIDIA GPU Driver Version: 440.33.01
• Issue Type: question

Hello,

I was trying to test the pose estimation app (GitHub - NVIDIA-AI-IOT/deepstream_pose_estimation: This is a sample DeepStream application to demonstrate a human pose estimation pipeline.) on a desktop PC running Ubuntu 18.04, but it fails every time because of the h264 parser, which gives the following error:
h264-parser: No valid frames found before end of stream

I tried videos encoded with an H264 encoder as well as the videos provided with the DeepStream SDK samples, all with the same outcome.

The real problem I am facing is not using the app with video streams as a source, but with images instead. My question is: can any of the models (densenet121 or resnet18) be used with images to output a skeleton, or are they made to work only with videos? And if it is possible, how can I change the pipeline to load images instead of videos?

I tried to replace the h264 parser and nvv4l2decoder with JPEG equivalents, but the outcome was an out-of-bounds radius error.

Any help is greatly appreciated.

What kind of images? JPG files or PNG files?

The DeepStream SDK provides sample code for a JPG file source in deepstream-image-decode-test.
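If it helps, the decode front-end of that sample boils down to roughly the chain below, built here with gst_parse_launch(). This is a sketch, not the exact sample code: the mjpeg property, the muxer dimensions, and the file path are assumptions to verify against the sample source on your setup.

```c
/* Sketch of a JPEG-input DeepStream pipeline, modeled loosely on
 * deepstream-image-decode-test. Element properties and paths are
 * assumptions to check against the actual sample. */
#include <gst/gst.h>

int main (int argc, char *argv[])
{
  GError *err = NULL;
  gst_init (&argc, &argv);

  GstElement *pipeline = gst_parse_launch (
      "filesrc location=sample.jpg ! jpegparse ! nvv4l2decoder mjpeg=1 ! "
      "mux.sink_0 nvstreammux name=mux batch-size=1 width=1280 height=720 ! "
      "nvvideoconvert ! nvdsosd ! nveglglessink",
      &err);
  if (pipeline == NULL) {
    g_printerr ("Failed to build pipeline: %s\n", err->message);
    g_clear_error (&err);
    return -1;
  }

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  /* ... run a GMainLoop here, then set NULL state and unref ... */
  return 0;
}
```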

Ok, I’ll check how the sample works. Thank you for the hint.

Do you know what I can use to replace the h264 encoder with a JPEG encoder? For some reason, if I don’t use an encoder, the app gets stuck in an infinite loop without producing any results or exiting, and if I use the default h264 encoder it fails to run.
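For reference, the output branch I am experimenting with looks roughly like this; jpegenc, capsfilter, and multifilesink are stock GStreamer elements, but I have not confirmed this exact chain works in the pose app:

```c
/* Sketch: replace the h264 encoder branch with a JPEG writer.
 * jpegenc needs system-memory raw video, so a capsfilter forces
 * the converter output out of NVMM memory. Unverified in this app. */
GstElement *conv2    = gst_element_factory_make ("nvvideoconvert", "osd-conv");
GstElement *caps     = gst_element_factory_make ("capsfilter", "raw-caps");
GstElement *jpegenc  = gst_element_factory_make ("jpegenc", "jpeg-encoder");
GstElement *filesink = gst_element_factory_make ("multifilesink", "jpeg-sink");

GstCaps *rawcaps = gst_caps_from_string ("video/x-raw, format=I420");
g_object_set (G_OBJECT (caps), "caps", rawcaps, NULL);
gst_caps_unref (rawcaps);
g_object_set (G_OBJECT (filesink), "location", "frame_%05d.jpg", NULL);

gst_bin_add_many (GST_BIN (pipeline), conv2, caps, jpegenc, filesink, NULL);
gst_element_link_many (nvosd, conv2, caps, jpegenc, filesink, NULL);
```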

I changed the original pipeline so I could use images as input. The pipeline I’m currently using is:

gst_bin_add_many(GST_BIN(pipeline),
                   streammux, source, jpegparser, decoder, pgie,
                   nvvidconv, nvosd, sink, NULL);
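Adding the elements is only half of it; my guess at the required link order (unverified) is that the decoder output has to go to a requested sink pad on the streammux before nvinfer, roughly:

```c
/* My guess at the link order (unverified): the decoder feeds a
 * requested sink pad on the streammux, which then feeds nvinfer. */
gst_element_link_many (source, jpegparser, decoder, NULL);

GstPad *sinkpad = gst_element_get_request_pad (streammux, "sink_0");
GstPad *srcpad  = gst_element_get_static_pad (decoder, "src");
gst_pad_link (srcpad, sinkpad);
gst_object_unref (srcpad);
gst_object_unref (sinkpad);

gst_element_link_many (streammux, pgie, nvvidconv, nvosd, sink, NULL);
```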

but this pipeline generates another error:

Error details: gstnvinfer.cpp(1975): gst_nvinfer_output_loop (): /GstPipeline:deepstream-tensorrt-openpose-pipeline/GstNvInfer:primary-nvinference-engine:
streaming stopped, reason not-negotiated (-4)

Also, when I checked whether the detection worked, the values in the debugger showed that it wasn’t the case.
I used the same image as input for the deepstream-image-decode-test sample and it had no issues.
Could anyone please help me figure out what I’m doing wrong?

I managed to figure out what was causing the streaming error: it was a libEGL problem, which was solved by updating the driver. But the problem remains, as the model fails to give any relevant output; this time no errors are shown in the console.

How do you know that it is the model that fails?

I changed the input to be the live feed from a USB camera, and when it renders the circles it’s just a mess: multiple circles are rendered in the same place, but never on the actual person. This is the most accurate detection I got from this app.

Is it correct with the original deepstream_pose_estimation on your platform?

I followed the instructions for the dGPU platform: I replaced the .so files with the ones from the repository, downloaded the models and converted them using the utility tool from the trt_pose repository, and changed the source to be the camera feed rather than an h264 video. Also, the h264 parser always had an error no matter what video I tried to use, even videos from other samples.

I need to confirm whether the original deepstream_pose_estimation can work on your platform.

So there is a possibility that the app can’t work on my system?

The original deepstream_pose_estimation has been tested and then released, so it should work. If it does not work, there may be a configuration or other problem with your platform; we need to fix that problem first.

I used the default config.txt file and only changed the onnx model. The current config looks like this:

[property]
gpu-id=0
net-scale-factor=0.0174292
offsets=123.675;116.28;103.53
onnx-file=densenet121_baseline_att_256x256_B_epoch_160.onnx
labelfile-path=labels.txt
batch-size=1
process-mode=1
model-color-format=0
network-mode=2
num-detected-classes=4
interval=0
gie-unique-id=1
model-engine-file=densenet121_baseline_att_256x256_B_epoch_160.onnx_b1_gpu0_fp16.engine
network-type=100
workspace-size=3000
output-tensor-meta=true

It seems the postprocessing outputs wrong coordinates. You need to check your model: if it is not the same as the sample model in GitHub - NVIDIA-AI-IOT/deepstream_pose_estimation: This is a sample DeepStream application to demonstrate a human pose estimation pipeline., you cannot use the postprocessing in the code.

Thank you for your help. My mistake was that I used the resnet/densenet models without changing anything, and I never tried the default model. I tried it now and it works.