How to run BodyPose3D and PoseClassification in one pipeline

I used DeepStream on a Jetson Xavier NX to run inference with an object detection model (YOLOv4). I referred to deepstream-test5, and I also built the pipeline myself, without relying on deepstream-app, by following deepstream_reference_app/runtime_source_add_delete.

My next goal is Pose Classification, and my environment is Ubuntu 20.04 LTS / x86_64 / RTX A6000 / 525.78.01. These are the steps I have taken so far.

NGC BodyPose3d
For bodypose3d, I found that sample code already exists and experimented with various options.
github bodypose-3d
However, I also wanted to classify the 34 keypoints produced by bodypose3d inference into specific postures, so I looked at the link below.
NGC PoseClassification
To try this Pose Classification model, I read the Pose Classification Deploy section of the TAO documentation.
pose_classification deploying
Unlike the YOLOv4 work I did before, it only describes how to use Triton Inference Server. I had never used Triton, but by following the documentation and github tao-toolkit-triton-apps I got it running. After looking through it, this is my understanding:

Running start_client.sh downloads the bodypose3d app and the required models, then enters the inference stage.
The inference stage consists of three steps:

  1. Run the bodypose3d app and save the pose data as JSON.
  2. Run the Triton client and receive, as results.json, which object is in which pose.
  3. Combine the received data with the original video using plot_e2e_inference.py to produce a video annotated with the Pose Classification results.

That was a long introduction; thank you for reading. My question is: with this structure, does the input have to be a file? What I ultimately want is a service that starts from an RTSP camera source, extracts 34 keypoints with bodypose3d, classifies the pose with the Pose Classification model, and then either streams the result over RTSP like a DeepStream sink or sends it to Kafka.

To summarize my questions:

  1. Do I have to use Triton to run PoseClassification? Can I use a DeepStream pipeline instead?
  2. If I use a DeepStream pipeline, can it be done in a single pipeline together with the bodypose3d model?

I am aware that the bodypose3d app already uses peoplenet as the PGIE and bodypose3d as the SGIE. I would also like to know whether inference is possible with an nvinfer element.


Sorry but can you put your question in English?

I’ve been told that after this update it’s okay to write in other languages.

If you click the globe icon, you will see it in English as shown below.

I used deepstream in Jetson Xavier NX to infer with the Object Detection (YOLOv4) model. I referenced deepstream-test5 and configured and implemented the pipeline myself by referring to deepstream_reference_app/runtime_source_add_delete without the help of deepstream-app.

My next goal is Pose Classification, and my environment is Ubuntu 20.04 LTS / x86_64 / RTX A6000 / 525.78.01. I went through the following steps:

NGC BodyPose3d
In the case of bodypose3d, I found that there was already example code and experimented with various options.
github bodypose-3d
However, I also wanted to classify the 34 keypoints produced by bodypose3d inference into specific postures, so I checked the link below.
NGC PoseClassification
To try out this Pose Classification model, I checked the Pose Classification Deploy section of TAO Documentation.
pose_classification deploying
Unlike the YOLOv4 work I did before, it only showed how to use Triton Inference Server. I had never used Triton, but by following the documentation and github tao-toolkit-triton-apps I got it running. After reviewing a few things, here is what I understood:

When you run start_client.sh, it downloads the bodypose3d app and the necessary models and enters the inference stage.
This inference stage consists of three steps:

  1. Run the bodypose3d app and save the pose data as json.
  2. Run the Triton client and receive, as results.json, which object is in which pose.
  3. Combine the received data with the original video using plot_e2e_inference.py to produce a video annotated with the Pose Classification results.

The introduction was really long; thank you for reading. My question is: with this structure, does the input have to be a file? The service I ultimately want starts from an RTSP camera source, captures 34 keypoints with bodypose3d, classifies them with the Pose Classification model, and then either streams the result over RTSP like a DeepStream sink or sends it to Kafka.

To sum up my questions:

  1. Do I have to use triton to perform PoseClassification? Can I use the DeepStream pipeline?
  2. If I use the DeepStream pipeline, is it possible to solve it in one pipeline with the bodypose3d model?

Of course, I know that the bodypose3d app already uses peoplenet as the PGIE and bodypose3d as the SGIE. I would also like to know whether inference is possible with an nvinfer element.

I appreciate it.

Triton is an inference server. nvinferserver is a DeepStream plugin that uses Triton to run model inference.

What do you mean by “one pipeline with the bodypose3d model”? What is the current pipeline, and what is the expected pipeline?

I’m sorry. I asked vaguely.
The pipeline created by the deepstream-bodypose-3d application looks like this:

After BodyPose3DNet has run inference, I want to add PoseClassificationNet as an nvinfer element that runs inference on the 34 keypoints.

You might add an nvinfer after the first SGIE. Please refer to deepstream_tao_apps/apps/tao_others/deepstream-emotion-app at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub. There, the PGIE detects faces, the first SGIE produces the facial landmarks, and the second SGIE infers the emotion from those landmarks.

How should I write the config file when running PoseClassificationNet with nvinfer?
I tried, but it aborts with a failed assertion in canSupportBatchDims (full log below).

Here’s the process I tried:

I’m using the deepstream-6.2-devel container in an x86_64/Ubuntu 20.04 environment.

  1. I cloned deepstream_reference_apps, which contains the bodypose3d app, with git clone.

  2. I copied the deepstream-bodypose-3d directory to deepstream-bodypose-3d-classification (cp -r).

  3. Without modifying the code, I built it with make in the sources directory and ran it to confirm that nothing was wrong.

  4. I downloaded the PoseClassification deployable version from NGC.

  5. I downloaded tao-converter from NGC as version v4.0.0_trt8.5.2.2_x86.

  6. I converted the ETLT file to an engine file with tao-converter. The command I used is as follows.

tao-converter st-gcn_3dbp_nvidia.etlt \
              -k nvidia_tao \
              -d 3,300,34,1 \
              -p input,1x3x300x34x1,4x3x300x34x1,16x3x300x34x1 \
              -o fc_pred \
              -t fp16 \
              -m 16 \
              -e st-gcn_3dbp_nvidia.etlt_b16_gpu0_fp16.engine
  7. I modified deepstream_pose_estimation_app.cpp to add a second SGIE to the pipeline.
    /* Create the second SGIE (PoseClassificationNet). */
    GstElement* sgie2 = gst_element_factory_make("nvinfer", "secondary-nvinference-engine2");
    if (!sgie2) {
        g_printerr ("Secondary nvinfer could not be created. Exiting.\n");
        return -1;
    }

    //---Set sgie2 properties---
    /* Configure the nvinfer element using the nvinfer config file. */
    g_object_set(G_OBJECT(sgie2),
        "output-tensor-meta", TRUE,
        "input-tensor-meta", TRUE,
        "config-file-path", SGIE2_CONFIG_FILE,
        NULL);

    /* Override the batch-size set in the config file with the number of sources. */
    guint sgie2_batch_size = 0;
    g_object_get(G_OBJECT(sgie2), "batch-size", &sgie2_batch_size, NULL);
    if (sgie2_batch_size < num_sources) {
        g_printerr
            ("WARNING: Overriding infer-config batch-size (%d) with number of sources (%d)\n",
            sgie2_batch_size, num_sources);

        g_object_set(G_OBJECT(sgie2), "batch-size", num_sources, NULL);
    }

    /* Add sgie2 to the bin together with the existing elements.
       The opening of this call was cut off in the post; `pipeline` is
       assumed to be the app's top-level pipeline element. */
    gst_bin_add_many(GST_BIN(pipeline),
        nvvideoconvert_enlarge, capsFilter_enlarge,
        pgie, tracker, sgie, sgie2, tee,
        queue_nvvidconv, nvvidconv, nvosd, filesink, nvdslogger,
        nvvideoconvert_reduce, capsFilter_reduce, NULL);

    // Link elements, with sgie2 placed directly after the first SGIE
    if (!gst_element_link_many(streammux_pgie,
        nvvideoconvert_enlarge, capsFilter_enlarge,
        pgie, tracker, sgie, sgie2, nvdslogger, tee, NULL)) {
        g_printerr ("Elements could not be linked. Exiting.\n");
        return -1;
    }
  8. The config file for the second SGIE is written as follows.
## Accuracy mode: _mode0_; Performance mode: _mode1_
## 0=FP32, 1=INT8, 2=FP16 mode
## 0=Detection 1=Classifier 2=Segmentation 100=other
## Integer 0:NCHW 1:NHWC
# Enable tensor metadata output
## 1-Primary  2-Secondary
## 0=RGB 1=BGR 2=GRAY
  9. At this stage I only wanted to check whether the model loads. The build with make succeeded, but I got the error below at run time.
root@42d3a8fd48e6:/opt/nvidia/deepstream/deepstream-6.2/sources/apps/sample_apps/custom/deepstream_reference_apps/deepstream-bodypose-3d-classification/sources# ./deepstream-pose-estimation-app --input rtsp://<MY_RTSP_SOURCE> --fps --output rtsp://

 *** nv-filesink: Launched RTSP Streaming at rtsp://localhost:8554/ds-test ***

Now playing: rtsp://<MY_RTSP_SOURCE>
WARNING: [TRT]: CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in
WARNING: [TRT]: CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in
0:00:03.289958847  2226 0x564d7abc0b00 INFO                 nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<secondary-nvinference-engine2> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1909> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-6.2/sources/apps/sample_apps/custom/deepstream_reference_apps/deepstream-bodypose-3d-classification/models/poseclassficiationnet_vdeployable_v1.0/st-gcn_3dbp_nvidia.etlt_b16_gpu0_fp16.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [FullDims Engine Info]: layers num: 2
0   INPUT  kFLOAT input           3x300x34x1      min: 1x3x300x34x1    opt: 4x3x300x34x1    Max: 16x3x300x34x1   
1   OUTPUT kFLOAT fc_pred         6               min: 0               opt: 0               Max: 0               

deepstream-pose-estimation-app: nvdsinfer_backend.cpp:135: virtual bool nvdsinfer::TrtBackendContext::canSupportBatchDims(int, const NvDsInferBatchDims&): Assertion `m_AllLayers[bindingIdx].inferDims.numDims == batchDims.dims.numDims' failed.
Aborted (core dumped)

What did I miss?

Please check tao-converter’s -p parameter. Are the inputs right?
-p comma separated list of optimization profile shapes in the format <input_name>,<min_shape>,<opt_shape>,<max_shape>, where each shape has x as delimiter, e.g., NxC, NxCxHxW, NxCxDxHxW, etc. Can be specified multiple times if there are multiple input tensors for the model. This argument is only useful in dynamic shape case.
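
To make the -p format concrete, here is a minimal C++ sketch that splits a profile argument into its name and three shapes and checks that the shapes are mutually consistent. The names parseShape, parseProfile, Profile and isConsistent are hypothetical helpers written for this illustration; they are not part of tao-converter or TensorRT, and tao-converter does its own parsing.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helpers for illustration only; tao-converter does its own parsing.

// Split "1x3x300x34x1" into {1, 3, 300, 34, 1}.
std::vector<int> parseShape(const std::string& shape) {
    std::vector<int> dims;
    std::stringstream ss(shape);
    std::string tok;
    while (std::getline(ss, tok, 'x')) dims.push_back(std::stoi(tok));
    return dims;
}

struct Profile {
    std::string input;               // tensor name, e.g. "input"
    std::vector<int> min, opt, max;  // shapes including the batch dimension
};

// Split "<input_name>,<min_shape>,<opt_shape>,<max_shape>".
Profile parseProfile(const std::string& arg) {
    std::vector<std::string> parts;
    std::stringstream ss(arg);
    std::string tok;
    while (std::getline(ss, tok, ',')) parts.push_back(tok);
    if (parts.size() != 4) throw std::invalid_argument("expected name,min,opt,max");
    return Profile{parts[0], parseShape(parts[1]), parseShape(parts[2]), parseShape(parts[3])};
}

// All three shapes need the same rank, and min <= opt <= max per dimension.
bool isConsistent(const Profile& p) {
    if (p.min.size() != p.opt.size() || p.opt.size() != p.max.size()) return false;
    for (std::size_t i = 0; i < p.min.size(); ++i)
        if (p.min[i] > p.opt[i] || p.opt[i] > p.max[i]) return false;
    return true;
}
```

Applied to the argument used in this thread, input,1x3x300x34x1,4x3x300x34x1,16x3x300x34x1, all three shapes have rank 5 and only the batch dimension grows (1, 4, 16), which matches -m 16.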

I followed TAO’s documentation.

In fact, when I commented out the input-dims part of PoseClassificationNet’s config file, the pipeline ran without errors.
But now I want to know how I should pass the tensor data output by the bodypose3d model to PoseClassificationNet.

bodypose3d operates in secondary mode, so its output is attached to the obj_user_meta_list of NvDsObjectMeta as NvDsInferTensorMeta. In what form do I need to add this data to NvDsFrameMeta for PoseClassificationNet to work?

Here are two solutions:

  1. Add the nvdspreprocess plugin before PoseClassificationNet.
    Use nvdspreprocess to attach the tensor meta, then set input-tensor-from-meta to 1 on the PoseClassificationNet nvinfer so that it reads the preprocessed tensor from the meta. Please refer to the sample /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-preprocess-test
    or /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-3d-action-recognition/deepstream_3d_action_recognition.cpp
  2. Please refer to the deepstream-emotion-app mentioned above. It places an nvdsvideotemplate plugin after the faciallandmark SGIE, and nvdsvideotemplate encapsulates engine generation and model inference.
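
With either option, some code has to assemble the per-person keypoints into the input shape the engine reports in the log above (3x300x34x1 per sample). The following is only a rough sketch of that step in plain C++, under several assumptions: PoseWindow and Pose are hypothetical names, not DeepStream or TAO APIs; the four dimensions are read as channels (x, y, z), frames, joints and persons; frames not yet seen are zero-filled; and any keypoint normalization the model may expect is left out.

```cpp
#include <array>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper, not part of DeepStream: accumulates per-frame 3D
// keypoints for ONE tracked person and flattens them into a
// C x T x V x M buffer (3 x 300 x 34 x 1), assumed to match the engine input.
constexpr std::size_t kChannels = 3;    // C: x, y, z
constexpr std::size_t kFrames   = 300;  // T: temporal window length
constexpr std::size_t kJoints   = 34;   // V: BodyPose3DNet keypoints
constexpr std::size_t kPersons  = 1;    // M: persons per sample

using Pose = std::array<std::array<float, kChannels>, kJoints>;  // [joint][xyz]

class PoseWindow {
public:
    // Append the newest pose and drop the oldest once the window is full.
    void push(const Pose& pose) {
        frames_.push_back(pose);
        if (frames_.size() > kFrames) frames_.pop_front();
    }

    std::size_t size() const { return frames_.size(); }

    // Flatten to a contiguous C x T x V x M buffer.
    // Frames that have not been observed yet stay zero (an assumption).
    std::vector<float> toTensor() const {
        std::vector<float> out(kChannels * kFrames * kJoints * kPersons, 0.0f);
        for (std::size_t t = 0; t < frames_.size(); ++t)
            for (std::size_t v = 0; v < kJoints; ++v)
                for (std::size_t c = 0; c < kChannels; ++c)
                    out[((c * kFrames + t) * kJoints + v) * kPersons] = frames_[t][v][c];
        return out;
    }

private:
    std::deque<Pose> frames_;  // oldest frame first
};
```

In a real pipeline one such window would be kept per tracker object ID, filled from the NvDsInferTensorMeta that bodypose3d attaches to each object, and the flattened buffer would be what the custom preprocessing or nvdsvideotemplate library hands to the classifier.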