Converting human pose estimation model weights to ONNX results in nonsensical pose estimation

Dear NVIDIA Developers,

I’m having issues converting the pose estimation model weights to ONNX format. I’m referring to Step 2 of the blog post that explains how to create a human pose estimation application with DeepStream. If I use the pre-existing pose_estimation.onnx model available from the DeepStream Human Pose Estimation GitHub repo, the output of the pose estimation application is reasonably good. However, when I convert the model weights to ONNX myself and edit the configuration file (deepstream_pose_estimation_config.txt) to use them, the output becomes so much worse that it’s unusable. Below I share all the steps I take to convert the model weights to ONNX so that you can reproduce the error.

Hardware information:

Hardware Platform (Jetson / GPU): Tesla K80
DeepStream Version: None needed to reproduce this bug
TensorRT Version: None needed to reproduce this bug
NVIDIA GPU Driver Version (valid for GPU only): 455.32.00
Issue Type (questions, new requirements, bugs): Bugs

How to reproduce the issue?

Here are the detailed steps to reproduce this bug:

  1. I first download the Docker image which I will use to convert the model weights to ONNX. I download the 19.12 version of the PyTorch Docker container with the command: sudo docker pull. I noticed that other versions of this container give me the error Cuda error: no kernel image is available for execution on the device.
  2. I run the Docker container with the following command: sudo docker run --gpus all -it -v /home:/home. Now I’m inside the Docker container.
  3. Next I uninstall and reinstall some packages, because otherwise I get errors from gcc when trying to build torch2trt. The commands I run are below:
    a. pip uninstall torch
    b. pip install torch
    c. pip uninstall torchvision
    d. pip install torchvision
    e. pip uninstall tqdm
    f. pip install tqdm
    g. pip uninstall cython
    h. pip install cython
    i. pip uninstall pycocotools
    j. pip install pycocotools
  4. I execute the command: git clone
  5. I execute the command: cd torch2trt
  6. I execute the command: python3 setup.py install --plugins
  7. I exit the torch2trt repository by executing the following command: cd ..
  8. I execute the command: git clone
  9. I execute the command: cd trt_pose
  10. I execute the command: python setup.py install
  11. I execute the command: cd ..
  12. I download the model weights by executing the command: wget --no-check-certificate '' -O resnet18_baseline_att_224x224_A_epoch_249.pth
  13. I copy the downloaded model weights to trt_pose/tasks/human_pose with the command cp resnet18_baseline_att_224x224_A_epoch_249.pth ./trt_pose/tasks/human_pose
  14. I go to trt_pose/tasks/human_pose and copy the model weights and the human_pose.json file to the trt_pose/trt_pose/utils directory. If I don’t do this and instead try to run python --input_checkpoint ../../tasks/human_pose/resnet18_baseline_att_224x224_A_epoch_249.pth, I get the following error message: Input model is not specified and can not be inferenced from the name of the checkpoint ../../tasks/human_pose/resnet18_baseline_att_224x224_A_epoch_249.pth. Please specify the model name (trt_pose.models function name). If I copy just the model weights and not human_pose.json, I get the error message: Input topology human_pose.json is not a valid (.json) file. I copy the model weights and the human_pose.json file with the commands below:
    a. cd trt_pose/tasks/human_pose/
    b. cp resnet18_baseline_att_224x224_A_epoch_249.pth human_pose.json ../../trt_pose/utils/
  15. I go to the directory trt_pose/trt_pose/utils with the command cd ../../trt_pose/utils/
  16. I make the conversion script executable with the command: chmod +x
  17. I execute the command: python --input_checkpoint resnet18_baseline_att_224x224_A_epoch_249.pth. This is successful: I get the message Successfully completed convertion of resnet18_baseline_att_224x224_A_epoch_249.pth to resnet18_baseline_att_224x224_A_epoch_249.onnx.
  18. I copy the resulting file, resnet18_baseline_att_224x224_A_epoch_249.onnx to my virtual machine via scp.
  19. I exit the Docker container by executing the command exit.
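As an aside, the error message in step 14 suggests that the conversion script tries to infer the model name and input resolution from the checkpoint filename, which is why keeping the original filename matters. A rough, pure-Python sketch of what such inference might look like (this is an assumption about the script's behavior, not its actual code):

```python
import re

def infer_model_from_checkpoint(path):
    """Guess the trt_pose.models function name and input size from a
    checkpoint filename such as
    'resnet18_baseline_att_224x224_A_epoch_249.pth'.
    Illustrative only; the real conversion script may parse differently."""
    name = path.rsplit("/", 1)[-1]
    match = re.match(r"(?P<model>.+?)_(?P<w>\d+)x(?P<h>\d+)_", name)
    if match is None:
        raise ValueError(
            "Input model is not specified and can not be inferred "
            "from the name of the checkpoint " + path)
    return match.group("model"), int(match.group("w")), int(match.group("h"))
```

With the original filename this yields ('resnet18_baseline_att', 224, 224); with a renamed checkpoint the inference fails, which matches the error in step 14.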

When I execute all of the steps above, the conversion succeeds and I get the weights in ONNX format, but as I noted in the introduction, the output of the pose estimation application is much worse when I use the resnet18_baseline_att_224x224_A_epoch_249.onnx file converted with the steps above than when I use the pose_estimation.onnx file from the DeepStream Human Pose Estimation GitHub repo.

What is going on here? What am I doing wrong in the model weights conversion process? How do I fix it?

Best regards


The steps you shared look correct to us.
A possible issue is a pre-generated engine file, which prevents DeepStream from recompiling the model from the ONNX file.

Would you mind checking if there is any file named *.engine?
If yes, please delete it and rerun the pipeline.
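If you want to make sure no stale engine survives between runs, a small helper along these lines can clear them before launching the pipeline (a minimal sketch; the directory layout is an assumption):

```python
from pathlib import Path

def remove_cached_engines(app_dir):
    """Delete any serialized TensorRT engine files under app_dir so
    DeepStream is forced to rebuild the engine from the ONNX file on
    the next run. Returns the names of the deleted files."""
    removed = []
    for engine in Path(app_dir).rglob("*.engine"):
        engine.unlink()
        removed.append(engine.name)
    return sorted(removed)
```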


Hello @AastaLLL,

I tried your suggestion when running the pose estimation app, but it doesn’t fix the problem. The .engine file is not there before I run the app for the first time, and even if I delete it after running the app, the nonsensical output is still present. To remind you, the problem is somewhere in the weight conversion process from .pth to .onnx, because if I use the default pose_estimation.onnx file available from the GitHub repository of the pose estimation app, I get good results.

I also tried converting the model weights from .pth to .onnx by keeping all the files in trt_pose/tasks/human_pose directory and running the script with the following arguments: --input_checkpoint ../../tasks/human_pose/resnet18_baseline_att_224x224_A_epoch_249.pth --input_model resnet18_baseline --input_width 224 --input_height 224 --input_topology ../../tasks/human_pose/human_pose.json

Then I had some errors while loading the state_dict, but I fixed them by changing line 119 of the script so that it reads:

model.load_state_dict(torch.load(args.input_checkpoint), strict=False)

This method also produces nonsensical (unusable) results.
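One caution about that workaround: strict=False makes load_state_dict silently skip every key that doesn't match between the checkpoint and the model, so a checkpoint/architecture mismatch leaves those layers at their random initial weights, which would produce exactly this kind of nonsensical output. A pure-Python sketch of the matching behavior (illustrative; not PyTorch's actual implementation):

```python
def load_state_dict_sketch(model_params, checkpoint, strict=True):
    """Mimic the key-matching part of load_state_dict on plain dicts.
    Returns the sorted list of keys actually copied into the model."""
    missing = [k for k in model_params if k not in checkpoint]
    unexpected = [k for k in checkpoint if k not in model_params]
    if strict and (missing or unexpected):
        raise RuntimeError(
            f"missing keys: {missing}, unexpected keys: {unexpected}")
    # With strict=False, mismatched keys are dropped without any warning.
    loaded = {k: v for k, v in checkpoint.items() if k in model_params}
    model_params.update(loaded)
    return sorted(loaded)
```

If the strict load raises errors and strict=False makes them disappear, it is worth printing which keys were skipped; a model with unloaded layers will still export to ONNX without complaint.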

So, to recap, I tried both the conversion method from my first post in this thread and the other method I just described. In addition, I tried deleting the .engine file when it existed, and I tried converting both the resnet and the densenet model weights. None of it worked.

I also have the same problem as you, and I have no idea how to solve it either. Have you solved this problem?

Not as of now. I’ve been busy with other stuff. Tagging @AastaLLL to see if he can chip in on the solution.

Hi, both

We guess this issue occurs due to a different model architecture.
Different pose estimation models may produce outputs with different semantic meanings,
and you will need to update the parser accordingly.

For example, you can find the default parser and underlying function below:

The default model architecture is represented in this script:
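To illustrate the point about output semantics: even when tensor shapes line up, a parser that assumes one output ordering will silently mislabel the tensors if the exported model emits them in another order. A toy sketch (names and ordering are hypothetical, not the actual DeepStream parser):

```python
def label_outputs(outputs, assumed_order=("heatmaps", "pafs")):
    """Attach semantic labels to raw network outputs based on an
    assumed ordering. If the model actually emits the tensors in a
    different order, nothing crashes, but every downstream decoding
    step consumes the wrong data."""
    if len(outputs) != len(assumed_order):
        raise ValueError("unexpected number of output tensors")
    return dict(zip(assumed_order, outputs))

# Hypothetical mismatch: the model emits (pafs, heatmaps) while the
# parser assumes (heatmaps, pafs). The labels end up swapped and the
# decoded poses are nonsense.
labeled = label_outputs(("paf_tensor", "cmap_tensor"))
```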


Hey @AastaLLL,

I am using the default resnet18_baseline_att model architecture, and I download the weights from the trt_pose GitHub repository. So I don’t know why I’m getting this behavior, since my model architecture is the default one.