Dear NVIDIA Developers,
I’m having issues converting the pose estimation model weights to ONNX format. I’m referring to Step 2 of the blog post that explains how to create a human pose estimation application with DeepStream. If I use the existing pose_estimation.onnx model from the DeepStream Human Pose Estimation GitHub repo, the output of the pose estimation application is reasonably good. However, when I convert the model weights to ONNX myself and edit the configuration file (deepstream_pose_estimation_config.txt) to use them, the output becomes so much worse that it is unusable. Below I share all the steps I take to convert the model weights to ONNX so that you can reproduce the error.
Hardware information:
Hardware Platform (Jetson / GPU): Tesla K80
DeepStream Version: None needed to reproduce this bug
TensorRT Version: None needed to reproduce this bug
NVIDIA GPU Driver Version (valid for GPU only): 455.32.00
Issue Type (questions, new requirements, bugs): Bugs
How to reproduce the issue?
Here are the detailed steps to reproduce this bug:
- First, I download the Docker image that I will use to convert the model weights to ONNX. I download the 19.12 version of the PyTorch Docker container from NVIDIA NGC with the command:
sudo docker pull nvcr.io/nvidia/pytorch:19.12-py3
I noticed that other versions of this container give me the error: Cuda error: no kernel image is available for execution on the device.
- I run the Docker container with the following command:
sudo docker run --gpus all -it -v /home:/home nvcr.io/nvidia/pytorch:19.12-py3
Now I’m inside the Docker container.
- Next, I uninstall and reinstall some packages, because if I don’t, I get errors from gcc when trying to build torch2trt. The commands I run are below (I run a quick sanity check after this list):
a. pip uninstall torch
b. pip install torch
c. pip uninstall torchvision
d. pip install torchvision
e. pip uninstall tqdm
f. pip install tqdm
g. pip uninstall cython
h. pip install cython
i. pip uninstall pycocotools
j. pip install pycocotools
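After this reinstall cycle, I run a quick sanity check inside the container (this check is my own addition, not part of the blog post) to confirm the packages import cleanly and that PyTorch still sees the K80, which reports compute capability 3.7; I suspect that capability is why newer containers fail with the "no kernel image" error:
# sanity check after reinstalling the packages (my own addition)
import torch, torchvision, tqdm, pycocotools
print(torch.__version__, torchvision.__version__)
print(torch.cuda.is_available())            # should print True
print(torch.cuda.get_device_capability(0))  # the Tesla K80 reports (3, 7)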
- I execute the command:
git clone https://github.com/NVIDIA-AI-IOT/torch2trt
- I execute the command:
cd torch2trt
- I execute the command:
python3 setup.py install --plugins
- I exit the torch2trt repository by executing the following command:
cd ..
- I execute the command:
git clone https://github.com/NVIDIA-AI-IOT/trt_pose.git
- I execute the command:
cd trt_pose
- I execute the command:
python setup.py install
- I execute the command:
cd ..
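At this point I verify that both packages installed correctly (a minimal check of my own, not from the blog post):
# both imports should succeed if the two setup.py installs worked
import torch2trt
import trt_pose.models
print(torch2trt.__file__, trt_pose.models.__file__)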
- I download the model weights from https://drive.google.com/file/d/1XYDdCUdiF2xxx4rznmLb62SdOUZuoNbd/view. I do this by executing the command:
wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1XYDdCUdiF2xxx4rznmLb62SdOUZuoNbd' -O resnet18_baseline_att_224x224_A_epoch_249.pth
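Because wget downloads from Google Drive can silently return an HTML confirmation page instead of the real file, I also verify that the download is a loadable PyTorch checkpoint (my own check, not part of the blog post):
# verify the download is a real checkpoint and not an HTML error page (my own check)
import torch
state = torch.load('resnet18_baseline_att_224x224_A_epoch_249.pth', map_location='cpu')
print(type(state), len(state))  # expect a dict of weight tensors, not an exception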
- I copy the downloaded model weights to trt_pose/tasks/human_pose with the command:
cp resnet18_baseline_att_224x224_A_epoch_249.pth ./trt_pose/tasks/human_pose
- I go to trt_pose/tasks/human_pose and copy the model weights and the human_pose.json file to the trt_pose/trt_pose/utils directory (see the export sketch after this list). If I don’t do this and try to run
python export_for_isaac.py --input_checkpoint ../../tasks/human_pose/resnet18_baseline_att_224x224_A_epoch_249.pth
I get the following error message: Input model is not specified and can not be inferenced from the name of the checkpoint ../../tasks/human_pose/resnet18_baseline_att_224x224_A_epoch_249.pth. Please specify the model name (trt_pose.models function name).
If I copy just the model weights and not human_pose.json, I get the error message: Input topology human_pose.json is not a valid (.json) file.
I copy the model weights and the human_pose.json file with the commands below:
a. cd trt_pose/tasks/human_pose/
b. cp resnet18_baseline_att_224x224_A_epoch_249.pth human_pose.json ../../trt_pose/utils/
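For reference, here is my understanding of what export_for_isaac.py does, written out as a rough sketch (this is my own reconstruction based on the error messages above, not the actual script): it parses the model name, e.g. resnet18_baseline_att, out of the checkpoint filename, builds the matching trt_pose.models network using the topology in human_pose.json, loads the weights, and exports with torch.onnx.export:
# rough sketch of the export step as I understand it (not the actual script)
import json
import torch
import trt_pose.models

with open('human_pose.json') as f:
    topology = json.load(f)
num_parts = len(topology['keypoints'])
num_links = len(topology['skeleton'])

# model name inferred from the checkpoint filename, per the first error message
model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).eval()
model.load_state_dict(torch.load('resnet18_baseline_att_224x224_A_epoch_249.pth',
                                 map_location='cpu'))

# 224x224 input, matching the resolution in the checkpoint name
dummy = torch.zeros((1, 3, 224, 224))
torch.onnx.export(model, dummy,
                  'resnet18_baseline_att_224x224_A_epoch_249.onnx')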
- I go to the trt_pose/trt_pose/utils directory with the command:
cd ../../trt_pose/utils/
- I execute the command:
chmod +x export_for_isaac.py
- I execute the command:
python export_for_isaac.py --input_checkpoint resnet18_baseline_att_224x224_A_epoch_249.pth
This is successful. I get the message: Successfully completed convertion of resnet18_baseline_att_224x224_A_epoch_249.pth to resnet18_baseline_att_224x224_A_epoch_249.onnx.
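To make sure the exported file is at least structurally valid, I check it with the onnx package (my own verification step, not part of the blog post):
# structural validity check on the exported model (my own addition)
import onnx
model = onnx.load('resnet18_baseline_att_224x224_A_epoch_249.onnx')
onnx.checker.check_model(model)  # raises an exception if the graph is malformed
print([i.name for i in model.graph.input])
print([o.name for o in model.graph.output])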
- I copy the resulting file, resnet18_baseline_att_224x224_A_epoch_249.onnx, to my virtual machine via scp.
- I exit the Docker container by executing the command:
exit
When I execute all of the steps above, I get the weights in ONNX format, but as I noted in the introduction, even though the conversion is successful, the output of the pose estimation application is much worse when I use the resnet18_baseline_att_224x224_A_epoch_249.onnx file I converted following the steps above than when I use the pose_estimation.onnx file from the DeepStream Human Pose Estimation GitHub repo.
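In case it helps with diagnosis, this is how I compare the interfaces of the two ONNX files (my own diagnostic sketch, assuming both files sit in the current directory); if the input/output names or shapes differ from what deepstream_pose_estimation_config.txt expects, I imagine that could explain the degraded output:
# compare graph inputs/outputs of the two ONNX files (my own diagnostic sketch)
import onnx

for path in ['pose_estimation.onnx',
             'resnet18_baseline_att_224x224_A_epoch_249.onnx']:
    m = onnx.load(path)
    print(path)
    for t in list(m.graph.input) + list(m.graph.output):
        dims = [d.dim_value for d in t.type.tensor_type.shape.dim]
        print(' ', t.name, dims)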
What is going on here? What am I doing wrong in the model weights conversion process? How do I fix it?
Best regards