Couldn't run CV Inference Pipeline bodyposeNet

• Hardware (T4/V100/Xavier/Nano/etc)
GeForce 1080 Ti
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
bodyposeNet
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
3.0
nvcr.io/nvidia/tlt-streamanalytics: docker_tag: v3.0-dp-py3
nvcr.io/nvidia/tlt-pytorch: docker_tag: v3.0-dp-py3

I’m trying to run the TLT CV Inference Pipeline sample for bodyposeNet on my PC.
I trained bodyposeNet following the TLT CV Sample Workflows and obtained an .etlt file.
But when I convert the .etlt file to a TensorRT engine and try to start the Triton server from the TLT CV Inference Pipeline,
it says the output blob name and shape do not match config.pbtxt.

The output blob name and shape in config.pbtxt are
conv2d_transpose_1/BiasAdd:0 with shape [144, 192, 38],
but the TensorRT engine I converted with the script has
paf_out/BiasAdd:0 with shape [36, 48, 38].

Could someone help me figure out what the problem is?

How did you convert the .etlt file to a TensorRT engine?

I used tlt_cv_compile.sh from the TLT CV Inference Pipeline to convert the .etlt file to a TensorRT engine.

OK, please refer to TLT CV Inference Pipeline Quick Start Scripts — Transfer Learning Toolkit 3.0 documentation to see if it helps.

I’ve read this document and I don’t think it solves my problem.
That chapter covers what to modify when you change the input size, which is not the issue I’m facing.


The image above is a screenshot from starting the Triton server with tlt_cv_start_server.sh from the TLT CV Inference Pipeline.
It says my engine file’s output name differs from config.pbtxt.

To narrow this down, can you run inference with your .tlt model to check whether it works?
See
https://docs.nvidia.com/tlt/tlt-user-guide/text/gaze_estimation/gaze_estimation.html#run-inference-on-the-model


The picture above is the output from running inference with the .tlt file.


The picture above is the output from exporting the .etlt file.

Can you modify config.pbtxt to retry?

If I rename the blob name and shape in both the tlt and postprocess configs, I can start the Triton server successfully.
But when I run the demo program, both the demo and the server shut down.
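For reference, the rename I made in the config.pbtxt output section looks roughly like this (the data type is my guess; the names and shapes are the ones reported for my engine):

```
output [
  {
    name: "paf_out/BiasAdd:0"    # was "conv2d_transpose_1/BiasAdd:0"
    data_type: TYPE_FP32         # assumed; check the generated config
    dims: [ 36, 48, 38 ]         # was [ 144, 192, 38 ]
  }
]
```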

Do you have any further ideas about this problem?

Sorry, what do you mean by demo program and server?

I run tlt_cv_start_server.sh as server and run tlt_cv_start_client.sh as demo client.

OK, let me make sure I understand. Please correct me if anything is wrong.

  1. After you modify config.pbtxt and run tlt_cv_start_server.sh, you can start the Triton server successfully.
  2. When you then run tlt_cv_start_client.sh, there is an error. Right?

If yes, please share all the logs of server and client.


This is the log from the Triton server,


and this is the log from the client.

Thanks for the info. I will check if I can reproduce your result.

If possible, please update to the latest TLT docker: v3.0-py3.

I see that you are using the v3.0-dp-py3 version. It is not the latest TLT docker.

See TLT Quick Start Guide — Transfer Learning Toolkit 3.0 documentation
pip3 install --upgrade nvidia-tlt

Sorry, I found that I gave you the wrong logs.
Here are the correct logs:


Hello Morganh,
Could you reproduce my problem?

Sorry, I have not tried yet. As mentioned above, could you update to the latest TLT 3.0-py3 docker, run for several epochs, and try again?

I’ve found the problem.
I didn’t export the .etlt file with --sdk_compatible_model, as shown in the last section of the TLT CV samples.
I can run the demo from the TLT CV Inference Pipeline samples now.

I’m now wondering how to run it with Python.
Do I need to do any preprocessing before sending data to the Triton server?
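Here is a minimal sketch of what I’m attempting with the Triton Python client. The input tensor name, the 288×384 network size, and the [-1, 1] normalization are my assumptions (they should be checked against the model card and the generated config.pbtxt); the output name is the one from my engine. The gRPC endpoint is assumed to be at localhost:8001.

```python
import numpy as np

try:
    import tritonclient.grpc as grpcclient  # pip install tritonclient[grpc]
except ImportError:  # allow the preprocessing part to run without tritonclient
    grpcclient = None


def preprocess(image_bgr, net_h=288, net_w=384):
    """Resize to the network input size and normalize; returns an NHWC batch of 1."""
    # Nearest-neighbour resize with plain NumPy (cv2.resize would normally be used).
    h, w = image_bgr.shape[:2]
    rows = np.arange(net_h) * h // net_h
    cols = np.arange(net_w) * w // net_w
    resized = image_bgr[rows][:, cols]
    # Assumed normalization: scale uint8 pixels to [-1, 1]; verify the real values.
    x = resized.astype(np.float32) / 127.5 - 1.0
    return x[np.newaxis, ...]


def infer(image_bgr, url="localhost:8001", model="bodyposenet"):
    """Send one preprocessed image to Triton and return the PAF output tensor."""
    client = grpcclient.InferenceServerClient(url=url)
    x = preprocess(image_bgr)
    inp = grpcclient.InferInput("input_1:0", x.shape, "FP32")  # input name is an assumption
    inp.set_data_from_numpy(x)
    out = grpcclient.InferRequestedOutput("paf_out/BiasAdd:0")  # name from my engine
    result = client.infer(model_name=model, inputs=[inp], outputs=[out])
    return result.as_numpy("paf_out/BiasAdd:0")
```

The preprocessing must match whatever the server-side config expects, otherwise the keypoint outputs will be wrong even when the request succeeds.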