Error: no input dimensions given when trying to deploy gaze detection model

Hi, I am trying to deploy the gaze detection model. I am following the steps explained here: TLT CV Inference Pipeline Quick Start Scripts — Transfer Learning Toolkit 3.0 documentation. However, when I run tlt_cv_init.sh I get the following error: Error: no input dimensions given. This happens at this step: ./tlt_cv_compile.sh gaze $tlt_encode_key_gaze (inside the tlt_cv_init.sh file).

I already checked the models; model.etlt is in the right directory. However, I did not find model.plan in the path ${repo_location}/gaze_facegrid_tlt/1/

Can someone help me?

Thanks

I am using Docker.
My Dockerfile is :

# syntax=docker/dockerfile:1

FROM ubuntu:18.04

RUN apt-get update
RUN apt-get -y upgrade

RUN apt-get install -y wget \
    zip \
    vim

# Install docker
RUN apt install -y apt-transport-https ca-certificates curl software-properties-common gnupg-agent
RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
RUN add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
RUN apt-get install -y docker-ce docker-ce-cli containerd.io

# NGC CLI binary
RUN cd /usr/local/bin && wget -O ngccli_cat_linux.zip https://ngc.nvidia.com/downloads/ngccli_cat_linux.zip && unzip -o ngccli_cat_linux.zip && chmod u+x ngc
RUN --mount=type=secret,id=apisecret,dst=/secret/apisecret.txt cat /secret/apisecret.txt | ngc config set

# TLT
RUN ngc registry resource download-version "nvidia/tlt_cv_inference_pipeline_quick_start:v0.1-dp"

WORKDIR tlt_cv_inference_pipeline_quick_start_vv0.1-dp

RUN cd scripts && chmod +x *.sh
ENV ENCODING_KEY=nvidia_tlt
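
Since the Dockerfile uses a BuildKit secret mount for the NGC API key, I build the image with a command along these lines (the key file path is just a placeholder for wherever you keep your API key):

    DOCKER_BUILDKIT=1 docker build \
        --secret id=apisecret,src=/path/to/ngc_api_key.txt \
        -t jarvis-gaze-model .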
##################################
Once I have created my image, I use:
docker run -v /var/run/docker.sock:/var/run/docker.sock -it jarvis-gaze-model bash
to launch my container. Inside it, I go to the /tlt_cv_inference_pipeline_quick_start_vv0.1-dp/scripts folder and execute tlt_cv_init.sh. The logs for this last operation are mostly downloads of containers and models. There are a lot of logs, so I saved them to a file because I exceeded the character limit when I tried to paste them here:
full_logs.txt (191.8 KB)
After the downloads finish, I get the following logs:

[INFO] Finished pulling containers and models


[INFO] Beginning TensorRT plan compilation with tlt-converter…
[INFO] This may take a few minutes
[INFO] Using this location for models: /root/Downloads/tlt_cv_inference_pipeline_models


[INFO] Compiling Body Pose with key 'nvidia_tlt'...

=====================
== NVIDIA TensorRT ==
=====================

NVIDIA Release 20.11 (build 17147175)

NVIDIA TensorRT 7.2.1 (c) 2016-2020, NVIDIA CORPORATION. All rights reserved.
Container image (c) 2020, NVIDIA CORPORATION. All rights reserved.

https://developer.nvidia.com/tensorrt

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version run /opt/tensorrt/install_opensource.sh.
To build the open source parsers, plugins, and samples for current top-of-tree on master or a different branch, run /opt/tensorrt/install_opensource.sh -b <branch>
See https://github.com/NVIDIA/TensorRT for more information.

Error: no input dimensions given

#############
This specific error happens while compiling Body Pose, but the same error and logs appear when I compile only the gaze estimation model.

Please share your full commands and full logs.

According to your description, you created your own docker image.
Why did you use

RUN cd /usr/local/bin && wget -O ngccli_cat_linux.zip https://ngc.nvidia.com/downloads/ngccli_cat_linux.zip && unzip -o ngccli_cat_linux.zip && chmod u+x ngc

According to Requirements and Installation — Transfer Learning Toolkit 3.0 documentation
5. Download the [NVIDIA GPU Cloud CLI Tool](https://ngc.nvidia.com/setup/installers/cli).

Please try the commands in the above link to download ngc.

Also, you actually do not need to build your own Docker image. You can run the inference pipeline on a host PC or on Jetson devices. I suggest you follow Requirements and Installation — Transfer Learning Toolkit 3.0 documentation and then TLT CV Inference Pipeline Quick Start Scripts — Transfer Learning Toolkit 3.0 documentation.

I installed ngc directly in /usr/local/bin because I could not add ngc to PATH using the Dockerfile. I think the problem is not with ngc, because I can pull the models from NGC.
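
For reference, PATH can also be extended in a Dockerfile with an ENV instruction; a one-line sketch, using a hypothetical install directory:

    ENV PATH="/opt/ngc-cli:${PATH}"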

I installed nvidia-docker, but the error continues.
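
To make sure the NVIDIA runtime itself works, I also ran a quick sanity check along these lines (the CUDA image tag is only an example):

    docker run --rm --runtime=nvidia nvidia/cuda:11.0-base nvidia-smi

If this prints the GPU table, the runtime itself is fine.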

I followed the steps in the documentation you suggested, but the error continues. I also tried using the host PC and the problem persists.

Do I have to create the model.plan locally? I did not find information about that.

tlt_cv_init.sh is open source.
The error should occur when it reaches line 113 of tlt_cv_init.sh:
./tlt_cv_compile.sh bodypose $tlt_encode_key_bodypose

and then inside tlt_cv_compile.sh

    docker run --rm --runtime nvidia ${GPUS} --name tlt-conversion -v ${models_location}:/models \
        ${image_tlt_cv_server_utils} \
        tlt-converter -k ${ENCODING_KEY} -t fp16 \
            -p input_1,1x224x320x3,1x224x320x3,2x224x320x3 \
            -e ${repo_location}/bodypose_320x224_tlt/1/model.plan \
            /models/triton_model_repository/repository/bodypose_320x224_tlt/bodypose.etlt
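
To see whether these variables are actually populated before the docker run fires, you can add temporary echo lines just above that command in tlt_cv_compile.sh; a minimal sketch, using the variable names from the snippet above:

    echo "ENCODING_KEY=${ENCODING_KEY}"
    echo "models_location=${models_location}"
    echo "repo_location=${repo_location}"
    echo "image_tlt_cv_server_utils=${image_tlt_cv_server_utils}"

If any of these print empty, the resulting tlt-converter invocation is malformed.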

There is a similar topic: Deploying models using tlt_cv_pipeline.

Please check your config.sh.
To narrow down, you can try to log in to the "${image_tlt_cv_server_utils}" docker and then generate the model.plan via the above command.

This issue mostly results from tlt-converter.

Yes, I agree with you. I think the same.

I checked my config.sh and the path for .etlt is ok.

To narrow down, you can try to log in to the "${image_tlt_cv_server_utils}" docker and then generate the model.plan via the above command.

For example,

$ docker run --runtime=nvidia -v /home/morganh/Downloads/tlt_cv_inference_pipeline_models:/models nvcr.io/nvidia/tlt-cv-inference-pipeline:v0.1-dp-server-utilities /bin/bash

=====================
== NVIDIA TensorRT ==
=====================

NVIDIA Release 20.11 (build 17147175)

NVIDIA TensorRT 7.2.1 (c) 2016-2020, NVIDIA CORPORATION. All rights reserved.
Container image (c) 2020, NVIDIA CORPORATION. All rights reserved.

https://developer.nvidia.com/tensorrt

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version run /opt/tensorrt/install_opensource.sh.
To build the open source parsers, plugins, and samples for current top-of-tree on master or a different branch, run /opt/tensorrt/install_opensource.sh -b <branch>
See https://github.com/NVIDIA/TensorRT for more information.

root@e0a3b0e8a631:/workspace# tlt-converter -k nvidia_tlt -t fp16 -p input_1,1x224x320x3,1x224x320x3,2x224x320x3 -e model_new.plan /models/triton_model_repository/repository/bodypose_320x224_tlt/bodypose.etlt
[INFO] Detected input dimensions from the model: (-1, -1, -1, 3)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 224, 320, 3) for input: input_1
[INFO] Using optimization profile opt shape: (1, 224, 320, 3) for input: input_1
[INFO] Using optimization profile max shape: (2, 224, 320, 3) for input: input_1
[WARNING] Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 2 output network tensors.
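
Assuming the conversion finishes like this, note that model_new.plan is written to the current directory (/workspace), so it still needs to be copied to where Triton expects it; a sketch, assuming ${repo_location} resolves to /models/triton_model_repository/repository as in the command quoted earlier:

    cp model_new.plan /models/triton_model_repository/repository/bodypose_320x224_tlt/1/model.plan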

I found my problem. My config.sh is OK; however, when running docker run --runtime=nvidia -v /home/dsmendes/Downloads/tlt_cv_inference_pipeline_models:/models nvcr.io/nvidia/tlt-cv-inference-pipeline:v0.1-dp-server-utilities /bin/bash, none of the environment variables are set. Then, running:

    tlt-converter -k ${ENCODING_KEY} -t fp16 \
        -p input_left_images:0,1x1x224x224,1x1x224x224,2x1x224x224 \
        -p input_right_images:0,1x1x224x224,1x1x224x224,2x1x224x224 \
        -p input_face_images:0,1x1x224x224,1x1x224x224,2x1x224x224 \
        -p input_facegrid:0,1x1x625x1,1x1x625x1,2x1x625x1 \
        -e ${repo_location}/gaze_facegrid_tlt/1/model.plan \
        /models/tlt_gazenet_v${tlt_cv_ngc_model_version}/model.etlt
    

The variables ${repo_location} and ${tlt_cv_ngc_model_version} were not expanded correctly.
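
In the end I just set the variables by hand inside the container before invoking tlt-converter; a sketch, where every value is an example to be checked against your config.sh:

    # Example values only -- take the real ones from config.sh
    export ENCODING_KEY=nvidia_tlt
    export repo_location=/models/triton_model_repository/repository
    export tlt_cv_ngc_model_version=1.0   # hypothetical version string

After that, the gaze command above should produce model.plan in the expected location.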