Run Gaze Estimation model on Nvidia Jetson Nano on own data


I have a very simple problem. I want to perform inference on the gaze estimation model from here
with my own data.

But even after searching for a long time, it is still not clear to me how to do that.
I can see that there are all these tools for training or inference, like the Transfer Learning Toolkit, the TLT computer vision pipeline, TensorRT, JetPack and the DeepStream SDK, and some of them seem to run in Docker containers. Then there are conversion tools to convert models from/to .tlt, .etlt, .trt and so on.

How does any of this bring me closer to my goal of running inference on the Jetson Nano (or, for now, just on an x86 PC)?
If you could let me know whether this is possible and, if so, what the recommended path is, that would be great. Thanks!


TensorRT Version: 8.0
GPU Type: Nvidia Jetson Nano
CUDA Version: 10.2
CUDNN Version: 8.2.1
Operating System + Version: Ubuntu 18.04.6 LTS

This looks like a Jetson issue. Please refer to the samples below in case they are useful.

For any further assistance, we recommend raising it on the respective platform via the link below


Thank you for the answer, but the links you posted have nothing to do with the Transfer Learning Toolkit models from the NGC catalog.

As a first step, it is not that important that it runs directly on the Jetson; it can also run on a normal PC with an Nvidia GPU.
I would just like to run inference on the specific model from the NGC catalog that I posted. This one here:

I do not want to run the models from this jetson-inference repository, and I also do not want to convert a PyTorch model to TensorRT.


This looks more related to the TAO Toolkit. We are moving this post to the TAO forum so you can get better help.

Thank you.


There are 3 approaches.

Thanks a lot! I tried the second option, running it with DeepStream.
So I ran the deepstream-gaze-app from

However, I get the following error:

Request sink_0 pad from streammux
Now playing: file:///mnt/video_storage/webcam_video.mp4
Library Opened Successfully
Setting custom lib properties # 1
Adding Prop: config-file : ../../../configs/gaze_tao/sample_gazenet_model_config.txt
Inside Custom Lib : Setting Prop Key=config-file Value=../../../configs/gaze_tao/sample_gazenet_model_config.txt
0:00:21.552715801 669 0x555fa896f0 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger: NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 2]: deserialized trt engine from :/home/johannes/deepstream/deepstream_tao_apps/models/faciallandmark/faciallandmarks.etlt_b32_gpu0_fp16.engine
INFO: [FullDims Engine Info]: layers num: 4
0 INPUT kFLOAT input_face_images 1x80x80 min: 1x1x80x80 opt: 32x1x80x80 Max: 32x1x80x80
1 OUTPUT kFLOAT conv_keypoints_m80 80x80x80 min: 0 opt: 0 Max: 0
2 OUTPUT kFLOAT softargmax 80x2 min: 0 opt: 0 Max: 0
3 OUTPUT kFLOAT softargmax:1 80 min: 0 opt: 0 Max: 0

ERROR: [TRT]: 3: Cannot find binding of given name: softargmax,softargmax:1,conv_keypoints_m80
0:00:21.606805581 669 0x555fa896f0 WARN nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger: NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1868> [UID = 2]: Could not find output layer 'softargmax,softargmax:1,conv_keypoints_m80' in engine
0:00:21.606866051 669 0x555fa896f0 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger: NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 2]: Use deserialized engine model: /home/johannes/deepstream/deepstream_tao_apps/models/faciallandmark/faciallandmarks.etlt_b32_gpu0_fp16.engine

How about deepstream_tao_apps/apps/tao_others/deepstream-gaze-app at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub ?

Sorry, this did not help. I am wondering how to fix this error specifically:

ERROR: [TRT]: 3: Cannot find binding of given name: softargmax,softargmax:1,conv_keypoints_m80

Please double check according to deepstream_tao_apps/apps/tao_others at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub
Also, please try to run another app, for example Facial Landmarks Estimation.
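Regarding the "Cannot find binding of given name: softargmax,softargmax:1,conv_keypoints_m80" error: one thing worth double-checking (a guess based on the error text, not a confirmed fix) is the separator used in the config file. gst-nvinfer expects the entries of output-blob-names to be separated by semicolons; with commas, the whole string is treated as a single layer name, which is exactly what the error message shows. A hypothetical excerpt of the landmarks config:

```ini
# Hypothetical excerpt of the facial-landmarks nvinfer config.
# gst-nvinfer list properties are semicolon-separated:
output-blob-names=softargmax;softargmax:1;conv_keypoints_m80
# A comma-separated value would be parsed as ONE layer named
# "softargmax,softargmax:1,conv_keypoints_m80", matching the error above.
```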

I tried running the Facial Landmarks Estimation app and it works on a single image. However, if I input a video, it is super slow; sometimes it takes a minute for a single frame. If I also try to write the results to an output video, the app just gets stuck after 6-7 frames and does not continue even after waiting for 10 minutes.

Gaze estimation works on one specific image now. For all other images I get a segmentation fault, even though they are from the same camera and have the same size.
But yeah, in this case it is also super slow.

DeepStream can generate an engine from such models, but the buffer-allocation implementation has some problems. So if you run the GazeNet sample application without a pre-built engine, it will fail with a core dump on the first run. The engine is still generated during that first run, so when you run the application again, it will work.

That sounds a bit like it sometimes works and sometimes it doesn't. Does that mean that this whole software stack is just not ready for real usage, or could there still be an issue on my side?

There will be no issue if you run the application again.

For the issue you mentioned, could you share the log, the command and the config file?
Also, you are running on a Nano. Have you boosted the GPU/CPU clocks?
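On the Nano, the clocks can be boosted from the command line. A sketch, assuming the default power-mode numbering on the Nano (verify the current mode first with `sudo nvpmodel -q`):

```shell
# Select the maximum power mode (mode 0 is MAXN on the Jetson Nano;
# confirm the available modes with: sudo nvpmodel -q)
sudo nvpmodel -m 0
# Lock CPU/GPU/EMC clocks at their maximum for the selected power mode
sudo jetson_clocks
```

Note that these settings do not persist across reboots.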

It now works reasonably fast on a video; I had a mistake in one of the config files. Somehow it still does not work on most of the PNG images, but that is OK for now.

Still, I would like to visualize the gaze vector. Is there any deeper issue that prevents you from visualizing this vector, or is it just not implemented yet?
I might just implement it myself.
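The projection itself looks simple enough to sketch. Assuming the model outputs gaze angles (the exact angle convention of GazeNet's output is an assumption here and should be checked against the model card), the arrow endpoint on the image plane could be computed like this:

```python
import math

def gaze_endpoint(eye_x, eye_y, yaw_rad, pitch_rad, length=80.0):
    """Return the 2D endpoint of a gaze arrow starting at the eye centre.

    yaw_rad / pitch_rad are the gaze angles in radians. Image y grows
    downward, so a positive pitch (looking up) moves the endpoint up.
    NOTE: the angle convention is an assumption; adjust the signs to
    match the actual output convention of the gaze model.
    """
    dx = length * math.sin(yaw_rad)
    dy = -length * math.sin(pitch_rad)
    return eye_x + dx, eye_y + dy

# Example: eye centre at (320, 240), looking 30 degrees to the right, level pitch
x, y = gaze_endpoint(320, 240, math.radians(30), 0.0)
```

Drawing a line from the eye centre to that endpoint (e.g. via an OSD element or any image library) would then give a basic gaze-vector overlay.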

The internal team is working on that. It will be available in a future release.

Adding the gaze estimation values as a text overlay was at least straightforward.
However, gaze estimation does not seem to work well with the infrared images I use; at least it does not react to pupil movements. Face detection and face alignment work fine, just not gaze estimation on the bright white pupils the infrared camera produces.
That is a pity, but I assume the only way to fix it would be to train the model with data from this camera.

Please train with more training images from the target scenario (i.e., your infrared images) and check if it is better.
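If you end up fine-tuning, the TAO workflow would look roughly like the sketch below. The `gazenet` task name follows the TAO CLI pattern, but the exact flags, spec-file contents and paths here are assumptions; check `tao gazenet train --help` and the GazeNet model card for your TAO version.

```shell
# Hedged sketch: fine-tune GazeNet on images from the target camera.
# -e  experiment spec file (points at your dataset, sets hyperparameters)
# -r  results / output directory
# -k  the key associated with the pretrained .tlt model
tao gazenet train -e /workspace/specs/gazenet_train_spec.txt \
                  -r /workspace/results/gazenet_finetune \
                  -k "$NGC_KEY"
```

The resulting .tlt model could then be exported to .etlt and deployed through the same DeepStream sample app.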

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.