Run Gaze Estimation model on Nvidia Jetson Nano on own data

hahne · March 7, 2022, 9:35am

Description

I have a very simple problem. I want to perform inference on the gaze estimation model from here
https://catalog.ngc.nvidia.com/orgs/nvidia/models/tlt_gazenet
with my own data.

But even after searching a long time how to do that, it is not clear to me.
I can see that there are all these tools to do training or inference like the Transfer Learning Toolkit, TLT computer vision pipeline, TensorRT, JetPack, DeepStreamSDK and some of them seem to run in docker containers. Then there are these conversion tools to convert them from/to .tlt, etlt and .trt and so on.

How does any of these brings me closer to my goal to do inference on the Jetson Nano or if it is just a x68 PC for now?
If you could just let me know if this is possible and if so what is the way to go, that would be great. Thanks!

Environment

TensorRT Version: 8.0
GPU Type: Nvidia Jetson Nano
CUDA Version: 10.2
CUDNN Version: 8.2.1
Operating System + Version: Ubuntu 18.04.6 LTS

NVES · March 7, 2022, 10:07am

Hi,
This looks like a Jetson issue. Please refer to the below samlples in case useful.

For any further assistance, we recommend you to raise it to the respective platform from the below link

Thanks!

hahne · March 7, 2022, 11:28am

Hi,
thank you for the answer but the links you posted have nothing to do with the Transfer Learning Toolkit models from the NGC catalog.

As a first step, it is also not that important that it runs directly on the Jetson, can also run on a normal PC with Nvidia GPU.
I would just like to run inference on the specific model from the NGC catalog I posted. This one here:
https://catalog.ngc.nvidia.com/orgs/nvidia/models/tlt_gazenet

I do not want to run the models from this jetson inference repository and I also do not want to convert a PyTorch model to TensorRT

spolisetty · March 7, 2022, 1:28pm

Hi,

This looks more related to TAO toolkit. We are moving this post to the TAO forum to get better help.

Thank you.

Morganh · March 7, 2022, 3:23pm

There are 3 approaches.

“tao gazenet inference xxx”. See Gaze Estimation - NVIDIA Docs . For this approach, suggest you to run official released notebook as the starting point. This notebook will download public dataset and run training and inference.
This approach runs in x86 PC only.
Run with deepstream. See https://docs.nvidia.com/tao/tao-toolkit/text/deepstream_tao_integration.html#pre-trained-models-bodyposenet-emotionnet-fpenet-gazenet-gesturenet-heartratenet and https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/tree/master/apps/tao_others/deepstream-gaze-app
This approach can run in Nano or X86. But currently there is limitation to visualize the gaze vector. DS team is working on that.
Run with an old inference pipeline. This approach is mentioned in previous version of tao. But you can still use it. You can refer to How to visualise the 3d gaze vector output of the GazeNet model? - #27 by Morganh
This approach can work on X86 or Jetson Xaiver or NX.

hahne · March 8, 2022, 11:34am

Thanks a lot! I tried the second option to run it with deepstream.
So I run the deepstream-gaze-app from

However I get the following error:

Request sink_0 pad from streammux
Now playing: file:///mnt/video_storage/webcam_video.mp4
Library Opened Successfully
Setting custom lib properties # 1
Adding Prop: config-file : …/…/…/configs/gaze_tao/sample_gazenet_model_config.txt
Inside Custom Lib : Setting Prop Key=config-file Value=…/…/…/configs/gaze_tao/sample_gazenet_model_config.txt
0:00:21.552715801 669 0x555fa896f0 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger: NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 2]: deserialized trt engine from :/home/johannes/deepstream/deepstream_tao_apps/models/faciallandmark/faciallandmarks.etlt_b32_gpu0_fp16.engine
INFO: [FullDims Engine Info]: layers num: 4
0 INPUT kFLOAT input_face_images 1x80x80 min: 1x1x80x80 opt: 32x1x80x80 Max: 32x1x80x80
1 OUTPUT kFLOAT conv_keypoints_m80 80x80x80 min: 0 opt: 0 Max: 0
2 OUTPUT kFLOAT softargmax 80x2 min: 0 opt: 0 Max: 0
3 OUTPUT kFLOAT softargmax:1 80 min: 0 opt: 0 Max: 0

ERROR: [TRT]: 3: Cannot find binding of given name: softargmax,softargmax:1,conv_keypoints_m80
0:00:21.606805581 669 0x555fa896f0 WARN nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger: NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1868> [UID = 2]: Could not find output layer ‘softargmax,softargmax:1,conv_keypoints_m80’ in engine
0:00:21.606866051 669 0x555fa896f0 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger: NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 2]: Use deserialized engine model: /home/johannes/deepstream/deepstream_tao_apps/models/faciallandmark/faciallandmarks.etlt_b32_gpu0_fp16.engine

Morganh · March 8, 2022, 12:51pm

How about deepstream_tao_apps/apps/tao_others/deepstream-gaze-app at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub ?

hahne · March 8, 2022, 3:02pm

Sorry, this did not help. Wondering how to fix this error specifically:

ERROR: [TRT]: 3: Cannot find binding of given name: softargmax,softargmax:1,conv_keypoints_m80

Morganh · March 8, 2022, 3:27pm

Please double check according to deepstream_tao_apps/apps/tao_others at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub
And also please try to run other app, for example, Facial Landmarks Estimation .

hahne · March 10, 2022, 5:54pm

I tried running the Facial Landmarks Estimation app and it works on a single image. However, if I input a video, it is super slow. Sometimes it takes a minute for a single frame. If I also try to output the results in an output video, the app just get stuck after 6-7 frames and does not continue even after 10 minutes waiting.

hahne · March 10, 2022, 8:14pm

The gaze estimation works on one specific image now. For all the other images I get a segmentation fault even though they are from the same camera / same size.
But yeah, also in this case, super slow.

DeepStream can generate engine from such models but the implementation of buffer allocation has some problems. So if running the GazeNet sample application without engine, it will fail with core dump for the first time running. The engine will be generated after the first time running. When running the applications again, it will work.

That sounds a bit like it sometimes works and sometimes it doesn’t. Does that mean that this whole software is just not ready for real usage or should their still be an issue on my side?

Morganh · March 11, 2022, 3:03am

There will be no issue if running the applications again.

For the issue you mentioned, could you share the log, command and config file?
More, you are running in Nano. Have you boosted the gpu/cpu clocks?

hahne · March 11, 2022, 1:24pm

So it works now reasonable fast on a video. I had a mistake in one of the config files. Somehow it still does not work on most of the png images. But it is ok for now.

Still I would like to visualize the gaze vector. Is there any deeper issue that prevents you from visualizing this vector or is it just not implemented?
I might just implement it myself.

Morganh · March 11, 2022, 1:58pm

Internal team is working on that. It will be available in future release.

hahne · March 11, 2022, 7:04pm

Internal team is working on that. It will be available in future release.

Adding gaze estimation values as text overlay was at least straight forward.
However, the gaze estimation does not seem to work well with the infrared red images I use. At least it does not react on pupil movements. Face detection and face alignment work fine. Just not on the bright white pupil from the infrared.
That is a pity, but I assume the only way to fix it would be to train it with data from this camera.

Morganh · March 12, 2022, 4:14am

Please train with more training images from target scenario(i.e., your infrared red images) and check if it is better.

system · March 26, 2022, 4:14am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.