But even after searching for a long time, it is still not clear to me how to do that.
I can see that there are all these tools for training or inference, like the Transfer Learning Toolkit, the TLT computer vision pipeline, TensorRT, JetPack, and the DeepStream SDK, and some of them seem to run in Docker containers. Then there are the conversion tools to convert models from/to .tlt, .etlt, .trt, and so on.
How does any of these bring me closer to my goal of running inference on the Jetson Nano, or, for now, just on an x86 PC?
If you could let me know whether this is possible and, if so, what the way to go is, that would be great. Thanks!
Hi,
thank you for the answer, but the links you posted have nothing to do with the Transfer Learning Toolkit models from the NGC catalog.
As a first step, it is also not that important that it runs directly on the Jetson; it could also run on a normal PC with an NVIDIA GPU.
I would just like to run inference on the specific model from the NGC catalog I posted. This one here: https://catalog.ngc.nvidia.com/orgs/nvidia/models/tlt_gazenet
I do not want to run the models from this jetson-inference repository, and I also do not want to convert a PyTorch model to TensorRT.
You can run “tao gazenet inference xxx”. See Gaze Estimation - NVIDIA Docs. For this approach, I suggest you run the officially released notebook as a starting point. The notebook downloads a public dataset and runs training and inference.
This approach runs on x86 PCs only.
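In case it helps, the inference step from that notebook looks roughly like this. This is only a sketch; the spec file, the paths, and $KEY below are placeholders, so please verify the exact arguments against the released notebook and the Gaze Estimation docs:

```
# Sketch: run GazeNet inference via the TAO launcher (executes inside the
# TAO container). All paths and $KEY are placeholders, not real values.
tao gazenet inference -e /workspace/specs/gazenet_inference.yaml \
                      -m /workspace/models/gazenet.tlt \
                      -i /workspace/data/inference_set.json \
                      -o /workspace/results \
                      -k $KEY
```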
I tried running the Facial Landmarks Estimation app, and it works on a single image. However, if I input a video, it is super slow; sometimes it takes a minute for a single frame. If I also try to write the results to an output video, the app just gets stuck after 6-7 frames and does not continue even after waiting 10 minutes.
The gaze estimation works on one specific image now. For all other images I get a segmentation fault, even though they come from the same camera and have the same size.
But yeah, in this case it is also super slow.
DeepStream can generate a TensorRT engine from such models, but the buffer allocation implementation has some problems. So if you run the GazeNet sample application without an existing engine, it will fail with a core dump on the first run. The engine is still generated during that first run, so when you run the application again, it will work.
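One possible workaround is to treat the first run purely as an engine-build step and ignore the core dump. Roughly like this; the application name and arguments below are placeholders, so check the sample's README for the exact command line:

```
# Sketch of the workaround described above: the first run core-dumps but
# still writes the TensorRT engine, so ignore the failure and run again
# with the cached engine. All arguments here are placeholders.
./deepstream-gaze-app 1 ./gazenet_config.txt file:///path/to/input.jpg ./out || true
./deepstream-gaze-app 1 ./gazenet_config.txt file:///path/to/input.jpg ./out
```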
That sounds a bit like it sometimes works and sometimes it doesn't. Does that mean that this whole software is just not ready for real usage, or could there still be an issue on my side?
So it now works reasonably fast on a video; I had a mistake in one of the config files. Somehow it still does not work on most of the PNG images, but that is OK for now.
Still, I would like to visualize the gaze vector. Is there any deeper issue that prevents you from visualizing this vector, or is it just not implemented?
I might just implement it myself.
The internal team is working on that. It will be available in a future release.
Adding the gaze estimation values as a text overlay was at least straightforward.
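In case anyone wants to go further, here is a minimal sketch of how the vector itself could be drawn, assuming the network output is a 3D gaze direction and the eye midpoint is known from the facial landmarks. The function and parameter names are my own, not from the SDK:

```python
import cv2
import numpy as np

def draw_gaze_vector(frame, eye_center, gaze_vec, length=150):
    """Draw the gaze direction as a 2D arrow on the frame.

    eye_center: (x, y) pixel midpoint between the eyes, taken from the landmarks.
    gaze_vec:   3D gaze direction from the network; only x/y are used here,
                which is a rough orthographic projection.
    """
    gaze = np.asarray(gaze_vec, dtype=np.float32)
    gaze /= np.linalg.norm(gaze) + 1e-8            # normalize the direction
    end = (int(eye_center[0] + gaze[0] * length),  # project onto the image
           int(eye_center[1] + gaze[1] * length))  # plane by dropping z
    cv2.arrowedLine(frame, (int(eye_center[0]), int(eye_center[1])),
                    end, (0, 255, 0), 2, tipLength=0.2)
    return frame
```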
However, the gaze estimation does not seem to work well with the infrared images I use. At least it does not react to pupil movements. Face detection and face alignment work fine; it is just the gaze that fails on the bright white pupils from the infrared camera.
That is a pity, but I assume the only way to fix it would be to retrain the model with data from this camera.