Q&A for the webinar “NVIDIA Pre-trained Models and Transfer Learning Toolkit 3.0 to Create Gesture-based Interactions with a Robot”

Thank you for attending the webinar, “NVIDIA Pre-trained Models and Transfer Learning Toolkit 3.0 to Create Gesture-based Interactions with a Robot”, presented by NVIDIA experts Ekaterina Sirazitdinova and Nyla Worker. We hope you found it informative.

Please note that the webinar is now available on demand here.

You can also access the GitHub repo here for the sample code used in this webinar.

We received a lot of great questions at the end and weren’t able to respond to all of them, so we have consolidated the follow-up questions and answers in this post.

  1. Can we also retrain the first layers of the network?

In the model config file, you can choose to freeze the weights of the batch-norm or convolutional blocks. A minimal spec sketch is shown below, and more information can be found here: https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/object_detection/detectnet_v2.html?highlight=freeze
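
As an illustration only, here is a minimal sketch of the relevant part of a DetectNet_v2 training spec file, assuming a ResNet-18 backbone; the block indices and paths are placeholders, and freeze_blocks/freeze_bn control which parts of the backbone keep their pre-trained weights:

```
model_config {
  pretrained_model_file: "/path/to/pretrained/resnet18.hdf5"
  arch: "resnet"
  num_layers: 18
  # Keep the earliest convolutional blocks fixed (example indices only).
  freeze_blocks: 0
  freeze_blocks: 1
  # Keep the pre-trained batch-norm parameters frozen as well.
  freeze_bn: true
}
```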

  1. Is the quantization error more on the activation function side or does it have a higher impact on the layer size?

QAT affects the values of the activation function at each node, and therefore it affects the layer and model size as well as the computational load; it does not otherwise affect the layer size. Read this developer blog post to learn more about quantization-aware training: https://developer.nvidia.com/blog/improving-int8-accuracy-using-quantization-aware-training-and-the-transfer-learning-toolkit/

  1. What are the requirements to compile a TRT Model on a host architecture?

For TLT models, you can use tlt-converter to convert from the .etlt format to a TensorRT engine file. Download the appropriate tlt-converter for your hardware and software stack from the TLT getting started page; an example invocation is sketched below.
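
As a rough, hypothetical example (the key, input dimensions, output node names, and paths below are placeholders and depend on your model), a tlt-converter invocation for a DetectNet_v2-based .etlt model looks roughly like this:

```
# Example only: substitute your own key, dimensions, node names, and paths.
./tlt-converter model.etlt \
    -k <your_model_key> \
    -d 3,544,960 \
    -o output_cov/Sigmoid,output_bbox/BiasAdd \
    -t fp16 \
    -m 16 \
    -e model.engine
```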

  1. What is the meaning of numbers in the classification output in the demo overlay?

They are the IDs returned by the tracker.

  1. Is this model file uploaded in the repository?

No, we provide the instructions in the GitHub repository. Models can be downloaded from NGC.

  1. Is the model able to detect gestures from a first-person perspective, as in EgoHands?

It is able to detect hands from a first-person perspective, but the results will not be ideal because the gesture recognition model was not trained on first-person data.

  1. Can the Jetsons support int8 acceleration?

Jetson Xavier NX and AGX Xavier support INT8.

  1. Are the model and repo available for the DeepStream Python bindings?

We only provide the C++ version of the DeepStream app for this example, but the same model can be used with a Python-based app as well.

  1. Can this method of working with Jetson be used for human pose detection?

Yes, check out our developer blog post on pose estimation using the DeepStream SDK: https://developer.nvidia.com/blog/creating-a-human-pose-estimation-application-with-deepstream-sdk/

  1. Is there an example to extract metadata?

Yes, a reference is provided in the DeepStream SDK documentation, in the section “MetaData in the DeepStream SDK”. A minimal sketch using the DeepStream Python bindings is shown below.
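
For reference, a minimal sketch of such a probe with the DeepStream Python bindings (pyds) could look like the following; it assumes pyds is installed and that the probe is attached to a pad downstream of the inference element in your own pipeline:

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds


def osd_sink_pad_buffer_probe(pad, info, u_data):
    """Walk the DeepStream batch metadata and print the detected objects."""
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            # Each object carries its class label, confidence, and bounding box.
            print(frame_meta.frame_num, obj_meta.obj_label, obj_meta.confidence)
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
```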

  1. Is there an option to use different pruning techniques in TLT?

No, currently there is only one proprietary pruning method included in TLT.

  1. Does the model detect the hand without a person in the frame?

Yes, the model can detect a hand without a person in the frame.

  1. Can I create a custom architecture in parallel for many models, for example for face and people, at the same time?

Yes, you can also train a single model with multiple labels. Moreover, the PeopleNet model is already capable of detecting persons, bags and faces.

  1. Can we train a model with live feeds?

Training on live feeds is not supported, but you can convert the feed to images and use those as inputs (see the sketch below).
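
For example, a simple way to do this is to grab frames from the live feed with OpenCV and save them to disk, then label them and use them as a regular image dataset for training; this is only a sketch, assuming a camera at index 0:

```python
import os
import cv2  # assumes OpenCV is installed

out_dir = "captured_frames"
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(0)   # 0 = default camera; an RTSP URL also works here
frame_id = 0
saved = 0

while saved < 500:          # stop after 500 images (arbitrary example limit)
    ok, frame = cap.read()
    if not ok:
        break
    if frame_id % 15 == 0:  # keep roughly every 15th frame to reduce near-duplicates
        cv2.imwrite(os.path.join(out_dir, f"frame_{saved:06d}.jpg"), frame)
        saved += 1
    frame_id += 1

cap.release()
```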

  1. Which mAP calculation system is used for your results? COCO, Pascal VOC, another one?

Pascal VOC 2009 and 2012, though this might be model dependent. All the documentation on the metrics and how to configure them is here: https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/object_detection/yolo_v3.html

  1. Is there a way to train a multilabel model?

Yes, you can train multi-label detectors. For example, the DetectNet_v2 example detects cars, cyclists, and pedestrians: https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/object_detection/detectnet_v2.html

  1. Can you count the number of hand gestures and record them into a table or report? For example, you counted 10 OK signs and 5 Stops?

It is possible, but it requires some customization of the original app (see the sketch below).
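
As one possible (hypothetical) customization, the app's metadata handling could tally the classified gesture labels and write them out as a CSV report; the function names and label strings below are placeholders:

```python
import csv
from collections import Counter

# Counts accumulated while the pipeline runs, e.g. {"ok": 10, "stop": 5}.
gesture_counts = Counter()


def record_gesture(label: str) -> None:
    """Call this for every classified gesture, e.g. from a metadata probe."""
    gesture_counts[label] += 1


def write_report(path: str = "gesture_report.csv") -> None:
    """Dump the accumulated counts into a simple CSV table."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["gesture", "count"])
        for label, count in gesture_counts.most_common():
            writer.writerow([label, count])
```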

If we missed any questions, or you have more questions, please feel free to post them in the forum so that we can further assist you.

Here is a link to a video that walks through the Jupyter notebook (kitti_conversion.ipynb) from the webinar that is used to prepare the EgoHands dataset:
