Announcing new pre-trained models and general availability of Transfer Learning Toolkit (TLT) 3.0

In this release, we’re introducing:

  • A pose-estimation model that supports real-time inference on edge devices, with 9x faster inference than the OpenPose model.
  • PeopleSemSegNet, a semantic segmentation network for people detection.
  • A variety of pretrained computer-vision models for various industry use cases, such as license plate detection and recognition, heart rate monitoring, emotion recognition, facial landmarks, and more.
  • CitriNet, a new speech-recognition model that is trained on various proprietary domain-specific and open-source datasets.
  • A new Megatron Uncased model for Question Answering, plus many other pretrained models that support speech-to-text, named-entity recognition, punctuation, and text classification.
  • Training support on AWS, GCP, and Azure.
  • Out-of-the-box deployment on NVIDIA Triton and DeepStream SDK for vision AI, and NVIDIA Jarvis for conversational AI (a minimal Triton client sketch follows this list).
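As a quick illustration of the Triton path, here is a minimal Python client sketch. The model name, tensor names, and input shape below are assumptions for illustration only; check the model’s config.pbtxt for the real values.

```python
# Minimal sketch: sending an inference request to a TLT model served
# by Triton over HTTP, using the official tritonclient package.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical 1x3x544x960 FP32 input batch.
image = np.random.rand(1, 3, 544, 960).astype(np.float32)

infer_input = httpclient.InferInput("input_1", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

result = client.infer(
    model_name="peoplesemsegnet",  # assumed model name
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("softmax_1")],  # assumed output tensor
)
print(result.as_numpy("softmax_1").shape)
```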

Download Transfer Learning Toolkit and pre-trained models (computer vision | conversational AI)

Check out the dev news and the new pose-estimation developer blog.

Other developer blogs & resources from data generation and data annotation partners:

Thanks so much,
TLT 3.0 is a great tool for fine-tuning models without writing any code, but it has one big drawback.
As you know, online data augmentation is an excellent technique for improving a model’s accuracy and robustness.
TLT 3.0 lacks some useful data augmentations, such as translation (left/top/down) when training LPRNet and detection models, or shearing, …
If possible, please add new online data augmentations such as translation to TLT 3.0.
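For example, here is the kind of translation/shear augmentation I mean, sketched as an offline workaround with OpenCV (file names and shift values are just placeholders, and detection labels would of course need the same transform applied):

```python
# Rough offline workaround: pre-generate translated/sheared copies
# of the training images until TLT supports these online augmentations.
import cv2
import numpy as np

def translate(image, tx, ty):
    """Shift the image tx pixels right and ty pixels down."""
    h, w = image.shape[:2]
    m = np.float32([[1, 0, tx], [0, 1, ty]])
    return cv2.warpAffine(image, m, (w, h))

def shear(image, factor):
    """Shear the image horizontally by the given factor."""
    h, w = image.shape[:2]
    m = np.float32([[1, factor, 0], [0, 1, 0]])
    return cv2.warpAffine(image, m, (w, h))

img = cv2.imread("plate.jpg")             # placeholder input image
aug = shear(translate(img, 10, -5), 0.1)  # example shift/shear values
cv2.imwrite("plate_aug.jpg", aug)
```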

@LoveNvidia,
Thanks for using TLT. Please create a new forum topic to discuss this further.