Intelligent vision and speech-enabled services have now become mainstream, impacting almost every aspect of our everyday life. AI-enabled video and audio analytics are enhancing applications from consumer products to enterprise services. Smart speakers at home. Smart kiosks or chatbots in retail stores. Interactive robots on factory floors. Intelligent patient monitoring systems at hospitals. And autonomous traffic solutions in smart cities. NVIDIA has been at the forefront of inventing technologies that power these services, helping developers create high-performance products with faster time-to-market.
Today, NVIDIA released several production-ready pre-trained models and a developer preview of Transfer Learning Toolkit (TLT) 3.0. The release includes a collection of new pre-trained models, along with features that support conversational AI applications, delivering a more powerful solution for accelerating the developer's journey from training to deployment.
Accelerate Your Vision AI Production
Creating a model from scratch can be daunting and expensive for developers, startups, and enterprises. NVIDIA TLT is an AI toolkit that abstracts away AI/DL framework complexity and enables you to build production-quality models from pre-trained checkpoints faster, with no coding required.
With TLT, you can bring your own data to fine-tune a model for a specific use case, starting either from one of NVIDIA's multi-purpose, production-quality models for common AI tasks or from one of the 100+ permutations of neural network architectures such as ResNet, VGG, FasterRCNN, RetinaNet, and YOLOv3/v4. All of the models are readily available from NGC.
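As a sketch, the fine-tune-and-export flow with TLT typically looks like the following. The model name, spec-file paths, and encryption key below are placeholders, and exact subtasks and flags vary by network, so treat this as illustrative and consult the TLT documentation for your model:

```shell
# Download a pre-trained model from NGC (placeholder model and version shown).
ngc registry model download-version "nvidia/tlt_pretrained_detectnet_v2:resnet18"

# Fine-tune on your own data. The spec file (-e) defines dataset paths and
# hyperparameters; -r is the results directory; -k is the model encryption key.
tlt detectnet_v2 train \
  -e /workspace/specs/train_spec.txt \
  -r /workspace/results \
  -k $KEY

# Export the trained model for deployment (for example, with DeepStream or TensorRT).
tlt detectnet_v2 export \
  -m /workspace/results/weights/model.tlt \
  -k $KEY \
  -o /workspace/results/model.etlt
```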
Key highlights for pre-trained models and TLT 3.0 (developer preview)
- New vision AI pre-trained models: license plate detection and recognition, heart rate monitoring, gesture recognition, gaze estimation, emotion recognition, face detection, and facial landmarks estimation
- Support for conversational AI use cases with pre-trained models for automatic speech recognition (ASR) and natural language processing (NLP)
- Choice of training with popular network architectures such as EfficientNet, YOLOv4, and UNet
- Improved PeopleNet model to detect difficult scenarios such as people sitting down and rotated/warped objects
- New TLT launcher that automatically pulls the compatible container for each task
- Support for NVIDIA Ampere architecture GPUs with third-generation Tensor Cores for a performance boost
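For reference, the launcher itself is distributed as a Python package; a minimal install-and-explore session might look like this (package and command names as published for the TLT 3.0 developer preview; verify against the release notes for your version):

```shell
# The launcher is published on the NVIDIA Python package index.
pip3 install nvidia-pyindex
pip3 install nvidia-tlt

# List the supported tasks. Invoking a task pulls the matching TLT container,
# so Docker and NGC credentials must already be configured on the machine.
tlt --help
```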