Q&A for the GTC webinar “A21333-Accelerating Vision AI Applications Using NVIDIA Transfer Learning Toolkit and Pre-Trained Models”

Thank you for attending the GTC webinar, “Accelerating Vision AI Applications Using NVIDIA Transfer Learning Toolkit and Pre-Trained Models”, presented by Chintan Shah. We hope you found it informative. We received a lot of great questions at the end and weren’t able to respond to all of them. We are consolidating all follow-up questions in the following post.

  1. Is TLT only Tensorflow based or can Pytorch be used too?

TLT uses TensorFlow under the hood for training. PyTorch is not supported at the moment, but it is on our roadmap for the future.

  1. Do you have some suggestions regarding the deployment of the models which are trained on RTX2070 using a container (e.g. Docker)? Does NVIDIA recommend any particular container?

TLT models can be deployed using the DeepStream SDK. DeepStream containers are available on NGC, the NVIDIA GPU Cloud registry.

  1. Could you explain what “Batch Size” actually is?

Batch size is the number of images sent in parallel for training. Since you cannot send the entire dataset for training at once due to memory and compute restrictions, you divide the dataset into batches before training.
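The batching idea above can be sketched in a few lines of plain Python (no TLT APIs involved; the list of integers and the batch size of 4 are arbitrary placeholders for a real image dataset):

```python
def make_batches(dataset, batch_size):
    """Split a dataset into consecutive batches of at most batch_size items."""
    return [dataset[i:i + batch_size] for i in range(0, len(dataset), batch_size)]

images = list(range(10))  # placeholder for 10 training images
batches = make_batches(images, batch_size=4)
print([len(b) for b in batches])  # [4, 4, 2] - the last batch may be smaller
```

Each batch is processed in one forward/backward pass, so a larger batch size trades higher memory use for fewer passes per epoch.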

  1. Does TLT assume images in common formats (jpeg, png etc.) or can a custom loader be used?

TLT assumes images are in .jpg or .png format and will return an error for any other format. A custom loader cannot be used.
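Since unsupported formats cause an error, it can be useful to pre-check a dataset directory before launching training. Below is a hypothetical helper (not part of TLT) that lists files whose extensions fall outside the .jpg/.png formats the answer above mentions:

```python
from pathlib import Path

# Formats the answer above says TLT accepts; everything else would error out.
SUPPORTED = {".jpg", ".png"}

def unsupported_images(dataset_dir):
    """Return files in dataset_dir (recursively) with unsupported extensions."""
    return [p for p in Path(dataset_dir).rglob("*")
            if p.is_file() and p.suffix.lower() not in SUPPORTED]
```

Running this before training lets you convert or remove offending files up front instead of failing mid-run.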

  1. Can we deploy these models on a local web server using DeepStream?

Use the NVIDIA Triton Inference Server to deploy the model on a local web server.

  1. Can we get the trained and pruned model in ONNX or other format that we can import into PyTorch or Tensorflow for experimenting with?

TLT doesn’t provide checkpoints in TensorFlow or PyTorch format. TLT models are exported as a .etlt file that can be used for deployment with DeepStream or TensorRT.

  1. You mentioned that the full instructions/code to reproduce this mask detector are available on GitHub. Can you please post the link?

Please check https://github.com/NVIDIA-AI-IOT/face-mask-detection

  1. Is the model in the demo for only mask detection?

Yes, the model in the demo is only for face mask detection, but you can use TLT to train models for other vision applications.

  1. Would we be able to train on GCP VMs without an xhost display? Can we train TLT models with no UI?

Yes. tlt-train doesn’t require you to have a UI/display.

  1. When will TLT support YOLOv4?

It’s on our roadmap.

  1. Does TLT run on (or is compatible with) IBM power processors?

TLT only works on x86 CPUs with an NVIDIA GPU.

  1. What are the minimum hardware requirements (GPU) for TLT?

See the Requirements and Installation section of the TLT user guide: https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#requirements

  1. How do you accelerate multi-task learning (MTL) in computer vision?

MTL is often used in general machine learning, including non-vision tasks. Transfer learning is related to MTL, but TLT is currently focused on computer vision tasks. TLT is an accelerated package that includes multi-GPU training.

  1. Can you run custom models with DeepStream?

See the DeepStream development guide for running custom models: https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream_Development_Guide/deepstream_custom_model.html#

  1. How do the pre-defined models from NVIDIA TLT compare to other available platforms, like Facebook Detectron/Detectron2 or Google Object Detection API, in terms of accuracy, footprint, and run-time?

We have not specifically compared TLT models with Detectron/Detectron2 models. TLT pretrained models are trained on the Google Open Images dataset. In general, models optimized using TLT run faster than unpruned models. TLT users can trade off a model’s accuracy, footprint, and runtime by choosing the extent of pruning.

  1. Does TLT provide a Python API to automate the workflow?

Not at the moment.

  1. Does any configuration support using both the GPU and DLA of the Jetson Xavier family at the same time?

Yes, GPU and DLA are supported concurrently with DeepStream. See the DeepStream development guide: https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html

  1. How did you arrive at 960x544 as the best resolution?

This resolution was derived empirically as a function of our dataset’s object size distribution, inference latency requirements, and accuracy. At smaller resolutions like 480x272 or 640x368, objects with a pixel area of 25 to 100 sq. pixels were hard to detect, since these objects would disappear in the final feature map after a stride of 16.
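The stride-16 argument above can be checked with back-of-the-envelope arithmetic. A sketch (plain Python; the 5x5 and 10x10 object sizes are illustrative, corresponding to the 25 and 100 sq. pixel areas mentioned above):

```python
def feature_map_footprint(obj_w, obj_h, stride=16):
    """Size of an object's footprint in the final feature map.

    A network with total stride 16 shrinks each spatial dimension
    by 16x, so an obj_w x obj_h object covers (obj_w/16) x (obj_h/16)
    feature-map cells.
    """
    return (obj_w / stride, obj_h / stride)

# A 5x5-pixel object (25 sq. pixels) covers well under one cell:
print(feature_map_footprint(5, 5))    # (0.3125, 0.3125)
# Even a 10x10-pixel object (100 sq. pixels) covers less than one cell:
print(feature_map_footprint(10, 10))  # (0.625, 0.625)
```

Because such objects occupy a fraction of a single feature-map cell, their signal is easily lost, which is why a larger input resolution like 960x544 helps small-object detection.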

  1. Can TLT be run on Jetson devices ?

Model training with TLT can only be done on x86 with an NVIDIA GPU. Trained models can be deployed on Jetson.

  1. Can the pre-trained models be commercially used directly?

Yes. Please read the terms in our EULA - https://developer.nvidia.com/tlt-20-models-eula

  1. Can you share an example using the nuScenes dataset with TLT?

We don’t have an example using the nuScenes dataset with TLT.

You can watch the recording and download the presentation slides from the following link.

For responses to follow-up questions related to the DeepStream GTC talk “A21337-Implementing Real-time Vision AI Apps Using NVIDIA DeepStream SDK”, please visit the DeepStream SDK forum.

If you have more questions, please feel free to post them in the forum so that we can further assist you.