NVIDIA Webinar — Embedded Deep Learning with Jetson

Embedded Deep Learning with NVIDIA Jetson
On Demand (1 hour)

The recently released JetPack 2.3 includes improved deep-learning performance, and last week at GTC Europe, the latest GPU-equipped deep-learning robotic technology was unveiled. Curious how to deploy neural networks? Start developing applications with advanced AI and computer vision today using NVIDIA’s high-performance deep learning tools. Join us for this online webinar on 10/12, where we’ll share and discuss:

  • How to use NVIDIA’s deep learning tools, such as DIGITS and TensorRT.
  • The various neural-network-based primitives available as building blocks, deployable onboard intelligent robots and drones using NVIDIA’s Jetson Embedded Platform.
  • Real-time deep-learning solutions for image recognition, object localization, and segmentation (GitHub).
  • Training workflows for customizing network models with new training datasets, and emerging approaches to automation such as deep reinforcement learning and simulation.

Register and view on demand — http://info.nvidianews.com/embedded-deep-learning-nvidia-jetson.html
Watch the recording here — https://www.youtube.com/watch?v=_4tzlXPQWb8&t=4s

Thanks everyone for joining us for the webinar earlier today! For those of you who would like to watch at another time, the recording is available, and the slides can be found here:

https://github.com/dusty-nv/jetson-presentations/raw/master/20161012_Embedded_Deep_Learning.pdf

Thanks for the webinar dusty!

Thanks for the Webinar @dusty_nv
I could hear the audio yesterday, but there were no visuals.
Checking it now.

P.S.:
The presenter said an email would be sent after the webinar, but that didn’t arrive either.
Is that a problem on my end? Did anyone else receive emails about this?
I am getting all my other emails, so it’s not a problem with my spam filter.

Hello Dustin,

Thanks for your presentation.
I am testing TensorRT and your demos available on GitHub.
I would like to ask you two questions related to my TensorRT use.

1/ TensorRT supported layers
You explained that any caffe prototxt can be converted to TensorRT format for fast inference.
You mentioned that the TensorNet class can do this conversion for the imageNet or detectNet network types.
How can I do this for my own network designs, since it seems not all layer types are supported for this conversion? What is the exact list of supported layers? Will this list be extended?

2/ FP16 usage
As I did not manage to convert my network and get all the benefits of TensorRT optimizations, I wanted at least to use FP16 instead of FP32 and figure out how much speed improvement I could get. Unfortunately, using the nvcaffe fp16 experimental branch, I got the exact same problem since this branch is not maintained and recent layers such as dilated convolutions are not supported. Is there any up to date fp16 compatible caffe version available somewhere ? If not, what is the methodology to make my layers FP16 compatible ?

Thanks in advance for your answers.
Alex

Hi Alex, the exact list of supported layers is in the TensorRT documentation. To obtain it, you can either run JetPack 2.3 on a host PC, which downloads the packages to a folder underneath JetPack 2.3 that you can unzip to read the docs, or you can get the desktop version through the NVIDIA website. Offhand, here are some supported layer types, with more being added in the future, in addition to support for adding custom layers:

  • Convolution: 2D
  • Activation: ReLU, tanh and sigmoid
  • Pooling
  • ElementWise
  • LRN
  • Fully-connected
  • SoftMax
  • Deconvolution

In the future the nvcaffe/fp16_experimental branch should be updated; however, for experimenting with the latest layers and FP16 support today, I tend to use Torch on TX1, which includes working FP16 in cutorch and the latest cuDNN 5.1 bindings. The latest Caffe layers would need to be optimized for FP16 in a similar fashion to the existing layers in nvcaffe/fp16_experimental. My understanding is that Torch somewhat skirts the need for per-layer FP16 optimizations, because its underlying tensor operations have already been accelerated with FP16. Hope that helps!
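The FP16 trade-off discussed here can be illustrated outside of Torch or Caffe entirely. Below is a minimal NumPy sketch (purely illustrative; cutorch’s FP16 path is a separate GPU implementation, and the weight shapes are made up) showing that half-precision storage halves memory while introducing only a small rounding error:

```python
import numpy as np

# Simulated FP32 convolution weights (e.g. 64 filters of shape 3x3x3)
rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal((64, 3, 3, 3)).astype(np.float32)

# Cast to half precision for storage/bandwidth savings
weights_fp16 = weights_fp32.astype(np.float16)

# FP16 uses exactly half the memory of FP32
print(weights_fp32.nbytes, weights_fp16.nbytes)  # 6912 3456

# Round-trip rounding error is small relative to the weight magnitudes
err = np.abs(weights_fp32 - weights_fp16.astype(np.float32)).max()
print(err < 1e-2)  # True
```

The memory (and memory-bandwidth) halving is where most of the inference speedup comes from on hardware with native FP16 arithmetic, such as the TX1’s Maxwell GPU.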

    Thanks for reporting the issue ShaktiD, sorry about that. If the recording isn’t working for you either, let us know. Link is below:
    http://on-demand.gputechconf.com/gtc/2016/webinar/embedded-deep-learning-nvidia-jetson.mp4

    I’m not sure the official recording notification e-mail has been sent yet, however you can find the link posted above.

    The downloads (mp4 and pdf) work.
    Thanks.

    Still getting my head around it. :-)

    Hello again!
    Thanks for your quick answer.
    I will have a deeper look at the TensorRT documentation.
    Can you give me the exact path to access this documentation on my host PC?
    One last question: do you have any idea when custom layers will be supported?
    Alex

    On your host PC, you can either install the host version or, if you ran JetPack 2.3 on your host, it will have downloaded the GIE/TensorRT package for ARM, which you can retrieve from there. Not sure when the next version will be out, but since TensorRT 1.0 allows you to bind to any output blobs, in theory you could intercept the data and run the custom layers elsewhere (i.e. in caffe FP32).
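The interception idea can be sketched in plain NumPy (the TensorRT binding calls themselves are omitted, and the shapes are hypothetical): run the engine up to its last supported layer, copy out that output blob, and apply the unsupported layer, here a dilated convolution, on the host before handing the result onward.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=2):
    """Naive 'valid' 2D cross-correlation with a dilated kernel.
    Stands in for a layer type the engine doesn't support."""
    kh, kw = kernel.shape
    # Effective receptive field of the dilated kernel
    eh = (kh - 1) * dilation + 1
    ew = (kw - 1) * dilation + 1
    oh = x.shape[0] - eh + 1
    ow = x.shape[1] - ew + 1
    out = np.zeros((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            # Sample the input with stride == dilation under the kernel
            patch = x[i:i + eh:dilation, j:j + ew:dilation]
            out[i, j] = (patch * kernel).sum()
    return out

# Pretend this is an output blob intercepted from the TensorRT engine
blob = np.arange(36, dtype=np.float32).reshape(6, 6)
kernel = np.ones((3, 3), dtype=np.float32)

# Run the unsupported layer on the host, then feed 'out' onward
out = dilated_conv2d(blob, kernel, dilation=2)
print(out.shape)  # (2, 2)
```

The loop-based host implementation is of course slow; the point is only that binding an intermediate output blob lets you splice arbitrary computation into the middle of the pipeline.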

    Hello dusty_nv,

    How can I use TensorRT without nvcaffeparser? In my application, the caffemodel and prototxt are encrypted.
    Should I use the INetworkDefinition class to define my network builder from the encrypted files?

    Thanks.

    Renbo

    Hi Renbo, yes, you should be able to do something along those lines. TensorRT includes C/C++ interfaces for configuring layers directly, in addition to the caffe parser. I believe you may also be able to pass nvcaffeparser a string that was decrypted in memory by your application.
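One way to keep the model plaintext off disk is to decrypt into memory and hand the resulting buffer to the parser or network-definition API. Here is a Python sketch of the idea with a toy XOR “cipher” (the actual parser call, and a real cipher such as AES, are assumptions left out of the sketch):

```python
KEY = b"secret-key"

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher for illustration; use a real cipher in practice."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# At packaging time: encrypt the prototxt/caffemodel before shipping
prototxt = b'name: "mynet"\nlayer { name: "conv1" type: "Convolution" }\n'
encrypted = xor_bytes(prototxt, KEY)

# At runtime: decrypt in memory only; no plaintext temp file is written
decrypted = xor_bytes(encrypted, KEY)

# Hand 'decrypted' to the parser / INetworkDefinition setup here
# (whether nvcaffeparser accepts an in-memory buffer is the open question above)
print(decrypted == prototxt)  # True
```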

    Thank you @dusty_nv for the webinar.

    I checked out nvidia.com/DLI for the Deep Learning Institute, but couldn’t find much there aside from the Udacity course. Do you have links to the other self-paced courses and hands-on labs mentioned?

    I would love some links to a tutorial on DIGITS and how to get started training object recognition, similar to the TX1 robot that would find bananas and avoid apples.

    Thank you!

    Doug

    Hi Doug, here’s a link to Self-Paced Online Courses with QwikLabs. They are available for a number of different frameworks, including Caffe and DIGITS.

    Also, work through the DIGITS training docs and examples at http://github.com/NVIDIA/DIGITS.

    Thank you very much @dusty_nv !!

    I WAS very excited to sign up for the “Introduction to Deep Learning Quest” on QwikLabs.

    However, I was disappointed when I logged in today and found that the quest has disappeared from the site, along with the labs on DIGITS, Caffe, Theano, and Torch.

    Any idea who can help, or why these courses were pulled? The “Hands On Lab” links work, but when clicking “Select Lab” within the page, QwikLabs doesn’t show the course.

    Thank you,
    Doug

    It looks like the Introduction to Deep Learning lab is still up, I will try to find out about the others.

    For the time being, I recommend working through the DIGITS examples on GitHub or the Caffe ImageNet tutorial.

    Hi @dusty_nv. I saw your Deep Learning with Jetson talk at the GTC DC developers conference yesterday. Fantastic presentation, one of the best there.

    Hello Dustin,

    I have successfully trained a convolutional model for classifying small images (64x64). I now want to deploy it on my Jetson TX1 using TensorRT, taking bigger images as input, slicing them, and feeding the network batches of 64x64 images. I saw your code on GitHub; thank you very much for that, it’s really helpful.

    Sadly, I’m not really familiar with CUDA, and I was wondering if you have some example code on how to run inference using batches of images, or could point out how to achieve this by editing portions of your code.

    Thanks in advance,
    Martín

    Hi Martín, thanks for the suggestion; I will plan on adding support for batches, starting with imageNet. I am also adding an API to imageNet which returns the top N outputs instead of just the maximum output. I’ll post here when the patch for batch processing has been committed.
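In the meantime, the slicing step Martín describes can be prototyped on the CPU independently of TensorRT. A NumPy sketch (shapes and names are illustrative) that cuts a larger image into non-overlapping 64x64 tiles and stacks them into a single batch for inference:

```python
import numpy as np

TILE = 64

def tile_image(img):
    """Split an HxWxC image into a batch of TILExTILE tiles.
    Edge regions that don't fill a full tile are dropped for simplicity."""
    h, w, c = img.shape
    rows, cols = h // TILE, w // TILE
    # Reshape into (rows, TILE, cols, TILE, c), bring the tile grid axes
    # together, then flatten the grid into one batch dimension
    tiles = (img[:rows * TILE, :cols * TILE]
             .reshape(rows, TILE, cols, TILE, c)
             .swapaxes(1, 2)
             .reshape(rows * cols, TILE, TILE, c))
    return tiles

img = np.arange(256 * 320 * 3, dtype=np.float32).reshape(256, 320, 3)
batch = tile_image(img)
print(batch.shape)  # (20, 64, 64, 3)
```

Tiles come out in row-major order (left to right, top to bottom), so per-tile classifications can be mapped back to image coordinates by index arithmetic. The batch array would then be copied to the network’s input buffer once batch inference lands.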