I am very new to machine learning and GPU programming. I am developing an application that uses pre-trained models (.caffemodel, .prototxt, .uff), which I would like to optimize and run in real time using TensorRT. I am confused about the best way to structure this application.
When I look at the TensorRT examples from NVIDIA, I see that it is possible to use TensorRT from Python with libraries like pyCuda (https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html#python_samples_section). However, in the "Hello AI World" tutorial, the author instead writes his own C++ CUDA programs and wraps them to make them available to Python through an API (https://developer.nvidia.com/embedded/twodaystoademo).
For a professional application, what is the best or recommended approach: developing the whole application in C++, developing the core in C++ with the higher-level network connections in Python, or developing entirely in Python?
What are good resources to get started, not only with machine learning in general, but with building deployable applications for the Jetson platform?
On the Jetson platform, the GPU shares physical memory with the CPU. If the memory is shared, how does pyCuda allocate memory "on the device"?
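To make the question concrete, here is roughly the allocation pattern I see in the TensorRT Python samples (a minimal sketch, assuming the standard pyCuda driver API; the buffer size is just an illustration). On a discrete GPU I understand this reserves separate VRAM, but I don't know what it does on Jetson's shared DRAM:

```python
import numpy as np
import pycuda.autoinit  # initializes CUDA and creates a context
import pycuda.driver as cuda

# Host-side buffer; page-locked memory is what the samples use
# so that async copies are possible
h_input = cuda.pagelocked_empty(1 << 20, dtype=np.float32)

# "Device-side" allocation -- on Jetson, is this still a distinct
# allocation, given that CPU and GPU share the same physical DRAM?
d_input = cuda.mem_alloc(h_input.nbytes)

# Host-to-device copy -- does this perform a real copy on Jetson,
# or is it effectively redundant on a shared-memory system?
cuda.memcpy_htod(d_input, h_input)
```

Is this copy-in/copy-out pattern still the right one to follow on Jetson, or is there a zero-copy or unified-memory approach that is preferred there?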
I know these are a lot of questions, and the answers may vary depending on the application, but I would greatly appreciate any help or references I can use to learn the best way to create deployable machine learning applications.