We would like to use either the TensorRT Inference Server or the TensorRT engine API to communicate with our main application on a Windows 10 system. We intend to run inference on multiple camera streams in parallel and therefore require low latency and low network usage (e.g. no duplicate image transfers over the network).
Here are my questions:
Is it possible to run a TensorRT engine directly on Windows? If so, can we load multiple models on the GPU and send parallel inference requests to the TensorRT engine?
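To make the second part of the question concrete, here is a rough C++-style sketch of what I have in mind (not a runnable example; engine file names and the exact enqueue call are placeholders, and error handling is omitted). The idea is one deserialized engine per model, each with its own execution context and CUDA stream, so inference requests can be enqueued in parallel:

```
// Sketch only: assumes TensorRT and CUDA are installed; "modelA.engine" /
// "modelB.engine" are placeholder names for pre-built serialized engines.
nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);

// Deserialize one engine per model (engine blobs loaded from disk beforehand).
nvinfer1::ICudaEngine* engineA = runtime->deserializeCudaEngine(blobA, sizeA);
nvinfer1::ICudaEngine* engineB = runtime->deserializeCudaEngine(blobB, sizeB);

// One execution context and one CUDA stream per model, so the two
// inference requests can run concurrently on the same GPU.
nvinfer1::IExecutionContext* ctxA = engineA->createExecutionContext();
nvinfer1::IExecutionContext* ctxB = engineB->createExecutionContext();
cudaStream_t streamA, streamB;
cudaStreamCreate(&streamA);
cudaStreamCreate(&streamB);

// Enqueue both requests asynchronously (bindings point to device buffers).
ctxA->enqueueV2(bindingsA, streamA, nullptr);
ctxB->enqueueV2(bindingsB, streamB, nullptr);

// Wait for both streams; results are then in the output bindings.
cudaStreamSynchronize(streamA);
cudaStreamSynchronize(streamB);
```

Is this the intended usage pattern on Windows, or is there a better-supported way to serve multiple models concurrently?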
Is it possible to run the TensorRT Inference Server on Windows? If it is not supported, would it be an option to run a virtual machine with Ubuntu and communicate with a TensorRT Inference Server process running on the same machine?
Can we expect low latency in that case? And what about latency if the TensorRT Inference Server runs on a dedicated system and we need to feed the input via HTTP?
Thank you very much