We have trained models using the TAO classification and detection pipelines. For our use case we want to deploy the models to Azure and AWS functions and run inference on the CPU.
Before working with TAO, we would simply have used Keras or TF for training, loaded the saved model from the Azure/AWS function runtime, and done inference via a lightweight Python script.
How can we accomplish the same thing (CPU inference, a minimal-dependency Python script that we can host in the AWS/Azure function runtime) when we start from a .tlt model?
First, export the .tlt model to an .etlt model.
Then run tao-converter to convert the .etlt model into a TensorRT engine (.engine or .trt file).
Finally, run inference with DeepStream or your own standalone script.
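As a rough sketch of the two conversion steps above (the model paths, the `$KEY` value, the input dimensions, and the output node name are all assumptions here; they must match your own training spec and export log, and the exact commands may differ between TAO versions):

```shell
# Step 1: export the trained .tlt model to .etlt (run inside the TAO container).
# $KEY is the encryption key used during training; paths are placeholders.
tao classification export \
    -m /workspace/output/weights/final_model.tlt \
    -k $KEY \
    -o /workspace/export/final_model.etlt

# Step 2: build a TensorRT engine from the .etlt on the deployment machine.
# -d gives the input dims (C,H,W) and -o the output node name; both values
# below are assumptions -- check your model's spec file and the export log.
tao-converter /workspace/export/final_model.etlt \
    -k $KEY \
    -d 3,224,224 \
    -o predictions/Softmax \
    -e /workspace/export/final_model.engine
```

Note that the TensorRT engine is built for the specific GPU and TensorRT version of the machine it is generated on, so step 2 should be run on the deployment target.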
For DeepStream, refer to the Image Classification — Transfer Learning Toolkit 3.0 documentation.
For other kinds of inference, you can also refer to tao-toolkit-triton-apps/tao_triton/python at main · NVIDIA-AI-IOT/tao-toolkit-triton-apps · GitHub and tao-toolkit-triton-apps/classification_postprocessor.py at main · NVIDIA-AI-IOT/tao-toolkit-triton-apps · GitHub.
To run inference against a classification TensorRT engine, you can also search this TAO forum for relevant scripts.
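Whichever runtime produces the raw output tensor, the classification postprocessing itself is framework-free and easy to write yourself. A minimal sketch in pure Python (the label list and the assumption that the engine emits one logit per class are hypothetical; adapt them to your model):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Example with a hypothetical 4-class model output:
labels = ["cat", "dog", "bird", "fish"]
probs = softmax([2.0, 1.0, 0.1, -1.0])
print(top_k(probs, labels, k=2))
```

If your exported model already ends in a softmax node (as the classification export typically does), skip the `softmax` call and rank the raw outputs directly.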
This is very helpful Morganh. Thanks!
@Morganh - a question about those links. It looks like both require a GPU backend to execute inference. Do you have a pointer to a CPU-only, single Python script (no Triton or DeepStream) using vanilla TF that can load the graph and run inference?
No, there is no such script for running inference on the CPU.
@Morganh - does that mean it is not possible? Or just that NVIDIA hasn't created that kind of script yet?
Yes, officially it will run inference on the GPU only.
There has been no update from you for a while, so we are assuming this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.