There are two inference components in DeepStream: nvinfer and nvinferserver.
nvinfer is implemented with TensorRT.
It only supports TensorRT engines and model formats that can be converted into a TensorRT engine.
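For example, a model from another framework is typically exported to ONNX first, and nvinfer then builds a TensorRT engine from the .onnx file. A minimal sketch, using a torchvision model as a placeholder for your own network:

```python
import torch
import torchvision

# Placeholder model and input shape; substitute your own trained network.
model = torchvision.models.resnet18(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX; nvinfer can build a TensorRT engine from the .onnx file
# (or you can pre-build an engine yourself and point nvinfer at it).
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
)
```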
nvinferserver uses the Triton Inference Server, which supports many different backends.
Unfortunately, Triton doesn't support the PyTorch backend on Jetson yet.
You can find more details below:
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinferserver.html
"The plugin supports Triton features along with multiple deep-learning frameworks such as TensorRT, TensorFlow (GraphDef / SavedModel), ONNX and PyTorch on Tesla platforms. On Jetson, it also supports TensorRT and TensorFlow (GraphDef / SavedModel). TensorFlow and ONNX can be configured with TensorRT acceleration."
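For reference, a rough Python/GStreamer sketch of where these plugins sit in a DeepStream pipeline is below; the input file, config path, and resolution are placeholders, and nvinfer can be swapped for nvinferserver (pointing at a Triton config) without changing the rest of the pipeline.

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Placeholder input file, config path, and resolution; adjust for your setup.
# Swapping "nvinfer" for "nvinferserver" (with its own config file) leaves the
# rest of the pipeline unchanged.
pipeline = Gst.parse_launch(
    "filesrc location=sample_720p.h264 ! h264parse ! nvv4l2decoder ! "
    "m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! "
    "nvinfer config-file-path=config_infer_primary.txt ! "
    "nvvideoconvert ! nvdsosd ! nveglglessink"
)

pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
# Block until an error or end-of-stream, then shut down.
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.ERROR | Gst.MessageType.EOS)
pipeline.set_state(Gst.State.NULL)
```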