How to accelerate model inference on Jetson Xavier?

I am wondering how to accelerate computation on the Jetson Xavier.
I am running a trained model on the Jetson Xavier, but I'd like to apply some acceleration based on the hardware. For example, how do I parallelize computation, manage memory, and handle I/O?
Or have CUDA and cuDNN already done all of that work for us? Is it still possible to do some work on the aspects mentioned?


It’s recommended to check our DeepStream SDK first:

The SDK uses the open source GStreamer to deliver high throughput with a low-latency streaming framework. The runtime system is pipelined to enable deep learning capabilities, as well as image and sensor processing and fusion algorithms in a streaming application.
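To illustrate the pipelining idea behind that design, here is a minimal sketch in plain Python (not DeepStream or GStreamer API code — just the concept): I/O, inference, and post-processing run as separate stages connected by bounded queues, so while one frame is being inferred, the next is already being read. The `infer` stage below is a placeholder stand-in for a real model call.

```python
import queue
import threading

def pipeline(n_frames=8):
    """Sketch of a pipelined streaming design: three stages run in
    separate threads connected by bounded queues, so I/O, compute,
    and output overlap instead of running sequentially."""
    q_in = queue.Queue(maxsize=4)    # bounded queues provide back-pressure
    q_out = queue.Queue(maxsize=4)
    results = []

    def reader():                    # stage 1: simulated frame I/O
        for i in range(n_frames):
            q_in.put(i)
        q_in.put(None)               # sentinel: end of stream

    def infer():                     # stage 2: placeholder for model inference
        while (frame := q_in.get()) is not None:
            q_out.put(frame * 2)     # stand-in "inference" result
        q_out.put(None)

    def writer():                    # stage 3: collect / sink results
        while (res := q_out.get()) is not None:
            results.append(res)

    threads = [threading.Thread(target=f) for f in (reader, infer, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(pipeline())  # → [0, 2, 4, 6, 8, 10, 12, 14]
```

In a real deployment the stages would be GStreamer elements (decode, inference, OSD/sink) and the buffers would stay in GPU memory; the bounded queues here mirror how a streaming framework keeps every stage busy without unbounded memory growth.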