I am using a Jetson AGX Xavier with 16 GB of RAM for object detection and instance segmentation inference. However, the models run much more slowly than on a standard desktop computer with an 8 GB GPU.
I have run jtop to check whether the GPU is properly utilized. Every time the program makes a prediction, GPU usage goes up to 100%. However, in the memory readout of jtop, I notice that the GPU only ever uses about 2 GB of RAM during inference.
Should I consider it normal that the GPU "only" uses 2 GB of RAM, or can I somehow make it use more at a time?
To clarify, I am running inference with a batch size of 1 because I want to use the model in a real-time application on an image stream.
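For reference, the FPS numbers I quote below come from a simple timing loop like this one (`run_inference` is a placeholder for my actual prediction call):

```python
import time

def measure_fps(run_inference, images, warmup=3):
    """Average single-image latency; warmup iterations are excluded."""
    for img in images[:warmup]:      # let the GPU reach steady state
        run_inference(img)
    start = time.perf_counter()
    for img in images[warmup:]:
        run_inference(img)
    elapsed = time.perf_counter() - start
    latency = elapsed / len(images[warmup:])
    return latency, 1.0 / latency

# Example with a dummy 50 ms "model" standing in for the real one:
latency, fps = measure_fps(lambda img: time.sleep(0.05), list(range(13)))
print(f"{latency * 1000:.0f} ms/image, {fps:.1f} FPS")
```

The warmup iterations matter on the Jetson, since the first few predictions include CUDA context setup and are not representative.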
I am using a Swin Transformer model for object detection and instance segmentation, taken from MMDetection, which is PyTorch based. Running it on the Jetson through the MMDetection (PyTorch) API, I get an inference time of about 1 s per image (1 FPS). I then converted the model to ONNX and ran it with onnxruntime-gpu, which gave a speedup to about 0.5 s per image (2 FPS). That is still too slow for a real-time application. I am now trying to convert the model further to TensorRT, but it is proving to be quite a challenge, and I am unsure whether it will help that much.
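In case it helps with diagnosing the problem: for the TensorRT step I am attempting the usual `trtexec` route directly from the ONNX export (file names here are placeholders for my actual files):

```shell
# Build a TensorRT engine from the ONNX export on the Jetson itself.
# FP16 roughly halves memory traffic and is usually a large speedup on Xavier.
/usr/src/tensorrt/bin/trtexec \
    --onnx=model.onnx \
    --saveEngine=model_fp16.trt \
    --fp16
```

This is where the conversion currently fails for me, so I cannot yet say whether the resulting engine would be faster.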
For that reason, I was wondering whether the problem might somehow be hardware/GPU related.
Any help or advice would be much appreciated!