Hello, I am using Python to develop my YOLOv8 application on a Jetson Orin Nano. I decode two H.264 camera streams and perform inference with a .engine file. There are two things we noticed:
RAM usage is extremely high; we can barely do anything else once the application is running.
Preprocessing and postprocessing are time-consuming (and I think the postprocessing does not include the tracking step).
Has anyone tried YOLOv8 + BoT-SORT with Python on any Jetson model? Please offer some advice on optimizing this.
Also, I am trying to develop a C++ application on Jetson. Since Jetson shares RAM between the CPU and GPU, maybe I can do something like avoiding cv::cuda::GpuMat?
Ultralytics generally uses TensorRT for inference, which depends on the CUDA library.
Loading CUDA takes memory (>600 MB), since all of its modules are loaded into memory.
In CUDA 11.8 we introduced lazy module loading, which lets applications load only the CUDA modules they actually use and can significantly reduce memory usage.
Please give it a try (JetPack 6 with CUDA 12).
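As a minimal sketch: lazy loading is controlled by the real `CUDA_MODULE_LOADING` environment variable, and it must be set before the first CUDA-using library is loaded in the process. The import shown below is a placeholder for whatever CUDA-backed package your application uses:

```python
import os

# Enable CUDA lazy module loading (available since CUDA 11.8).
# This must be set BEFORE the first CUDA-using import (tensorrt,
# torch, etc.); once the CUDA runtime initializes, the loading
# mode is already fixed for the process.
os.environ["CUDA_MODULE_LOADING"] = "LAZY"

# import tensorrt  # or: import torch  -- only after the env var is set

print(os.environ["CUDA_MODULE_LOADING"])
```

Alternatively, export the variable in the shell (`export CUDA_MODULE_LOADING=LAZY`) before launching the application, which avoids import-order pitfalls entirely.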
Moreover, it looks like your preprocessing and postprocessing are CPU-based.
If so, they are expected to take time, since memory transfers (CPU ↔ GPU) are required.
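To illustrate where that time goes, here is a rough sketch of the usual YOLOv8-style letterbox preprocess in plain NumPy (shapes and the 114 padding value follow Ultralytics' convention; the nearest-neighbour resize is just to keep the example dependency-free). On Jetson, the point is to run this step on the GPU (e.g. with PyTorch or CuPy tensors already on the device) so TensorRT can consume the result without an extra host↔device copy:

```python
import numpy as np

def letterbox_preprocess(frame, size=640):
    """Resize-with-padding + normalize, YOLOv8-style preprocessing.

    frame: HxWx3 uint8 image. Returns a 1x3xSxS float32 tensor.
    Real code would use cv2.resize or a GPU interpolation kernel
    instead of the nearest-neighbour indexing shown here.
    """
    h, w = frame.shape[:2]
    scale = min(size / h, size / w)
    nh, nw = int(round(h * scale)), int(round(w * scale))

    # Nearest-neighbour index maps for the resize.
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = frame[ys][:, xs]

    # Pad to a square canvas (value 114, as Ultralytics does).
    canvas = np.full((size, size, 3), 114, dtype=np.uint8)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized

    # BGR->RGB, HWC->CHW, scale to [0, 1], add batch dimension.
    chw = canvas[:, :, ::-1].transpose(2, 0, 1).astype(np.float32) / 255.0
    return chw[None]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)
print(letterbox_preprocess(frame).shape)  # (1, 3, 640, 640)
```

If every one of these steps runs on the CPU, the full frame must cross the CPU↔GPU boundary once for inference and again for postprocessing, which is exactly the overhead described above.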