Hi, I am trying to run a YOLOv8 model for bed detection on the Jetson Orin NX, but the Orin is only using the CPU and GPU usage stays at 0.0%. In my logic I use the torch library to select the GPU device, yet the model is still consuming 100% of six CPU threads.
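For reference, my device-selection logic looks roughly like this (a minimal sketch using the standard ultralytics/torch calls; the weights file and image name are placeholders):

```python
import torch
from ultralytics import YOLO

# Falls back to CPU if PyTorch cannot see the GPU, which would
# explain the 0.0% GPU usage I am observing.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

model = YOLO("yolov8n.pt")                      # placeholder weights
results = model.predict("bed.jpg", device=device)
```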
For the ultralytics installation, I followed the YOLOv8 installation procedure and verified the installed packages.
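This is how I verify the installation (a simple check; if `torch.cuda.is_available()` prints False here, PyTorch was likely installed as a CPU-only build):

```python
import torch
import ultralytics

print("ultralytics:", ultralytics.__version__)
print("torch:", torch.__version__)
print("torch CUDA build:", torch.version.cuda)   # None for a CPU-only wheel
print("CUDA available:", torch.cuda.is_available())
```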
After converting the PyTorch (.pt) model to a TensorRT (.engine) model, I am able to use the model on the Jetson Orin NX 8GB with the GPU, but it consumes a large amount of RAM: in the idle state after startup the Jetson Orin uses ~2.0 GB, and running the detection model adds another ~2 GB, so ~4 GB of RAM is consumed in total.
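The conversion step itself was roughly the following (a sketch using the standard ultralytics export API; the weights file is a placeholder):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")               # placeholder weights
# Exports via ONNX to a TensorRT engine, producing yolov8n.engine
model.export(format="engine", device=0)
```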
Starting with CUDA 11.8, there is a new feature called lazy loading that can reduce memory usage. With lazy loading, a CUDA app (e.g., TensorRT) does not need to load the whole CUDA library at startup; each module is loaded only when it is first needed.
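If I understand the docs correctly, lazy loading is enabled through the `CUDA_MODULE_LOADING` environment variable, which must be set before the CUDA runtime is initialized. A minimal sketch:

```python
import os

# Documented CUDA 11.7+ switch for lazy module loading; set it before
# anything initializes the CUDA runtime.
os.environ["CUDA_MODULE_LOADING"] = "LAZY"

import torch  # imported after setting the variable so the runtime sees it
from ultralytics import YOLO

model = YOLO("yolov8n.engine")           # placeholder engine file
results = model.predict("bed.jpg", device=0)
```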
Because I am not building the model myself (I have converted the PyTorch model to ONNX and then to a TensorRT model), is there any possibility to remove cuBLAS and cuDNN during the conversion?
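One approach I am considering is disabling the cuBLAS/cuDNN tactic sources when building the engine from ONNX myself, instead of going through the ultralytics exporter. A hedged sketch using the TensorRT 8.x Python API (the ONNX file name is a placeholder):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("yolov8n.onnx", "rb") as f:    # placeholder ONNX file
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
# Start from the default tactic sources and mask out cuBLAS/cuBLAS_LT/cuDNN,
# so the built engine should not pull those libraries in at runtime.
sources = config.get_tactic_sources()
for src in (trt.TacticSource.CUBLAS, trt.TacticSource.CUBLAS_LT,
            trt.TacticSource.CUDNN):
    sources &= ~(1 << int(src))
config.set_tactic_sources(sources)

engine = builder.build_serialized_network(network, config)
with open("yolov8n.engine", "wb") as f:
    f.write(engine)
```

As far as I know, the same thing can be done on the command line with trtexec's `--tacticSources` flag (e.g. `--tacticSources=-CUBLAS,-CUBLAS_LT,-CUDNN`). Would this be the right way to reduce the memory footprint in my case?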