Hello. I’m running a custom-trained YOLOv9 model on a Jetson Orin Nano (4GB) for drone detection. The model was trained on a UAV dataset from Kaggle, and while inference works fine on CPU, it fails when running with CUDA (GPU).
The error:
RuntimeError: FIND was unable to find an engine to execute this computation after trying 0 plans.
terminate called without an active exception
Aborted (core dumped)
*** Please note that this reply is generated automatically by an LLM ***
Based on the provided context, I’ll try to help you with your issue.
It seems like you're hitting a cuDNN runtime error ("FIND was unable to find an engine to execute this computation after trying 0 plans") when running your YOLOv9 model with CUDA on a Jetson Orin Nano 4GB device, running JetPack 6.1 (b123), CUDA 12.6, cuDNN 8.9.4, and PyTorch 2.3.0, while CPU inference works fine. You've installed PyTorch from the PyTorch for Jetson repository and are using Python 3.10.12.
Here are a few potential causes of this error:
Model size and memory pressure: YOLOv9 is a fairly heavy model, and on a 4GB board the first GPU convolution can fail simply because cuDNN cannot allocate workspace memory; "after trying 0 plans" means cuDNN could not build any execution plan at all, which out-of-memory conditions can cause.
Hardware limitations: The Jetson Orin Nano 4GB shares its memory between the CPU and GPU, so the OS, the Python process, and the CUDA context all compete for the same 4GB, leaving little headroom for cuDNN workspaces.
PyTorch version and installation: The PyTorch wheel must match the JetPack, CUDA, and cuDNN versions on the device. A wheel built against a different cuDNN major version than the one installed (for example, a cuDNN 8.x build running against a cuDNN 9.x install, or vice versa) is a common trigger for exactly this "FIND was unable to find an engine" error.
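To narrow this down, a single convolution on the GPU is enough to tell whether the failure is in the CUDA/cuDNN setup rather than in YOLOv9 itself. A minimal smoke-test sketch (the `cudnn_smoke_test` name is mine, and the fallback branches are only there so the script degrades gracefully off-device):

```python
# Minimal cuDNN smoke test: if a single 3x3 convolution already fails on
# the GPU, the problem is in the CUDA/cuDNN setup (or available memory),
# not in the YOLOv9 model itself.
def cudnn_smoke_test():
    try:
        import torch
    except ImportError:
        return "pytorch-not-installed"
    if not torch.cuda.is_available():
        return "cuda-not-available"
    x = torch.randn(1, 3, 64, 64, device="cuda")
    conv = torch.nn.Conv2d(3, 16, kernel_size=3).to("cuda")
    with torch.no_grad():
        out = conv(x)
    # no padding, so spatial size shrinks from 64 to 64 - 3 + 1 = 62
    return tuple(out.shape)

print(cudnn_smoke_test())
```

If this tiny convolution reproduces the same RuntimeError, the problem is environmental (versions or memory) and no amount of model-side changes will fix it.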
To resolve the error (and improve inference performance once it runs), consider the following:
Optimize the model: Reduce memory pressure with reduced precision or quantization (e.g., FP16, or INT8 with PTQ or QAT), which shrinks the model's memory footprint and usually speeds up inference as well.
Use a matching PyTorch build: Check the PyTorch for Jetson thread for a wheel built specifically for your JetPack release; a newer, matching build may also include performance improvements.
Verify JetPack and CUDA: Ensure that the JetPack components (CUDA, cuDNN) are consistent with each other and with the PyTorch build, and that CUDA is properly configured.
Explore other inference frameworks: Consider TensorRT or ONNX Runtime, both of which are well optimized for Jetson devices and avoid the PyTorch/cuDNN coupling entirely.
Upgrade to a more powerful Jetson module: If possible, consider upgrading to a more powerful Jetson module, like the Jetson Orin NX or Jetson AGX Orin, which have more GPU resources and might provide better performance.
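For the TensorRT route, one possible workflow is to export the trained weights to ONNX and then build an FP16 engine on the device with trtexec. A sketch only: it assumes the WongKinYiu/yolov9 repo's export.py script, and "best.pt" / "best.onnx" are placeholder file names for your own checkpoint.

```shell
# Export the trained checkpoint to ONNX (assumes the WongKinYiu/yolov9
# repo's export.py; "best.pt" is a placeholder for your weights file).
python3 export.py --weights best.pt --include onnx --imgsz 640

# Build a TensorRT engine with FP16 enabled (trtexec ships with JetPack,
# typically under /usr/src/tensorrt/bin).
trtexec --onnx=best.onnx --saveEngine=best_fp16.engine --fp16
```

The resulting engine runs without PyTorch or cuDNN-frontend involvement, which sidesteps the "FIND was unable to find an engine" path entirely and is usually faster on Orin.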
To further troubleshoot the issue, you can try:
Profiling the model: Use tools like PyTorch’s built-in profiler or NVIDIA’s Nsight Systems to profile the model and identify performance bottlenecks.
Checking GPU utilization: nvidia-smi is not available on Jetson devices; use tegrastats (or jtop from the jetson-stats package) to monitor GPU and memory utilization during inference.
Verifying CUDA and cuDNN versions: Double-check that the CUDA and cuDNN versions are compatible with the PyTorch version and the Jetson device.
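A quick way to do that version check is to ask PyTorch which CUDA and cuDNN versions it was compiled against and compare them with what JetPack actually installed on the device (the `torch_build_info` helper name is mine):

```python
# Report the versions PyTorch itself was built against, to compare with
# the versions JetPack installed on the device.
def torch_build_info():
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not importable in this environment
    return {
        "torch": torch.__version__,
        "cuda_built_with": torch.version.cuda,
        "cudnn_built_with": torch.backends.cudnn.version(),
        "cuda_available": torch.cuda.is_available(),
    }

print(torch_build_info())
```

If `cudnn_built_with` reports a different major version than the cuDNN headers installed on the device (e.g. in /usr/include/cudnn_version.h), that mismatch alone can explain the error.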
If you’re still experiencing issues, please provide more details about your model, dataset, and inference code, and I’ll try to help you further.
*** Please give the thumbs up if you get answers you like. Or provide feedback to help us improve the answer. ***