Hello everyone,
I’ve been working on converting a trained YOLOv5 model to TensorRT on my NVIDIA Jetson Orin Nano Developer Kit, and I’m facing a persistent issue with CUDA device availability. I’d like to share what I’ve done so far, the exact errors I’ve encountered, and hopefully find some guidance.
Background
I’m using a trained YOLOv5 model (`best.pt`), which I want to convert to ONNX and then to TensorRT format for optimized inference on my Jetson Orin Nano. The aim is to eventually deploy this optimized model in a real-time drowsiness detection system using a CSI camera.
Steps Taken So Far
Setting Up CUDA and PyTorch:
- CUDA Installation: I initially installed CUDA 12.2 and confirmed the installation using `nvcc --version`. Everything seemed fine, and I also checked device compatibility using the `deviceQuery` sample, which gave a “Result = PASS”.
- OpenCV Compilation with CUDA: I compiled OpenCV with CUDA and GStreamer support to ensure GPU acceleration was available, using the `cmake` command and making sure the configuration had `CUDA: YES` and `GStreamer: YES` (a quick verification snippet follows this list).
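For reference, here is the minimal check I use to confirm that the self-built OpenCV really has CUDA and GStreamer compiled in (a sketch, not part of my original setup steps):

```python
# Sanity check: confirm the custom OpenCV build has CUDA and GStreamer support.
import cv2

print(cv2.__version__)
print("CUDA devices visible to OpenCV:", cv2.cuda.getCudaEnabledDeviceCount())
for line in cv2.getBuildInformation().splitlines():
    if "CUDA" in line or "GStreamer" in line:
        print(line.strip())
```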
PyTorch Installation Issues:
I tried installing PyTorch with GPU support using different versions (like `torch==2.0.1+cu118`) and different index URLs (e.g., https://download.pytorch.org/whl/cu118). However, I kept running into an issue where only the CPU version seemed to install. Despite several attempts to install different versions of `torch` with CUDA support, `torch.cuda.is_available()` returned `False` every time.
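My suspicion (hedging here, since I’m not certain this is the root cause) is that the `cu118` wheels on download.pytorch.org are built for x86_64, so on the Orin Nano’s aarch64 CPU pip falls back to a CPU-only build; the JetPack-specific PyTorch wheels that NVIDIA publishes for Jetson would be needed instead. This quick check shows what actually got installed:

```python
# Check whether the installed torch wheel is a CUDA build at all.
import platform
import torch

print(platform.machine())   # "aarch64" on a Jetson
print(torch.__version__)    # a "+cpu"-style version string means a CPU-only build
print(torch.version.cuda)   # None on a CPU-only wheel
```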
Exporting YOLOv5 to ONNX:
I tried exporting my trained model (`best.pt`) to ONNX using the following command:

```
python export.py --weights /home/onur/Desktop/projects/denemeV2/yolov5/runs/train/exp4/weights/best.pt --img-size 640 --batch-size 1 --device 0 --include onnx
```

This resulted in an error:

```
AssertionError: Invalid CUDA '--device 0' requested, use '--device cpu' or pass valid CUDA device(s)
```

The error suggests that no valid CUDA device is available, even though `deviceQuery` showed that CUDA is installed and the GPU is working.
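As far as I can tell, the ONNX export step itself does not need a GPU, so exporting with `--device cpu` should work as a stopgap while the CUDA issue is unresolved; the resulting graph is the same. A minimal sketch of the equivalent call through yolov5’s Python API (run from the repo root; the weights path is the one from above):

```python
# CPU-side ONNX export sketch using yolov5's export.run(); the TensorRT
# engine built from this .onnx can still run on the GPU afterwards.
from export import run  # yolov5's export.py

run(
    weights="/home/onur/Desktop/projects/denemeV2/yolov5/runs/train/exp4/weights/best.pt",
    imgsz=(640, 640),
    batch_size=1,
    device="cpu",        # export on CPU to sidestep the missing CUDA device
    include=("onnx",),
)
```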
Other information:
```python
import torch

print("CUDA available?", torch.cuda.is_available())
print("CUDA device count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"Device {i}: {torch.cuda.get_device_name(i)}")
```
Result:

```
CUDA available? False
CUDA device count: 0
```
My Setup
- Device: NVIDIA Jetson Orin Nano Developer Kit
- CUDA Version: 12.2
- JetPack Version: 6.0
- PyTorch Version: Tried multiple versions, including `2.0.1+cu118` and `2.5.1` (all ended up as CPU-only)
- YOLOv5 Version: Latest from the Ultralytics repository
- Python: 3.10
My Questions
- Why is `torch.cuda.is_available()` returning `False`? Given that the CUDA installation seems valid (`deviceQuery` passed), why can’t PyTorch detect the GPU?
- Compatibility Issue? Is there a compatibility issue with the Jetson Orin Nano that I’m missing?
- Correct TensorRT Workflow: For converting YOLOv5 to TensorRT, is there a specific approach or toolkit version recommended for Jetson devices?
- Ensuring FP32 Precision: How can I ensure that the TensorRT `.engine` file maintains FP32 precision to avoid accuracy loss? (See the build sketch after this list.)
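On the FP32 point, my understanding is that FP32 is TensorRT’s default precision, so the engine stays FP32 as long as the FP16/INT8 builder flags are never enabled. A minimal build sketch with the TensorRT Python API (assuming TensorRT 8.x as shipped with JetPack 6.0, and a local `best.onnx` from the export step):

```python
# Build an FP32 TensorRT engine from the exported ONNX file.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("best.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB workspace
# FP32 is the default: simply do NOT call config.set_flag(trt.BuilderFlag.FP16).
serialized = builder.build_serialized_network(network, config)
with open("best.engine", "wb") as f:
    f.write(serialized)
```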
Additional Information
- I have also tried installing TensorRT-related packages using `nvidia-pyindex` and `nvidia-tensorrt`, but I faced package installation errors, likely due to compatibility issues or the package being unavailable.
- The system shows that TensorRT libraries (`libnvinfer`, `libnvinfer-dev`, etc.) are installed, which indicates that the TensorRT runtime is available, but I’m struggling to integrate it properly into my PyTorch workflow.
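If I understand the JetPack packaging correctly, the pip `nvidia-tensorrt` package targets x86_64, and on Jetson the Python bindings come from the apt package `python3-libnvinfer` instead, which would explain the pip errors. A one-line sanity check that the JetPack-provided bindings are importable:

```python
# Confirm the JetPack TensorRT Python bindings are visible to this interpreter.
import tensorrt as trt

print(trt.__version__)  # e.g. 8.6.x on JetPack 6.0
```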
Python Code I Want to Use
Below is the code that I would like to run after converting my YOLOv5 model to TensorRT. This code is for real-time video capture using my CSI camera, with inference running on the optimized model.
```python
import cv2
import numpy as np
import time
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

# Path to the TensorRT engine file
engine_path = '/home/onur/Desktop/projects/denemeV2/yolov5/best.engine'

# Create the TensorRT logger and runtime
logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(logger)

# Load the engine file and create an execution context
with open(engine_path, 'rb') as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Define the GStreamer pipeline for the CSI camera
def gstreamer_pipeline(
    sensor_id=0,
    capture_width=1280,
    capture_height=720,
    display_width=960,
    display_height=540,
    framerate=30,
    flip_method=6,
):
    return (
        "nvarguscamerasrc sensor-id=%d ! "
        "video/x-raw(memory:NVMM), width=(int)%d, height=(int)%d, framerate=(fraction)%d/1 ! "
        "nvvidconv flip-method=%d ! "
        "video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
        % (
            sensor_id,
            capture_width,
            capture_height,
            framerate,
            flip_method,
            display_width,
            display_height,
        )
    )

# Live video capture
cap = cv2.VideoCapture(gstreamer_pipeline(flip_method=6), cv2.CAP_GSTREAMER)

prev_frame_time = 0
new_frame_time = 0
prev_infer_time = time.time()
infer_interval = 0.5   # run the model every half second
latest_frame = None    # last frame that was run through the model
latest_results = None  # last inference result

if not cap.isOpened():
    print("Error: Unable to open camera")
    exit()

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        print("Error: Unable to read frame from camera.")
        break

    current_time = time.time()

    # Run model inference only at the given interval
    if current_time - prev_infer_time > infer_interval:
        # Update the frame to run the new inference on
        latest_frame = frame.copy()
        # The TensorRT inference code goes here (placeholder);
        # device memory for the input and output buffers must be allocated.
        # The actual inference step still needs to be added here.
        prev_infer_time = current_time

    # If a previous inference exists, show its result
    if latest_results is not None:
        annotated_frame = latest_frame  # draw the TensorRT results onto the frame here
    else:
        annotated_frame = frame

    # FPS calculation
    new_frame_time = time.time()
    fps = 1 / (new_frame_time - prev_frame_time)
    prev_frame_time = new_frame_time

    # FPS value with two decimal places
    fps_text = "FPS: {:.2f}".format(fps)

    # Draw the FPS value on the frame
    cv2.putText(annotated_frame, fps_text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)

    # Show the frame on screen
    cv2.imshow('WTF', annotated_frame)

    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```
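The inference placeholder in the loop is the part I still need. For discussion, here is a hedged sketch of what that step could look like with `pycuda` and `execute_v2`, assuming a single FP32 input binding of shape 1x3x640x640 and one output binding (binding indices, shapes, and the YOLOv5 output decoding all depend on how the engine was actually built):

```python
# Hypothetical inference step for the placeholder above (TensorRT 8.x + pycuda).
import cv2
import numpy as np
import pycuda.driver as cuda

def infer(context, frame_bgr):
    # Preprocess: resize, BGR->RGB, HWC->CHW, scale to [0, 1]
    img = cv2.resize(frame_bgr, (640, 640))
    img = img[:, :, ::-1].transpose(2, 0, 1).astype(np.float32) / 255.0
    inp = np.ascontiguousarray(img[None])  # (1, 3, 640, 640)

    # Allocate device buffers and run the engine
    out = np.empty(tuple(context.get_binding_shape(1)), dtype=np.float32)
    d_in = cuda.mem_alloc(inp.nbytes)
    d_out = cuda.mem_alloc(out.nbytes)
    cuda.memcpy_htod(d_in, inp)
    context.execute_v2([int(d_in), int(d_out)])
    cuda.memcpy_dtoh(out, d_out)
    return out  # raw predictions; confidence filtering and NMS still required
```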
Questions Regarding the Code:
- Does this code need any changes to be compatible with my device (Jetson Orin Nano)?
- Are there any additional libraries or modifications needed to make this TensorRT code functional?
- Is the TensorRT inference logic correctly integrated, or are there any specific adjustments recommended?
Any help or suggestions on resolving these issues or improving my workflow would be greatly appreciated. Thank you in advance! @AakankshaS, @EduardoSalazar96, @allan.navarro, @proventusnova