Issue Converting YOLOv5 to TensorRT on Jetson Orin Nano

Hello everyone,

I’ve been working on converting a trained YOLOv5 model to TensorRT on my NVIDIA Jetson Orin Nano Developer Kit, and I’m facing a persistent issue with CUDA device availability. I’d like to share what I’ve done so far and the exact errors I’ve encountered, in the hope of finding some guidance.

Background

I’m using a trained YOLOv5 model (best.pt), which I want to convert to ONNX and then to TensorRT format for optimized inference on my Jetson Orin Nano. The aim is to eventually deploy this optimized model in a real-time drowsiness detection system using a CSI camera.

Steps Taken So Far

Setting Up CUDA and PyTorch:

  1. CUDA Installation:
    I initially installed CUDA 12.2 and confirmed the installation using nvcc --version. Everything seemed fine, and I also checked device compatibility using the deviceQuery sample, which gave a “Result = PASS”.
  2. OpenCV Compilation with CUDA:
    I compiled OpenCV with CUDA and GStreamer support to ensure GPU acceleration was available, using cmake and verifying that the configuration summary reported CUDA: YES and GStreamer: YES (an illustrative set of flags follows this list).
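
For reference, a typical configuration that yields CUDA: YES and GStreamer: YES uses the standard OpenCV CMake switches; the command below is illustrative rather than the exact one used, with CUDA_ARCH_BIN=8.7 matching the Orin Nano's GPU (compute capability 8.7):

cmake -D WITH_CUDA=ON -D WITH_GSTREAMER=ON -D CUDA_ARCH_BIN=8.7 ..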

PyTorch Installation Issues:

I tried installing PyTorch with GPU support using different versions (like torch==2.0.1+cu118) and different index URLs (e.g., https://download.pytorch.org/whl/cu118). However, I kept running into an issue where only the CPU version seemed to install; as far as I can tell, the +cu118 wheels on those indexes are built for x86_64, so on the Jetson's aarch64 platform pip falls back to a CPU-only build. Despite several attempts to install different versions of torch with CUDA support, torch.cuda.is_available() returned False every time.

Exporting YOLOv5 to ONNX:

I tried exporting my trained model (best.pt) to ONNX using the following command:

python export.py --weights /home/onur/Desktop/projects/denemeV2/yolov5/runs/train/exp4/weights/best.pt --img-size 640 --batch-size 1 --device 0 --include onnx

This resulted in an error:

AssertionError: Invalid CUDA '--device 0' requested, use '--device cpu' or pass valid CUDA device(s)

The error shows that PyTorch itself cannot see a CUDA device, even though deviceQuery confirms the GPU and CUDA toolkit are working, which points at the PyTorch build rather than the hardware or driver.
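
As the assertion message itself hints, the ONNX export can simply run on the CPU; the export step does not need the GPU, and the resulting ONNX file can still be compiled into a GPU engine afterwards:

python export.py --weights /home/onur/Desktop/projects/denemeV2/yolov5/runs/train/exp4/weights/best.pt --img-size 640 --batch-size 1 --device cpu --include onnx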

Other information:

import torch
print("CUDA available?", torch.cuda.is_available())
print("CUDA device count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"Device {i}: {torch.cuda.get_device_name(i)}")

Result:

CUDA available? False
CUDA device count: 0
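
A quick way to tell a CPU-only wheel apart from a driver problem is to inspect the build metadata, since torch.version.cuda is None on CPU-only builds:

import platform
import torch

print("torch:", torch.__version__)             # CPU-only wheels usually lack a '+cuXXX' suffix
print("built with CUDA:", torch.version.cuda)  # None on a CPU-only build
print("arch:", platform.machine())             # 'aarch64' on a Jetson

If torch.version.cuda is None, the wheel itself has no CUDA support, and no amount of library fixing on the system will make torch.cuda.is_available() return True.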

My Setup

  • Device: NVIDIA Jetson Orin Nano Developer Kit
  • CUDA Version: 12.2
  • JetPack Version: 6.0
  • PyTorch Version: Tried multiple versions, including 2.0.1+cu118 and 2.5.1 (all ended up as CPU-only)
  • YOLOv5 Version: Latest from the Ultralytics repository
  • Python: 3.10

My Questions

  1. Why is torch.cuda.is_available() returning False?
    Given that the CUDA installation seems valid (deviceQuery passed), why can’t PyTorch detect the GPU?
  2. Compatibility Issue?
    Is there a compatibility issue with the Jetson Orin Nano that I’m missing?
  3. Correct TensorRT Workflow:
    For converting YOLOv5 to TensorRT, is there a specific approach or toolkit version recommended for Jetson devices?
  4. Ensuring FP32 Precision:
    How can I ensure that the TensorRT .engine file maintains FP32 precision to avoid accuracy loss? (See the trtexec note after this list.)
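
On question 4: trtexec builds engines in FP32 by default, so full precision is kept simply by omitting the reduced-precision flags (--fp16, --int8, --best). Assuming the ONNX export succeeds, a typical engine build on JetPack (where trtexec normally lives under /usr/src/tensorrt/bin) would be:

/usr/src/tensorrt/bin/trtexec --onnx=best.onnx --saveEngine=best.engine

This also bears on question 3: building the engine with the trtexec that ships with JetPack avoids version mismatches between the machine that builds the engine and the one that runs it, since TensorRT engines are not portable across TensorRT versions or GPU architectures.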

Additional Information

  • I have also tried installing TensorRT-related packages using nvidia-pyindex and nvidia-tensorrt, but I faced package installation errors, likely due to compatibility issues or the package being unavailable.
  • The system shows that the TensorRT libraries (libnvinfer, libnvinfer-dev, etc.) are installed, which indicates that the TensorRT runtime is available, but I’m struggling to integrate it into my PyTorch workflow (a quick sanity check follows this list).
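
On JetPack, the tensorrt Python bindings normally come from the apt packages installed with the SDK (e.g., python3-libnvinfer) rather than from pip's nvidia-tensorrt, which generally targets x86_64; a quick sanity check is:

python3 -c "import tensorrt; print(tensorrt.__version__)"

If this prints a version, the TensorRT side is fine and the remaining problem is the PyTorch build.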

Python Code I Want to Use

Below is the code that I would like to run after converting my YOLOv5 model to TensorRT. This code is for real-time video capture using my CSI camera, with inference running on the optimized model.

import cv2
import numpy as np
import time
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

# Path to the TensorRT engine file
engine_path = '/home/onur/Desktop/projects/denemeV2/yolov5/best.engine'

# Create the TensorRT logger and runtime
logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(logger)

# Deserialize the engine and create an execution context
with open(engine_path, 'rb') as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Define the GStreamer pipeline for the CSI camera
def gstreamer_pipeline(
    sensor_id=0,
    capture_width=1280,
    capture_height=720,
    display_width=960,
    display_height=540,
    framerate=30,
    flip_method=6,
):
    return (
        "nvarguscamerasrc sensor-id=%d ! "
        "video/x-raw(memory:NVMM), width=(int)%d, height=(int)%d, framerate=(fraction)%d/1 ! "
        "nvvidconv flip-method=%d ! "
        "video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
        % (
            sensor_id,
            capture_width,
            capture_height,
            framerate,
            flip_method,
            display_width,
            display_height,
        )
    )

# Live video capture from the CSI camera
cap = cv2.VideoCapture(gstreamer_pipeline(flip_method=6), cv2.CAP_GSTREAMER)
prev_frame_time = 0
new_frame_time = 0
prev_infer_time = time.time()
infer_interval = 0.5  # run inference every half second
latest_frame = None  # most recent frame passed to inference
latest_results = None  # most recent inference result

if not cap.isOpened():
    print("Error: Unable to open camera")
    exit()

while cap.isOpened():
    ret, frame = cap.read()
    
    if not ret:
        print("Error: Unable to read frame from camera.")
        break

    current_time = time.time()

    # Run the model at the configured interval
    if current_time - prev_infer_time > infer_interval:
        # Keep a copy of the frame that the new inference will use
        latest_frame = frame.copy()

        # TensorRT inference goes here (placeholder): input/output device
        # buffers must be allocated and the engine executed on the
        # preprocessed frame. A sketch follows the code block.

        prev_infer_time = current_time

    # If a previous inference exists, show its result
    if latest_results is not None:
        annotated_frame = latest_frame  # draw the TensorRT detections on this frame
    else:
        annotated_frame = frame

    # Compute the FPS
    new_frame_time = time.time()
    fps = 1 / (new_frame_time - prev_frame_time)
    prev_frame_time = new_frame_time

    # Format the FPS value to two decimal places
    fps_text = "FPS: {:.2f}".format(fps)

    # Draw the FPS value on the frame
    cv2.putText(annotated_frame, fps_text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)

    # Show the frame on screen
    cv2.imshow('WTF', annotated_frame)

    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
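
For the placeholder inference step in the loop above, here is a minimal sketch using the classic TensorRT 8.x binding-index API (the version that ships with JetPack 6.0; TensorRT 10 replaced it with named I/O tensors). It is an outline under stated assumptions, not a drop-in implementation: it assumes a static 1x3x640x640 input and omits YOLOv5 output decoding (confidence filtering, NMS, and box rescaling). The helper names allocate_buffers and infer are hypothetical:

import cv2
import numpy as np
import pycuda.autoinit  # noqa: F401 - creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

def allocate_buffers(engine):
    # Allocate pinned host memory and device memory for every binding
    # (assumes static shapes, i.e. no -1 dimensions in the engine).
    inputs, outputs, bindings = [], [], []
    stream = cuda.Stream()
    for i in range(engine.num_bindings):
        size = trt.volume(engine.get_binding_shape(i))
        dtype = trt.nptype(engine.get_binding_dtype(i))
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append(int(device_mem))
        if engine.binding_is_input(i):
            inputs.append((host_mem, device_mem))
        else:
            outputs.append((host_mem, device_mem))
    return inputs, outputs, bindings, stream

def infer(context, bindings, inputs, outputs, stream, frame):
    # Preprocess: BGR -> RGB, resize to the 640x640 export size,
    # HWC -> NCHW, scale pixel values to [0, 1].
    img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (640, 640)).astype(np.float32) / 255.0
    img = np.ascontiguousarray(img.transpose(2, 0, 1)[None])
    np.copyto(inputs[0][0], img.ravel())

    # Copy the input to the GPU, run the engine, copy outputs back.
    cuda.memcpy_htod_async(inputs[0][1], inputs[0][0], stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for host_mem, device_mem in outputs:
        cuda.memcpy_dtoh_async(host_mem, device_mem, stream)
    stream.synchronize()

    # Raw YOLOv5 predictions; decoding (NMS etc.) is still required.
    return [host_mem.copy() for host_mem, _ in outputs]

With these helpers, allocate_buffers(engine) would be called once right after the engine is deserialized, and the placeholder in the loop would become latest_results = infer(context, bindings, inputs, outputs, stream, latest_frame).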

Questions Regarding the Code:

  1. Does this code need any changes to be compatible with my device (Jetson Orin Nano)?
  2. Are there any additional libraries or modifications needed to make this TensorRT code functional?
  3. Is the TensorRT inference logic correctly integrated, or are there any specific adjustments recommended?

Any help or suggestions on resolving these issues or improving my workflow would be greatly appreciated. Thank you in advance! @AakankshaS, @EduardoSalazar96, @allan.navarro, @proventusnova

Hello Onur,

I’m experiencing the same issue with the Jetson Orin Nano and JetPack 6.1. It appears that there isn’t a current version of PyTorch compatible with CUDA on JetPack 6.1. After multiple attempts, I’ve concluded that, for now, it’s not feasible.

Alternatives I recommend:

  1. Switch to JetPack 5.x or 4.6: These versions offer better support for PyTorch with CUDA.
  2. Use ONNX and TensorRT: Export your model to ONNX and convert it to TensorRT to leverage GPU capabilities.

I believe these options are more effective than continuing without results. I hope this helps, and best of luck!


Question for the Forum: Is there a planned update to JetPack that will resolve this compatibility issue with PyTorch and CUDA? If so, is it included in the roadmap?

Hi,

For JetPack 6.0, could you try our PyTorch wheel from the link below?
It has CUDA support enabled.

Thanks.

Hello @AastaLLL, @dusty_nv,

Following a recommendation, I attempted to install the specific PyTorch wheel provided for JetPack 6.0 with CUDA support on my NVIDIA Jetson Orin Nano Developer Kit. Unfortunately, I’ve run into multiple issues, and I’d like to share what happened after trying to follow those instructions.

Steps Taken After Receiving the Response

  1. Received Suggested PyTorch Wheels:
  • I received links to pre-built PyTorch, torchvision, and torchaudio wheels intended for JetPack 6.0 with CUDA support. These were supposed to resolve my CUDA detection issue in PyTorch.
  • I downloaded the following wheels:
    • torch-2.3.0-cp310-cp310-linux_aarch64.whl
    • torchvision-0.18.0a0+6043bc2-cp310-cp310-linux_aarch64.whl
  2. Installation Attempts:
  • I first installed torch-2.3.0 using:
pip3 install numpy torch-2.3.0-cp310-cp310-linux_aarch64.whl
  • This installation removed the previously installed torch-2.5.1 and successfully installed torch-2.3.0.
  • During the installation, I encountered a warning about package conflicts: torchvision required torch==2.5.1, while I had torch==2.3.0 installed.
  • I then proceeded to install torchvision using:
pip3 install --user ./torchvision-0.18.0a0+6043bc2-cp310-cp310-linux_aarch64.whl
  • This command initially failed with a warning indicating the file could not be found. After verifying the filename and location, I used the correct path to install the wheel.
  3. Testing CUDA Availability:
  • After installing both wheels, I tested whether CUDA was available in PyTorch:
python3 -c "import torch; print(torch.__version__); print('CUDA available:', torch.cuda.is_available())"
  • Unfortunately, this resulted in an error about a missing shared object file:
OSError: libmpi_cxx.so.20: cannot open shared object file: No such file or directory
  4. Troubleshooting Missing Libraries:
  • I checked for the presence of libmpi_cxx.so using:
find /usr -name "libmpi_cxx.so*"
  • I found libmpi_cxx.so.40, but not libmpi_cxx.so.20, which is what PyTorch was looking for. To work around this, I created a symbolic link:
sudo ln -s /usr/lib/aarch64-linux-gnu/libmpi_cxx.so.40 /usr/lib/aarch64-linux-gnu/libmpi_cxx.so.20
  • After creating the symbolic link, I ran the CUDA check again, but this time I encountered a different error:
OSError: libcufft.so.10: cannot open shared object file: No such file or directory
  • I repeated the process for libcufft, where I only had libcufft.so.11 available, and created a symbolic link for libcufft.so.10 as well:
sudo ln -s /usr/local/cuda-12.2/targets/aarch64-linux/lib/libcufft.so.11 /usr/local/cuda-12.2/targets/aarch64-linux/lib/libcufft.so.10
  • The next missing library was libcublas.so.10, and again I only had libcublas.so.12, so I created another symbolic link:
sudo ln -s /usr/local/cuda-12.2/targets/aarch64-linux/lib/libcublas.so.12 /usr/local/cuda-12.2/targets/aarch64-linux/lib/libcublas.so.10
  5. Current Issue:
  • Despite creating symbolic links for all of these libraries, I am still facing errors when checking for CUDA availability in PyTorch; the errors just keep shifting to different missing dependencies (a one-pass diagnostic is sketched after this list).
  • The current error is:
OSError: libcublas.so.10: cannot open shared object file: No such file or directory
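
Rather than chasing each missing .so with another symlink, it may be faster to list every unresolved dependency of the CUDA extension in one pass; assuming the wheel landed under ~/.local as in the tracebacks above, something like:

ldd ~/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so | grep "not found"

Note that the requested sonames (libcublas.so.10, libcufft.so.10) look like CUDA 10-era libraries, which suggests this wheel was built against a much older CUDA than the installed 12.2; if so, symlinking newer libraries will keep failing (the ABIs differ), and a wheel built for this exact JetPack/CUDA combination is the real fix.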

Questions and Request for Help

  1. Library Version Compatibility:
  • I have CUDA 12.2 installed, and I noticed that many of the missing libraries are older versions (e.g., libcufft.so.10 instead of libcufft.so.11). Should I downgrade CUDA, or is there another way to make PyTorch compatible with the newer versions?
  2. Symbolic Links as a Solution:
  • Is creating symbolic links for missing shared object files a viable solution, or am I setting myself up for further issues? Is there a better way to ensure PyTorch finds the correct versions of these libraries?
  3. Best Approach for Installing PyTorch with CUDA on JetPack 6.0:
  • Given that I am using JetPack 6.0 and CUDA 12.2, is there a particular version of PyTorch that is known to work out of the box? I downloaded the recommended wheels, but they still seem to have shared-library compatibility issues.

Any help or suggestions on how to correctly set up PyTorch with CUDA support on the Jetson Orin Nano would be greatly appreciated. If anyone has experience with the exact combination of JetPack 6.0, CUDA 12.2, and PyTorch, your insights would be invaluable.

Hi @AastaLLL , @dusty_nv,
After following the advice given to me and attempting to install the correct PyTorch wheel for JetPack 6.0, I still encountered issues related to CUDA detection and PyTorch extensions. Here is an update on what I did and the current error I am facing:

Steps Taken After the Suggestion:

  1. Downloaded Suggested PyTorch Wheel: I downloaded the wheel for torch-2.3.0-cp310-cp310-linux_aarch64.whl as suggested in the previous forum response.
  2. Installed PyTorch and Dependencies: I used the following commands to install PyTorch:
pip install --force-reinstall ~/Downloads/torch-2.3.0-cp310-cp310-linux_aarch64.whl

The installation seemed to be successful, although I noticed that there were some dependency conflicts regarding ultralytics and torchvision.
  3. Checked CUDA Availability: After installation, I ran:

python3 -c "import torch; print(torch.__version__); print('CUDA available:', torch.cuda.is_available())"

The output showed:

2.5.1
CUDA available: False

The full output is now:

onur@ubuntu:~$ python3 -c "import torch; print(torch.__version__); print('CUDA available:', torch.cuda.is_available())"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/onur/.local/lib/python3.10/site-packages/torch/__init__.py", line 213, in <module>
    raise ImportError(textwrap.dedent('''
ImportError: Failed to load PyTorch C extensions:
    It appears that PyTorch has loaded the `torch/_C` folder
    of the PyTorch repository rather than the C extensions which
    are expected in the `torch._C` namespace. This can occur when
    using the `install` workflow. e.g.
        $ python setup.py install && python -c "import torch"

    This error can generally be solved using the `develop` workflow
        $ python setup.py develop && python -c "import torch"  # This should succeed
    or by running Python from a different directory.

I tried again after rebooting the system and checking the environment, but torch.cuda.is_available() kept returning False.
  4. Current Error - PyTorch C Extensions: When running the same command, I received the following error:

ImportError: Failed to load PyTorch C extensions:
    It appears that PyTorch has loaded the `torch/_C` folder of the PyTorch repository
    rather than the C extensions which are expected in the `torch._C` namespace.

This error suggests that PyTorch is loading from the wrong namespace or perhaps there was an issue with the installation path.
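
A quick way to see which copy of torch Python is resolving, without triggering the failing import, is importlib:

python3 -c "import importlib.util; print(importlib.util.find_spec('torch').origin)"

If this prints a path inside a source checkout or the current working directory rather than a site-packages install, running Python from a different directory or removing the stray copy should clear the torch/_C error.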

My Questions:

  1. Why Does CUDA Still Appear Unavailable? Even after installing the recommended wheel, I cannot get CUDA to work with PyTorch (torch.cuda.is_available() is always False). Could this be related to the JetPack version or some other dependency?
  2. How Can I Resolve the C Extensions Issue? The torch/_C error keeps recurring despite reinstalling. Should I use a specific installation method like python setup.py develop, or is there another workaround for this on Jetson Orin Nano?
  3. Next Steps? Should I completely uninstall PyTorch and other related dependencies and start over? Or should I use a virtual environment to isolate the installation and see if that helps?

I’d appreciate any advice or guidance on how to resolve these two issues. Thank you all for your help so far!

Hi,

For the OSError: libmpi_cxx.so.20: cannot open shared object file: No such file or directory error, please fix it with the command below:

sudo apt-get install python3-pip libopenblas-base libopenmpi-dev libomp-dev

In the latest comment, it looks like there are multiple PyTorch versions in your environment.
The version you installed is 2.3.0, but the version reported from the Python console is 2.5.1.

Could you give it a check?
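
A quick check for duplicate copies (one in the system site-packages and one under ~/.local, the path seen in the earlier traceback) could be:

pip3 list 2>/dev/null | grep -i torch
ls ~/.local/lib/python3.10/site-packages/ | grep -i torch

If torch shows up in both locations, running pip3 uninstall torch repeatedly until neither copy remains, and then reinstalling the JetPack wheel, should remove the version confusion.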

Or, if reflashing is an option for you, you can reset the device and install our PyTorch/TorchVision builds to avoid the dependency issue.

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.