Description
Hello NVIDIA Community,
I am currently working on a project using the YOLOv5 model for object detection. After training my model using the Ultralytics YOLO library, I successfully converted my model (best.pt) to the ONNX format (best.onnx) using the following script:
from ultralytics import YOLO

# Load the trained YOLO model
model = YOLO("best.pt")

# Export the model to ONNX format (creates 'best.onnx')
model.export(format="onnx")

# Load the exported ONNX model
onnx_model = YOLO("best.onnx")
After the conversion, I used ONNX Runtime for inference. Here is my C++ code snippet, taken from the Ultralytics examples:
void Detector(YOLO_V8*& p) {
    std::filesystem::path imgs_path = "/home/ubuntu/margsofr/yolov5-opencv-cpp-python/images";
    // Run detection on every file in the images directory
    for (auto& i : std::filesystem::directory_iterator(imgs_path)) {
        std::string img_path = i.path().string();
        cv::Mat img = cv::imread(img_path);
        std::vector<DL_RESULT> res;
        p->RunSession(img, res);
        // Collect boxes and confidences for non-maximum suppression
        std::vector<cv::Rect> boxes;
        std::vector<float> confidences;
        for (const auto& re : res) {
            boxes.push_back(re.box);
            confidences.push_back(re.confidence);
        }
        // Score threshold 0.5, NMS (IoU) threshold 0.4
        std::vector<int> indices;
        cv::dnn::NMSBoxes(boxes, confidences, 0.5, 0.4, indices);
        // Draw the boxes that survive NMS
        for (int idx : indices) {
            const auto& re = res[idx];
            cv::rectangle(img, re.box, cv::Scalar(0, 255, 0), 3);
        }
        // Note: every image is written to the same output.jpg
        cv::imwrite("output.jpg", img);
    }
}
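The cv::dnn::NMSBoxes call above takes a score threshold (0.5) and an overlap threshold (0.4). As a rough illustration of what those two numbers do, here is a minimal pure-Python sketch of greedy non-maximum suppression. It is not OpenCV's implementation; the helper names iou and nms and the sample boxes are made up for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold, nms_threshold):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Drop low-score candidates, then visit the rest in descending score order.
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical sample data: two near-duplicates, one separate box, one weak box.
boxes = [(10, 10, 100, 100), (12, 12, 100, 100), (300, 300, 50, 50), (0, 0, 20, 20)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = nms(boxes, scores, score_threshold=0.5, nms_threshold=0.4)
print(kept)
```

With scores in the range [0, 1], the 0.5 threshold discards the weak candidate and the 0.4 overlap threshold removes the near-duplicate. If the scores are not normalized (for example, values in the hundreds), every candidate passes the score threshold and only the overlap test removes anything.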
Environment
ONNX Version: 1.20.0
CMake Version: 3.26
OpenCV Version: 4.10.0
Operating System: Ubuntu 22.04
Output:
These are the input and output dimensions, along with the first three detections:
./Yolov8OnnxRuntimeCPPInference
[YOLO_V8(CUDA)]: Cuda warm-up cost 2322.47 ms.
Input node dimensions: 1 3 640 640
Inference run completed
Output node dimensions: 1 5 8400
Shape: 5 8400
model type: 1
Using FP32 for rawData matrix
rawData shape after transpose: 8400 x 5
NMS results count: 1924
[YOLO_V8(CUDA)]: 18.114ms pre-process, 2355.5ms inference, 94.241ms post-process.
Box: x=1056, y=239, width=1277, height=1347, confidence=658.98
Box: x=920, y=1551, width=1297, height=811, confidence=654.36
Box: x=-107, y=1561, width=1053, height=763, confidence=653.25
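For reference, here is a hedged pure-Python sketch of how a single-class output of shape 1 x 5 x 8400 could be decoded, with a sanity check on the score range. The layout is an assumption, not something confirmed in this post: channels 0 to 3 are taken to hold cx, cy, w, h in input-image pixels and channel 4 a class score in [0, 1]. The function name decode, the scale parameter, and the sample values are hypothetical.

```python
def decode(raw, num_anchors, score_threshold=0.5, scale=1.0):
    """Decode a flat channel-first [5, num_anchors] buffer into boxes.

    Assumed layout: channels 0..3 hold cx, cy, w, h in input-image pixels,
    channel 4 holds a class score in [0, 1].
    Returns a list of (x, y, width, height, score) with a top-left origin.
    """
    detections = []
    for i in range(num_anchors):
        # Channel-first indexing: value c of anchor i sits at c * num_anchors + i.
        cx, cy, w, h, score = (raw[c * num_anchors + i] for c in range(5))
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"anchor {i}: score {score} is outside [0, 1]")
        if score < score_threshold:
            continue
        # Convert center format to top-left format and undo the input resize.
        x = (cx - w / 2) * scale
        y = (cy - h / 2) * scale
        detections.append((x, y, w * scale, h * scale, score))
    return detections


# Two hypothetical anchors, stored channel by channel: cx, cy, w, h, score.
raw = [320, 100,   # cx
       320, 50,    # cy
       100, 20,    # w
       200, 10,    # h
       0.9, 0.2]   # score
result = decode(raw, num_anchors=2, scale=2.0)
print(result)
```

Under this assumed layout, a confidence far above 1 would mean that something other than a score is being read from that position, for example a coordinate picked up because the buffer was indexed in the wrong order. That is only one possible explanation, and the range check makes such a case fail loudly instead of passing through NMS.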
Three detected bboxes: