GPU Performance Worse than CPU Performance on YOLO Inference

I am currently using the YoloDotNet NuGet package to benchmark YOLO models for my degree thesis. However, I have encountered an issue where GPU performance is significantly worse than CPU performance.

Environment:

YoloDotNet version: v2.0
CPU: AMD Ryzen 7 7800X3D
GPU: NVIDIA GeForce RTX 4070 Super
CUDA/cuDNN version: CUDA 11.8 and cuDNN 8.9.7
.NET version: 8

Steps to Reproduce:

var sw = new Stopwatch();
var times = new List<double>();

for (var i = 0; i < 500; i++)
{
    var file = $@"C:\Users\Utente\Documents\assets\images\input\frame_{i}.jpg";

    using var image = SKImage.FromEncodedData(file);

    // Time only the detection call, excluding image decoding and saving.
    sw.Restart();
    var results = yolo.RunObjectDetection(image, confidence: 0.25, iou: 0.7);
    sw.Stop();

    // Draw returns a new annotated image (SKImage is immutable), so save that.
    using var resultsImage = image.Draw(results);
    resultsImage.Save(file.Replace("input", $"output_{yolo_version}{version}_{target}").Replace(".jpg", $"_detect_{yolo_version}{version}_{target}.jpg"),
        SKEncodedImageFormat.Jpeg);

    times.Add(sw.Elapsed.TotalMilliseconds);
    Console.WriteLine($"Time taken for image {i}: {sw.Elapsed.TotalMilliseconds:F2} ms");
}

This is how I measure the detection time for each image.
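I am aware that the very first GPU inference usually includes CUDA/cuDNN initialization and memory allocation, which can inflate the measured times, so I could add an untimed warm-up pass before the timed loop. A minimal sketch (the warm-up simply reuses frame_0 from the input folder):

// Warm-up: run a few untimed inferences so one-time CUDA/cuDNN
// initialization does not count toward the measured times.
using (var warmup = SKImage.FromEncodedData(@"C:\Users\Utente\Documents\assets\images\input\frame_0.jpg"))
{
    for (var w = 0; w < 5; w++)
        yolo.RunObjectDetection(warmup, confidence: 0.25, iou: 0.7);
}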

The expected behavior is that inference on the GPU is faster than inference on the CPU, but the GPU shows no improvement.

To load the model, I use the following setup in the GPU case:

yolo = new Yolo(new YoloOptions
{
    OnnxModel = @$"C:\Users\Utente\Documents\assets\model\yolov{yolo_version}{version}_{target}.onnx",
    ModelType = ModelType.ObjectDetection, // Model type
    Cuda = true,                           // Use CPU or CUDA for GPU accelerated inference. Default = true
    GpuId = 0,                             // Select GPU by id. Default = 0
    PrimeGpu = true,                       // Pre-allocate GPU before first inference. Default = false
});
Console.WriteLine(yolo.OnnxModel.ModelType);
Console.WriteLine($"Using GPU for version {yolo_version}{version}");

Performance metrics using YOLOv8:

CPU inference time (size m):
Total: 25693 ms
Average per image: 51.25 ms

GPU inference time (size m):
Total: 34459.73 ms
Average per image: 69.74 ms


The issue presents itself for all model sizes I tested; I have shown only size m for brevity.
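Since a single slow outlier (such as the first inference) can dominate the average, I could also report the median and 95th percentile from the times list collected above; a short sketch:

using System.Linq;

// Sort a copy of the measurements; a large gap between average and median
// would point to outliers such as first-inference initialization cost.
var sorted = times.OrderBy(t => t).ToList();
Console.WriteLine($"Average: {times.Average():F2} ms");
Console.WriteLine($"Median:  {sorted[sorted.Count / 2]:F2} ms");
Console.WriteLine($"P95:     {sorted[(int)(sorted.Count * 0.95)]:F2} ms");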

I would appreciate any assistance or guidance in resolving this issue. Please let me know if you need any further information.

Thank you.