Enable GPU and CUDA in YOLOv4 code

Hello,

I’m working on object detection using YOLOv4. I have complete code for real-time object detection on a live video stream using YOLOv4 and OpenCV 4.5.1.
The problem is that on the Jetson Nano, my Python code takes approximately 5 seconds to process each frame. I want it to run in real time, as it does in Jetson-Inference.
I enabled CUDA and GPU support when installing OpenCV 4.5.1.

Can anyone help me with what I should add to the code so that it uses the GPU and CUDA to process each frame in milliseconds?

Thank you.

Hi,

Does the YOLOv4 sample you used have CUDA support?
Please run tegrastats while the inference is running:

$ sudo tegrastats

The GPU utilization percentage will give you an idea of whether this sample can be optimized further.
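If it helps, the GPU load can be pulled out of a tegrastats line programmatically — it is the `GR3D_FREQ` field, reported as `utilization%@frequency`. A minimal sketch (the sample line is abridged from the output above):

```python
import re

# Abridged tegrastats line; GR3D_FREQ reports GPU utilization and clock.
line = ("CPU [94%@1479,100%@1479,94%@1479,92%@1479] "
        "EMC_FREQ 8%@1600 GR3D_FREQ 0%@76 VIC_FREQ 0%@192")

# Capture "<util>%@<freq>" after the GR3D_FREQ label.
match = re.search(r"GR3D_FREQ (\d+)%@(\d+)", line)
if match:
    gpu_util, gpu_freq = int(match.group(1)), int(match.group(2))
    print(f"GPU utilization: {gpu_util}% at {gpu_freq} MHz")
```

A GPU sitting at 0% while all four CPU cores are near 100% is a strong sign the inference is running on the CPU.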
Thanks.

Hi,

Please find below the tegrastats output captured while YOLO was running.

YOLOv4

RAM 2749/3956MB (lfb 10x1MB) SWAP 51/6074MB (cached 2MB) IRAM 0/252kB(lfb 252kB) CPU [94%@1479,100%@1479,94%@1479,92%@1479] EMC_FREQ 8%@1600 GR3D_FREQ 0%@76 VIC_FREQ 0%@192 APE 25 PLL@48C CPU@51.5C PMIC@50C GPU@46C AO@55.5C thermal@49C POM_5V_IN 6775/6304 POM_5V_GPU 85/139 POM_5V_CPU 3939/3471

RAM 2749/3956MB (lfb 10x1MB) SWAP 51/6074MB (cached 2MB) IRAM 0/252kB(lfb 252kB) CPU [40%@921,63%@921,38%@921,41%@921] EMC_FREQ 8%@1600 GR3D_FREQ 1%@153 VIC_FREQ 0%@192 APE 25 PLL@45.5C CPU@46.5C PMIC@50C GPU@45C AO@54C thermal@45.75C POM_5V_IN 3433/5985 POM_5V_GPU 173/142 POM_5V_CPU 651/3157

RAM 2749/3956MB (lfb 10x1MB) SWAP 51/6074MB (cached 2MB) IRAM 0/252kB(lfb 252kB) CPU [54%@1479,53%@1479,56%@1479,56%@1479] EMC_FREQ 8%@1600 GR3D_FREQ 0%@153 VIC_FREQ 0%@192 APE 25 PLL@47.5C CPU@51C PMIC@50C GPU@46C AO@55C thermal@48C POM_5V_IN 6646/6051 POM_5V_GPU 85/137 POM_5V_CPU 3853/3227

RAM 2749/3956MB (lfb 10x1MB) SWAP 51/6074MB (cached 2MB) IRAM 0/252kB(lfb 252kB) CPU [96%@1479,95%@1479,96%@1479,99%@1479] EMC_FREQ 7%@1600 GR3D_FREQ 0%@76 VIC_FREQ 0%@192 APE 25 PLL@48C CPU@51C PMIC@50C GPU@46.5C AO@55.5C thermal@49C POM_5V_IN 6732/6113 POM_5V_GPU 85/132 POM_5V_CPU 4067/3303

And for reference, I have also added the tegrastats output for Jetson-Inference.

Jetson Inference

RAM 3118/3956MB (lfb 2x1MB) SWAP 113/6074MB (cached 2MB) IRAM 0/252kB(lfb 252kB) CPU [25%@921,26%@921,21%@921,17%@921] EMC_FREQ 19%@1600 GR3D_FREQ 99%@921 NVENC 716 VIC_FREQ 0%@192 APE 25 PLL@48C CPU@49.5C PMIC@50C GPU@48.5C AO@59C thermal@49C POM_5V_IN 6517/5857 POM_5V_GPU 2358/1667 POM_5V_CPU 771/786

RAM 3118/3956MB (lfb 2x1MB) SWAP 113/6074MB (cached 2MB) IRAM 0/252kB(lfb 252kB) CPU [20%@921,25%@921,20%@921,19%@921] EMC_FREQ 19%@1600 GR3D_FREQ 98%@921 NVENC 716 VIC_FREQ 0%@192 APE 25 PLL@47.5C CPU@49.5C PMIC@50C GPU@49C AO@59C thermal@48.75C POM_5V_IN 6484/5887 POM_5V_GPU 2401/1702 POM_5V_CPU 686/781

RAM 3117/3956MB (lfb 2x1MB) SWAP 113/6074MB (cached 2MB) IRAM 0/252kB(lfb 252kB) CPU [18%@921,23%@921,19%@921,16%@921] EMC_FREQ 19%@1600 GR3D_FREQ 93%@921 NVENC 716 VIC_FREQ 0%@192 APE 25 PLL@48.5C CPU@49.5C PMIC@50C GPU@49C AO@58.5C thermal@48.75C POM_5V_IN 6398/5910 POM_5V_GPU 2315/1730 POM_5V_CPU 687/777

RAM 3117/3956MB (lfb 1x2MB) SWAP 113/6074MB (cached 2MB) IRAM 0/252kB(lfb 252kB) CPU [18%@921,28%@921,17%@921,18%@921] EMC_FREQ 19%@1600 GR3D_FREQ 92%@921 NVENC 716 VIC_FREQ 0%@192 APE 25 PLL@48C CPU@49C PMIC@50C GPU@49C AO@58C thermal@48.75C POM_5V_IN 6183/5922 POM_5V_GPU 2190/1750 POM_5V_CPU 601/769

Thank you.

Hi,

Your YOLOv4 code looks like a CPU-based implementation.
Please check with the author whether they support GPU-based inference.

... CPU [96%@1479,95%@1479,96%@1479,99%@1479] ... 0%@76

Also, it looks like you haven’t maximized the device performance yet.
You can do that with the following commands:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Thanks.
