YOLOv8 model training on Jetson Orin Nano

I want to train a YOLOv8n object detection model using a custom dataset with around 30,000 images. I ran the following script to begin training:

from ultralytics import YOLO

model = YOLO('yolov8n.pt')

model.train(
    data='path/to/data.yaml',  # Path to the data config file
    epochs=100,                # Number of epochs
    imgsz=640,                 # Image size
    batch=2,                   # Batch size
    save=True,                 # Save training checkpoints - useful for resuming training
    workers=4,                 # Number of workers for data loading
    device=0,                  # Use GPU 0 for training; use device='cpu' to force CPU usage
    project='runs/train',      # Save results to 'runs/train'
    name='exp',                # Name of the experiment
    exist_ok=True              # Overwrite existing results
)

However, it is currently estimating around 50-55 minutes per epoch. This is too slow for me; how can I make it train faster? I believe training should be much faster, since the Jetson Orin Nano is capable of 40 TOPS.

Hi,

Have you maximized the device’s performance?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Also, could you check the GPU utilization with tegrastats and share it with us?

$ sudo tegrastats

Thanks.

@AastaLLL This is my tegrastats:

07-30-2024 00:11:34 RAM 1685/7620MB (lfb 4x4MB) SWAP 0/3810MB (cached 0MB) CPU [8%@1510,1%@1510,0%@1510,0%@1510,0%@1510,0%@1510] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[624] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@46.593C soc2@45.562C soc0@45.375C gpu@45.781C tj@46.593C soc1@45.343C VDD_IN 4468mW/4468mW VDD_CPU_GPU_CV 964mW/964mW VDD_SOC 1408mW/1408mW

Hi,

GR3D_FREQ 0%@[624]

The GPU usage is 0%, so please double-check that your training is actually running on the GPU first.
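
For example, a quick sanity check from Python (a short sketch; it assumes a CUDA-enabled PyTorch build, e.g. from the JetPack wheels, is installed):

import torch

print(torch.cuda.is_available())      # should print True if PyTorch can see the Orin GPU
print(torch.cuda.get_device_name(0))  # should report the integrated GPU

If is_available() prints False, PyTorch cannot see the GPU and training cannot run on it, which would explain the 0% GPU usage.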

Thanks.

@AastaLLL Hey, I didn't realize you wanted the tegrastats output while training was running. Here it is while training is ongoing.

07-31-2024 12:25:38 RAM 6288/7620MB (lfb 15x1MB) SWAP 139/3810MB (cached 8MB) CPU [28%@806,44%@806,35%@806,6%@806,39%@729,33%@729] EMC_FREQ 24%@2133 GR3D_FREQ 98%@[608] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@51.625C soc2@50.406C soc0@50.062C gpu@51.843C tj@51.843C soc1@50.062C VDD_IN 8888mW/8888mW VDD_CPU_GPU_CV 3129mW/3129mW VDD_SOC 2182mW/2182mW

Hi,

In the new log, GPU utilization is essentially full (98%),
so training is already hitting the limit of the Orin Nano.

Thanks.

@anay.gokhale123 Maybe you can try https://colab.research.google.com/ to train the network, and use the Jetson for edge inference.
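
For reference, that split could look roughly like this (a minimal sketch; the dataset path, weights path, and test image are placeholders):

from ultralytics import YOLO

# On Colab (or any machine with a larger GPU): train as before
model = YOLO('yolov8n.pt')
model.train(data='path/to/data.yaml', epochs=100, imgsz=640)

# Copy the resulting best.pt to the Jetson, then run inference there
model = YOLO('path/to/best.pt')
results = model.predict('path/to/test_image.jpg')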
