Training YOLOv4 with the TAO toolkit occupies a lot of CPU resources

Hi everyone. I’m training my own dataset with YOLOv4 using the TAO toolkit. The issue is that its computations mostly run on the CPU, so training is quite slow. Furthermore, I have trained other networks such as YOLOv3 and DetectNet_v2, and their training mostly runs on the GPU.
[Screenshots attached: Screenshot from 2021-09-29 08-57-47, Screenshot from 2021-09-29 08-58-23]
I have the following questions:

  • Is this normal, or did I make a mistake somewhere? I also trained YOLOv4 with the TLT toolkit and ran into the same problem.

  • When the training stage finishes, can the model still run on the GPU with DeepStream?

Could you guys help me with these questions? Thanks in advance!

Could you please share your training spec file?

Yeah. Here is my spec file: yolo_v4_train_resnet18_kitti.txt (2.3 KB)

Since you are using tfrecords format, please try to disable mosaic augmentation.
See YOLOv4 - NVIDIA Docs

YOLOv4 supports two data formats: the sequence format (KITTI images folder and raw labels folder) and the tfrecords format (KITTI images folder and TFRecords). From our experience, if mosaic augmentation is disabled (mosaic_prob=0), training with TFRecords format is faster. If mosaic augmentation is enabled (mosaic_prob>0), training with sequence format is faster.
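
In spec-file terms, that means setting mosaic_prob to 0 in the augmentation_config block. A minimal sketch, assuming your spec follows the standard TAO YOLOv4 layout (the rest of your augmentation_config stays as it is in your own yolo_v4_train_resnet18_kitti.txt):

augmentation_config {
  # ... keep your existing augmentation settings ...
  # 0.0 disables mosaic augmentation, which is recommended
  # when training from TFRecords
  mosaic_prob: 0.0
}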

Thank you. I’ll try it.

But I still wonder whether this is normal, because the other networks I have tried trained quite fast.

The augmentation is different.

Yeah, I see. But the other networks reach nearly 100% GPU utilization, while YOLOv4’s is quite a bit lower. Is that due to the augmentation or to the training operations?

Also, I suggest you change
force_on_cpu: true

to
force_on_cpu: false
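
If your spec follows the standard TAO YOLOv4 layout, force_on_cpu sits in the nms_config block; a sketch of the change, with the other fields left exactly as they already are in your file:

nms_config {
  # ... keep your existing NMS settings ...
  # false lets NMS run on the GPU instead of forcing it onto the CPU
  force_on_cpu: false
}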

Thank you, I’ll take note of that.