I'm using TLT object detection for a custom dataset and was wondering if there is any advice on how to increase GPU utilization. I noticed that the Volatile GPU-Util jumps around and that the GPU memory is not fully utilized but remains static.
I looked at Training process is slow, GPU is not fully utilized - #5 by quansm and increased the batch size, which raised the Volatile GPU-Util on average, but it still jumps around, which made me think about prefetching workers. The linked thread nicely shows how to configure more workers, but I don't see an equivalent option in the Faster RCNN config for object detection. A sketch of what I have in mind is below.
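For reference, this is roughly the behaviour I'm after: a plain tf.data pipeline that reads and decodes on several CPU threads and prefetches the next batch while the GPU trains on the current one. This is only a standalone sketch with placeholder names (`tfrecord_pattern`, `parse_fn`); I don't know whether the TLT Faster RCNN spec exposes equivalent knobs.

```python
import tensorflow as tf

# Hypothetical standalone input pipeline, not TLT's actual loader.
def build_dataset(tfrecord_pattern, parse_fn, batch_size=16, num_workers=4):
    files = tf.data.Dataset.list_files(tfrecord_pattern)
    ds = tf.data.TFRecordDataset(files, num_parallel_reads=num_workers)
    # Decode/augment on several CPU threads so the GPU is not starved.
    ds = ds.map(parse_fn, num_parallel_calls=num_workers)
    ds = ds.shuffle(1000).batch(batch_size)
    # Keep the next batch ready while the GPU works on the current one.
    return ds.prefetch(tf.data.experimental.AUTOTUNE)
```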
Sorry, I am not familiar with TLT object detection. In my case, the pipeline seems to use TensorFlow's Python function op to preprocess 3D medical image data, and it costs a lot of time, so I think that is the bottleneck. Before confirming the problem, I also used the profiling tool to measure the timeline, and I am trying to switch to TFRecord and remove the Python op, but that is not finished yet.
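As a rough illustration of what I mean (a sketch with made-up shapes and a toy transform, not my actual code): timing a py_function-based map against the same transform written with native TF ops usually shows how much time the Python op costs.

```python
import time
import numpy as np
import tensorflow as tf

# Toy stand-in for Python-side preprocessing of a 3D volume; tf.py_function
# runs it under the Python interpreter, which limits parallelism.
def slow_python_preprocess(x):
    return np.clip(x.numpy(), 0.0, 1.0).astype(np.float32)

def py_map(x):
    y = tf.py_function(slow_python_preprocess, [x], tf.float32)
    y.set_shape(x.shape)
    return y

# Equivalent transform written with native TF ops, which tf.data can parallelize.
def native_map(x):
    return tf.clip_by_value(tf.cast(x, tf.float32), 0.0, 1.0)

def time_pipeline(map_fn, steps=100):
    data = tf.data.Dataset.from_tensor_slices(tf.random.uniform([steps, 64, 64, 64]))
    ds = data.map(map_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE).prefetch(1)
    start = time.perf_counter()
    for _ in ds:
        pass
    return time.perf_counter() - start

print("py_function map:", time_pipeline(py_map))
print("native-op map  :", time_pipeline(native_map))
```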
If the problem only happens with your custom data, you may want to check it or think about how it differs from the public data.
Using the top command to check CPU and memory usage, or using a timing function to log function execution times, may also be helpful.
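For the timing part, something as simple as a decorator is enough (a minimal sketch; `load_and_preprocess` is just a placeholder for whatever step you suspect is slow), while `top` runs in another terminal to watch CPU and memory.

```python
import functools
import time

# Log how long a function takes; wrap suspected slow data-loading steps with it.
def timed(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{fn.__name__} took {time.perf_counter() - start:.3f}s")
        return result
    return wrapper

@timed
def load_and_preprocess(path):
    ...  # hypothetical data-loading / preprocessing step
```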