I'm using TLT object detection for a custom dataset and was wondering if there is any advice on how to increase GPU utilization. I noticed that the Volatile GPU-Util jumps around and that the GPU memory is not fully utilized but remains static.
I looked at Training process is slow, GPU is not fully utilized - #5 by quansm and increased the batch size, which raised the Volatile GPU-Util on average, but it still jumps around, which made me think about prefetching workers. The linked thread nicely shows how to configure more workers, but I don't see an equivalent option in the Faster RCNN config for object detection. A sketch of what I have in mind is below.
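For reference, this is roughly the behaviour I'm after: a plain tf.data pipeline that reads and decodes on several CPU threads and prefetches the next batch while the GPU trains on the current one. This is only a standalone sketch with placeholder names (`tfrecord_pattern`, `parse_fn`); I don't know whether the TLT Faster RCNN spec exposes equivalent knobs.

```python
import tensorflow as tf

# Hypothetical standalone input pipeline, not TLT's actual loader.
def build_dataset(tfrecord_pattern, parse_fn, batch_size=16, num_workers=4):
    files = tf.data.Dataset.list_files(tfrecord_pattern)
    ds = tf.data.TFRecordDataset(files, num_parallel_reads=num_workers)
    # Decode/augment on several CPU threads so the GPU is not starved.
    ds = ds.map(parse_fn, num_parallel_calls=num_workers)
    ds = ds.shuffle(1000).batch(batch_size)
    # Keep the next batch ready while the GPU works on the current one.
    return ds.prefetch(tf.data.experimental.AUTOTUNE)
```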
Sorry, I am not familiar with TLT object detection. In my case, the pipeline seems to use TensorFlow's Python function op to preprocess 3D medical image data, and it costs a lot of time, so I think that is the bottleneck. Before confirming the problem, I also used the profiling tool to measure the timeline, and I am trying to switch to TFRecord and remove the Python op, but that is not finished yet.
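As a rough illustration of what I mean (a sketch with made-up shapes and a toy transform, not my actual code): timing a py_function-based map against the same transform written with native TF ops usually shows how much time the Python op costs.

```python
import time
import numpy as np
import tensorflow as tf

# Toy stand-in for Python-side preprocessing of a 3D volume; tf.py_function
# runs it under the Python interpreter, which limits parallelism.
def slow_python_preprocess(x):
    return np.clip(x.numpy(), 0.0, 1.0).astype(np.float32)

def py_map(x):
    y = tf.py_function(slow_python_preprocess, [x], tf.float32)
    y.set_shape(x.shape)
    return y

# Equivalent transform written with native TF ops, which tf.data can parallelize.
def native_map(x):
    return tf.clip_by_value(tf.cast(x, tf.float32), 0.0, 1.0)

def time_pipeline(map_fn, steps=100):
    data = tf.data.Dataset.from_tensor_slices(tf.random.uniform([steps, 64, 64, 64]))
    ds = data.map(map_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE).prefetch(1)
    start = time.perf_counter()
    for _ in ds:
        pass
    return time.perf_counter() - start

print("py_function map:", time_pipeline(py_map))
print("native-op map  :", time_pipeline(native_map))
```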
If the problem only happens with your custom data, you may want to check it or think about how it differs from the public data.
Using the top command to check CPU and memory usage, or using a timing function to log function execution times, may also be helpful.
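For the timing part, something as simple as a decorator is enough (a minimal sketch; `load_and_preprocess` is just a placeholder for whatever step you suspect is slow), while `top` runs in another terminal to watch CPU and memory.

```python
import functools
import time

# Log how long a function takes; wrap suspected slow data-loading steps with it.
def timed(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{fn.__name__} took {time.perf_counter() - start:.3f}s")
        return result
    return wrapper

@timed
def load_and_preprocess(path):
    ...  # hypothetical data-loading / preprocessing step
```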