Tlt-infer is slow

Hi,
I am learning how to use the Transfer Learning Toolkit, so I built the Docker environment using nvcr.io/nvidia/tlt-streamanalytics.
I am running the example detectnet_v2.ipynb in “/workspace/examples/detectnet_v2”. When I ran step 8, “Visualize inferences”, I found that it is very slow and my GPU utilization is close to 0. (It takes about 1.5 hours to finish the inference task.)

!tlt-infer detectnet_v2 -e $SPECS_DIR/detectnet_v2_inference_kitti_tlt.txt \
                        -o $USER_EXPERIMENT_DIR/tlt_infer_testing \
                        -i $DATA_DOWNLOAD_DIR/testing/image_2 \
                        -k $KEY

But when I run step 7, “Evaluate the retrained model”, everything works well (GPU utilization is about 30%~40%).

!tlt-evaluate detectnet_v2 -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
                           -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
                           -k $KEY

Can you give me some suggestions on how to improve the speed of tlt-infer?

hardware:
RAM: 64GB
GPU: NVIDIA TITAN RTX
nvidia-driver: 450.51.05

How many images are in your $DATA_DOWNLOAD_DIR/testing/image_2?
Also, if there are many objects in your images, the inference time is expected to be longer.

There are 7518 images for testing. I don’t think the slow inference is caused by my test data, because my GPU utilization is close to 0; it should be higher.
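
For reference, utilization can be watched live while tlt-infer runs with plain nvidia-smi (a generic sketch, nothing TLT-specific):

# Refresh the full nvidia-smi view once per second
watch -n 1 nvidia-smi

# Or stream just the per-second utilization counters (sm, mem, enc, dec)
nvidia-smi dmon -s u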

Could you try running the default Jupyter notebook to narrow this down? It uses the KITTI dataset for training.

Well, I am running the default Jupyter notebook example now. The problem I met is described above. I didn’t change the dataset (the KITTI dataset) or the model’s spec file; I used the default settings for training, evaluating, and testing.

Can you share the full log when you run the following?

# Running inference for detection on n images
!tlt-infer detectnet_v2 -e $SPECS_DIR/detectnet_v2_inference_kitti_tlt.txt \
                        -o $USER_EXPERIMENT_DIR/tlt_infer_testing \
                        -i $DATA_DOWNLOAD_DIR/testing/image_2 \
                        -k $KEY

Using TensorFlow backend.
2020-09-15 05:27:32.371960: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-15 05:27:35,920 [INFO] iva.detectnet_v2.scripts.inference: Overlain images will be saved in the output path.
2020-09-15 05:27:35,920 [INFO] iva.detectnet_v2.inferencer.build_inferencer: Constructing inferencer
2020-09-15 05:27:35.926105: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-09-15 05:27:35.928934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:86:00.0
2020-09-15 05:27:35.928998: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-15 05:27:35.929096: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-09-15 05:27:35.931737: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-09-15 05:27:35.931871: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-09-15 05:27:35.935717: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-09-15 05:27:35.938657: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-09-15 05:27:35.938791: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-15 05:27:35.942760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-15 05:27:35.942808: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-15 05:27:36.814358: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-15 05:27:36.814410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-09-15 05:27:36.814419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-09-15 05:27:36.818009: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22242 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:86:00.0, compute capability: 7.5)
2020-09-15 05:27:36,822 [INFO] iva.detectnet_v2.inferencer.tlt_inferencer: Loading model from /workspace/tlt-experiments/detectnet_v2/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt:
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 3, 384, 1248)      0         
_________________________________________________________________
model_1 (Model)              [(None, 3, 24, 78), (None 6041719   
=================================================================
Total params: 6,041,719
Trainable params: 6,034,487
Non-trainable params: 7,232
_________________________________________________________________
2020-09-15 05:27:39,739 [INFO] iva.detectnet_v2.scripts.inference: Initialized model
2020-09-15 05:27:39,766 [INFO] iva.detectnet_v2.scripts.inference: Commencing inference
  0%|                                                   | 0/470 [00:00<?, ?it/s]2020-09-15 05:27:40.965787: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-15 05:27:42.547878: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
  1%|▎                                        | 4/470 [00:44<1:26:07, 11.09s/it]

The speed looks normal.

1%|▎ | 4/470 [00:44<1:26:07, 11.09s/it]

One batch has 16 images:
7518 / 470 ≈ 15.995 --> 16

16 images / 11.09 s ≈ 1.44 FPS

Is it really that slow? Why does it run so slowly? Is it drawing bboxes on the images or doing something else? Why is my GPU utilization so low?

The tlt-infer tool produces two outputs.

  1. Overlain images in $USER_EXPERIMENT_DIR/tlt_infer_testing/images_annotated
  2. Frame-by-frame bbox labels in KITTI format, located in $USER_EXPERIMENT_DIR/tlt_infer_testing/labels
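
To confirm both outputs are being written while the job runs, you can list them from inside the container (a sketch using the notebook’s environment variables; the label file name below is hypothetical):

# Count the annotated images and the KITTI-format label files produced so far
ls $USER_EXPERIMENT_DIR/tlt_infer_testing/images_annotated | wc -l
ls $USER_EXPERIMENT_DIR/tlt_infer_testing/labels | wc -l

# Peek at one label file; one line per detected object
head -n 3 $USER_EXPERIMENT_DIR/tlt_infer_testing/labels/000000.txt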

@Morganh and @xwater8

Curious what made you accept the last comment as the solution.
I am actually seeing the exact same speed when following the same tutorial.
The speed of the FP16 model is:

29%|███████████▊ | 135/470 [22:53<56:47, 10.17s/it]

And based on the output of

nvidia-smi

this is what I am seeing (please see the attached image):

Did you just accept the fact that the inference is slow?
I would appreciate your feedback.

Thanks!

See Probleme with training/pruning tlt

The tlt-infer tool will draw bboxes on the images and also write label files, so its wall-clock time does not represent the model’s inference time.

To measure the actual inference time, you can run trtexec.
Reference: Measurement model speed
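
For example, after exporting the model with tlt-export and building a TensorRT engine with tlt-converter, something like the following times the engine alone, with no image decoding or bbox drawing (a sketch; the engine path and batch size here are assumptions, and the exact flags depend on your TensorRT version):

# Benchmark only the TensorRT engine (hypothetical path), averaging over repeated runs
trtexec --loadEngine=$USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.engine \
        --batch=16 --iterations=100 --avgRuns=10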

@Morganh
Saw that exact link just now. Will check that when I run it on TX2.

Appreciate it