xwater8
September 14, 2020, 3:43am
1
Hi
I'm trying to learn the Transfer Learning Toolkit, so I built the Docker environment using nvcr.io/nvidia/tlt-streamanalytics.
I ran the example detectnet_v2.ipynb in “/workspace/examples/detectnet_v2”. When I ran step 8, Visualize inferences, I found that it is very slow and my GPU utilization is close to 0. (It takes about 1.5 hours to finish the inference task.)
!tlt-infer detectnet_v2 -e $SPECS_DIR/detectnet_v2_inference_kitti_tlt.txt \
                        -o $USER_EXPERIMENT_DIR/tlt_infer_testing \
                        -i $DATA_DOWNLOAD_DIR/testing/image_2 \
                        -k $KEY
But when I run step 7, Evaluate the retrained model, everything works well (GPU utilization is about 30%~40%).
!tlt-evaluate detectnet_v2 -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
                           -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
                           -k $KEY
Can you give me some suggestions on how to improve the speed of tlt-infer?
hardware:
RAM: 64GB
GPU: NVIDIA TITAN RTX
nvidia-driver: 450.51.05
Morganh
September 14, 2020, 8:16am
3
How many images are in your $DATA_DOWNLOAD_DIR/testing/image_2?
Also, if there are many objects in your images, the inference time is expected to be longer.
xwater8
September 14, 2020, 11:29am
4
There are 7518 images for testing. I don't think the inference speed is limited by my test data, because my GPU utilization is close to 0; it should be higher.
Morganh
September 14, 2020, 3:56pm
5
Could you try running the default Jupyter notebook to narrow this down? It uses the KITTI dataset for training.
xwater8
September 15, 2020, 1:32am
6
Well, I am running the default Jupyter notebook example now; the problem I met is described above. I didn't change the dataset (the KITTI dataset) or the model's spec file. I used the default settings for training, evaluation, and testing.
Morganh
September 15, 2020, 2:17am
7
Can you share the full log when you run:
# Running inference for detection on n images
!tlt-infer detectnet_v2 -e $SPECS_DIR/detectnet_v2_inference_kitti_tlt.txt \
                        -o $USER_EXPERIMENT_DIR/tlt_infer_testing \
                        -i $DATA_DOWNLOAD_DIR/testing/image_2 \
                        -k $KEY
xwater8
September 15, 2020, 5:30am
8
Using TensorFlow backend.
2020-09-15 05:27:32.371960: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-15 05:27:35,920 [INFO] iva.detectnet_v2.scripts.inference: Overlain images will be saved in the output path.
2020-09-15 05:27:35,920 [INFO] iva.detectnet_v2.inferencer.build_inferencer: Constructing inferencer
2020-09-15 05:27:35.926105: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-09-15 05:27:35.928934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:86:00.0
2020-09-15 05:27:35.928998: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-15 05:27:35.929096: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-09-15 05:27:35.931737: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-09-15 05:27:35.931871: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-09-15 05:27:35.935717: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-09-15 05:27:35.938657: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-09-15 05:27:35.938791: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-15 05:27:35.942760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-15 05:27:35.942808: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-15 05:27:36.814358: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-15 05:27:36.814410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-09-15 05:27:36.814419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-09-15 05:27:36.818009: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22242 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:86:00.0, compute capability: 7.5)
2020-09-15 05:27:36,822 [INFO] iva.detectnet_v2.inferencer.tlt_inferencer: Loading model from /workspace/tlt-experiments/detectnet_v2/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 3, 384, 1248) 0
_________________________________________________________________
model_1 (Model) [(None, 3, 24, 78), (None 6041719
=================================================================
Total params: 6,041,719
Trainable params: 6,034,487
Non-trainable params: 7,232
_________________________________________________________________
2020-09-15 05:27:39,739 [INFO] iva.detectnet_v2.scripts.inference: Initialized model
2020-09-15 05:27:39,766 [INFO] iva.detectnet_v2.scripts.inference: Commencing inference
0%| | 0/470 [00:00<?, ?it/s]2020-09-15 05:27:40.965787: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-15 05:27:42.547878: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
1%|▎ | 4/470 [00:44<1:26:07, 11.09s/it]
xwater8
September 15, 2020, 5:58am
10
One batch has 16 images:
7518 / 470 = 15.995 → 16 images per batch
16 images / 11.09 s ≈ 1.44 FPS
Isn't that really slow? Why does it run so slowly? Is it drawing bboxes on the images or doing something else? And why is my GPU utilization so low?
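The arithmetic above can be sanity-checked with a quick shell sketch (the 7518 images, 470 batches, and 11.09 s/it are taken from the tqdm line in the log above):

```shell
# Back-of-the-envelope throughput implied by the tlt-infer progress bar.
total_images=7518     # files in testing/image_2
num_batches=470       # iterations reported by tqdm
sec_per_batch=11.09   # s/it reported by tqdm

# images per batch: 7518 / 470 rounds to 16
batch_size=$(( (total_images + num_batches / 2) / num_batches ))
echo "batch size: $batch_size"

# effective throughput: 16 images / 11.09 s per batch ≈ 1.44 FPS
awk -v b="$batch_size" -v s="$sec_per_batch" \
    'BEGIN { printf "%.2f FPS\n", b / s }'
```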
a428tm
November 11, 2020, 5:31am
13
@Morganh and @xwater8
Curious what made you accept the last comment as the solution.
I am actually seeing the exact same speed when following the same tutorial.
The speed of the FP16 model is:
29%|███████████▊ | 135/470 [22:53<56:47, 10.17s/it]
And based on the output of nvidia-smi, this is what I am seeing (please see the attached image).
Did you just accept the fact that inference is slow?
Would appreciate your feedback.
Thanks!
Morganh
November 11, 2020, 6:03am
14
See Probleme with training/pruning tlt - #8 by R.c
tlt-infer writes bboxes onto the images and also writes label files,
so its runtime does not reflect the pure inference time.
To measure the actual inference time, you can run trtexec.
Reference: Measurement model speed
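For reference, a trtexec timing run might look like the sketch below. This is a hedged example, not the exact command from the linked post: the engine file name is a placeholder, the model must first be exported (tlt-export) and converted to a TensorRT engine, and flag names can vary between TensorRT versions.

```shell
# Time pure GPU inference on an exported engine (placeholder path).
# Unlike tlt-infer, this excludes image decoding, bbox overlay, and file writing.
trtexec --loadEngine=resnet18_detector.engine \
        --batch=16 \
        --iterations=100 \
        --avgRuns=10
```

The "GPU Compute" latency that trtexec reports is the number to compare against the ~11 s/batch seen from tlt-infer.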
a428tm
November 11, 2020, 6:08am
15
@Morganh
I saw that exact link just now. I will check it when I run on the TX2.
Appreciate it