Problem with training/pruning TLT

Hello everyone,
I am planning to use YOLOv3 on a Jetson NX for object detection (one class for now).
After the training step, I'm getting odd results from pruning; here are my logs:

2020-09-14 07:59:02.401363: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
Using TensorFlow backend.
2020-09-14 07:59:08.733838: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-09-14 07:59:08.808197: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-14 07:59:08.809463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:01:00.0
2020-09-14 07:59:08.809513: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-14 07:59:08.862914: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-09-14 07:59:08.892049: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-09-14 07:59:08.903280: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-09-14 07:59:08.969313: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-09-14 07:59:09.014495: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-09-14 07:59:09.126092: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-14 07:59:09.126417: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-14 07:59:09.128208: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-14 07:59:09.129815: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-14 07:59:09.130756: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-14 07:59:11.048474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-14 07:59:11.048544: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-09-14 07:59:11.048571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-09-14 07:59:11.048980: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-14 07:59:11.051577: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-14 07:59:11.053667: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-14 07:59:11.055275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22504 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-09-14 07:59:12,577 [INFO] modulus.pruning.pruning: Exploring graph for retainable indices
2020-09-14 07:59:15,553 [INFO] modulus.pruning.pruning: Pruning model and appending pruned nodes to new graph
2020-09-14 08:00:31,284 [INFO] iva.common.magnet_prune: Pruning ratio (pruned model / original model): 0.02986864959372703

Is a ratio of 0.0298 normal?

After retraining my pruned model, I get about 2.5 it/s (batch size 1) on a Titan RTX with FP32, which is really slow for this machine (typical TensorFlow inference was around 10 fps), and it's the same if I choose FP16.
What am I doing wrong?

“0.0298” is the prune ratio. It depends on how much you want to prune. By trying different prune ratios, end users can run experiments to find the best trade-off between mAP and fps.

When you mention 2.5 it/s, is that the result of running tlt-infer?

Yep, but I thought it was really low (it means a lot of the model was pruned, right?).
I kept the initial pruning spec and didn't change anything in the notebook (-pth 0.1).

My training and testing are still made on TITAN RTX.

Yes.

Yes, the model size will be about 2.98% of the unpruned model.
Are you training on the public KITTI dataset or your own dataset? Actually, you do not need to care much about the default pth. By trying different pth values or doing more retraining, end users can run experiments to find the best trade-off between mAP and fps.
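To make that sweep over pth values concrete, here is a minimal sketch that just builds the tlt-prune command line for each threshold. All file paths and the key are hypothetical placeholders, and the flags assume the tlt-prune interface from the TLT 2.0 notebooks (-m, -o, -k, -pth):

```python
# Sketch: build tlt-prune invocations for a sweep of pruning thresholds.
# Paths and the key below are placeholders; adjust them to your setup.
PTH_VALUES = [0.05, 0.1, 0.3, 0.5]

def prune_command(pth,
                  model="yolov3_unpruned.tlt",   # hypothetical path
                  output_dir="pruned",           # hypothetical path
                  key="YOUR_NGC_KEY"):           # placeholder key
    """Return the tlt-prune command line for one threshold value."""
    out = f"{output_dir}/yolov3_pth{pth}.tlt"
    return f"tlt-prune -m {model} -o {out} -k {key} -pth {pth}"

# One command per threshold; run each, then retrain and compare mAP/fps.
for cmd in (prune_command(p) for p in PTH_VALUES):
    print(cmd)
```

Each pruned model still needs its own retraining pass before the mAP/fps comparison is meaningful.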

For the 2.5 it/s, I am checking it. Recently, two customers asked the same question about the low speed of tlt-infer.

With that size I would have thought it would be faster :D

I am using my own dataset (50k annotated images total, about 5 hours per epoch on the Titan RTX).

Thanks. Note that when I go from FP32 to FP16 the inference speed doesn't change; it acts as if it were capped.

It seems you misunderstand what tlt-infer's timing represents. tlt-infer also draws bounding boxes onto images and writes label files,
so its throughput does not reflect the pure inference time.

To check the inference time, you can run trtexec.
Reference: Measurement model speed

Measured this way, FP32 and FP16 should show different inference speeds.
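To compare trtexec numbers with the it/s figures discussed above, it helps to convert the reported per-batch latency into a rate. A small helper, where the sample "mean: … ms" summary lines are made up for illustration and the exact log format is an assumption about typical trtexec output:

```python
import re

def latency_ms_to_fps(latency_ms):
    """Convert a per-batch latency in milliseconds to iterations per second."""
    return 1000.0 / latency_ms

def parse_mean_latency(trtexec_line):
    """Extract the mean latency (in ms) from a trtexec summary line.

    The 'mean: <x> ms' pattern is an assumption based on typical
    trtexec output; adapt the regex to your TensorRT version's log.
    """
    match = re.search(r"mean:?\s*([0-9.]+)\s*ms", trtexec_line)
    if match is None:
        raise ValueError("no mean latency found in line")
    return float(match.group(1))

# Hypothetical summary lines for the same engine built in FP32 vs FP16:
fp32_line = "GPU Compute mean: 25.0 ms"
fp16_line = "GPU Compute mean: 12.5 ms"
print(latency_ms_to_fps(parse_mean_latency(fp32_line)))  # 40.0
print(latency_ms_to_fps(parse_mean_latency(fp16_line)))  # 80.0
```

If FP16 shows no speedup at all in trtexec either, that would point to a build or hardware issue rather than to tlt-infer overhead.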

OK, thanks. So what I understand is that tlt-infer throughput does not depend on model quantization or pruning.

I have other questions:
Non-max suppression does not seem to be implemented.
YOLO input size: there is no option like 416x416x3 images or anything similar; where do we change that?
The YOLO spec file specifies output_width & output_height; what are they used for?

Should I open new topics?

NMS is implemented in yolo_v3.
Where did you see "no option like 416x416x3 images"?
For output_width & output_height, see https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#augmentation_module
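For reference, output_width and output_height live in the augmentation_config block of the YOLOv3 training spec; a minimal fragment sketched below, where the field names follow the TLT YOLOv3 spec format and the values are placeholders, not a recommendation:

```
augmentation_config {
  output_width: 416
  output_height: 416
  output_channel: 3
}
```

These set the resolution that input images are resized/augmented to during training.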

OK, thanks, I did not see that the minimum was 480. With the original YOLOv3 and a ResNet backbone, there are only three sizes available: 416x416, 320x320, and 608x608. I did not understand that you allow new values with that adaptation.

For yolo_v3, please see the requirements in the TLT user guide.

YOLOv3

Input size: C * W * H (where C = 1 or 3, W >= 128, H >= 128, W, H are multiples of 32)
Image format: JPG, JPEG, PNG
Label format: KITTI detection
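The input-size constraints above are simple enough to check programmatically before launching a long training run; a small sketch of such a validator:

```python
def is_valid_yolov3_input(c, w, h):
    """Check the YOLOv3 input-size rules quoted from the TLT user guide:
    C is 1 or 3, W and H are at least 128 and multiples of 32."""
    return (c in (1, 3)
            and w >= 128 and h >= 128
            and w % 32 == 0 and h % 32 == 0)

print(is_valid_yolov3_input(3, 416, 416))  # True
print(is_valid_yolov3_input(3, 480, 480))  # True
print(is_valid_yolov3_input(3, 100, 100))  # False (below the 128 minimum)
```

So 416x416, 480x480, and 608x608 are all valid, and any other multiple-of-32 size from 128 upward works as well.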