Converting a TF model on Jetson TX2 is slow

Description

I am trying to convert a TensorFlow classification model to TensorRT.
At first the conversion process was killed (memory was full), so I increased the swap file size.
After that it started, but it has now been running for more than 2 hours and still has not finished.
Memory is completely full and swap usage is at 4.5 GB.

My example is based on:

from tensorflow.python.compiler.tensorrt import trt_convert as trt

print('Converting to TF-TRT FP16...')
conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP16,
    max_workspace_size_bytes=4000000000)  # ~4 GB workspace for TRT tactics
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='resnet50_saved_model',
    conversion_params=conversion_params)
converter.convert()
converter.save(output_saved_model_dir='resnet50_saved_model_TFTRT_FP16')
print('Done Converting to TF-TRT FP16')

Environment

GPU: Jetson TX2
Operating System + Version: JetPack 4.3

Can you try verbose logging in TRT and share the verbose log?
Also, please try to run the TF model directly to check the performance/memory consumption before conversion.
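For example, a minimal way to load the SavedModel and run a dummy inference (the input shape here assumes the default Keras ResNet-50 export; adjust it for your model):

import numpy as np
import tensorflow as tf

# Load the original (unconverted) SavedModel and run one dummy inference
# to get a baseline for memory use and latency.
model = tf.saved_model.load('resnet50_saved_model')
infer = model.signatures['serving_default']

# Look up the signature's input name instead of hard-coding it.
input_name = list(infer.structured_input_signature[1].keys())[0]
dummy = tf.constant(np.random.rand(1, 224, 224, 3).astype(np.float32))
print(infer(**{input_name: dummy}).keys())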

Thanks

Here is my console output: 1.txt (16.1 KB)
The conversion finished after more than 3 hours.

How do I run the model directly?

P.S. On my laptop the example works fine.

It may be due to the available GPU memory.
Can you check system memory via “$ sudo tegrastats” to see if you reach the memory bound?
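If your tegrastats build supports the logging options, you can also record the readings to a file while the conversion runs, for example:

sudo tegrastats --interval 1000 --logfile tegrastats.log &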

Thanks

I tried converting a lightweight model (MobileNet) and it works well.

I checked the memory after the first optimizer operation:

RAM 7570/7861MB (lfb 8x4MB) SWAP 911/3930MB (cached 53MB) CPU [2%@345,1%@2029,93%@2028,0%@345,0%@345,1%@345] EMC_FREQ 1%@1866 GR3D_FREQ 0%@114 APE 150 MTS fg 0% bg 3% PLL@52C MCPU@52C PMIC@100C Tboard@47C GPU@49.5C BCPU@52C thermal@51.2C Tdiode@49.5C VDD_SYS_GPU 94/140 VDD_SYS_SOC 851/869 VDD_4V0_WIFI 0/10 VDD_IN 4641/4715 VDD_SYS_CPU 1324/1493 VDD_SYS_DDR 1303/1220

I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841] Optimization results for grappler item: graph_to_optimize
2020-06-08 11:10:11.369224: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843] function_optimizer: Graph size after: 3089 nodes (2543), 6797 edges (6249), time = 413.735ms.
2020-06-08 11:10:11.369284: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843] function_optimizer: function_optimizer did nothing. time = 7.225ms.

I tried resizing the swap to 9 GB, but it allocates all of the memory and swap.

Swap space cannot be used by TensorRT.
It seems the model is using almost all of the GPU memory, hence the slow performance.
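Since the TX2's 8 GB of RAM is shared between the CPU and GPU, it may also help to let TensorFlow allocate GPU memory on demand and to use a smaller TRT workspace than 4 GB before converting. A minimal sketch (the 1 GB value is illustrative, not a recommendation):

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Let TF allocate GPU memory on demand instead of grabbing it all up front.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

# Use a much smaller TRT builder workspace than 4 GB on a shared-memory board.
conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP16,
    max_workspace_size_bytes=1 << 30)  # 1 GB, illustrative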

Thanks

So, I have a model that uses nearly 4 GB of GPU memory, and it works fine on the Jetson TX2.
What happens when I try to convert it? Does the conversion need more memory? Does the model become larger? Why is there not enough memory?

Which way is better? Converting on another device? ONNX?

Can you try the TF → ONNX → TRT workflow?
You may have to create a custom layer for any unsupported layers.
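For the TF → ONNX step, the tf2onnx converter is one common option, for example (the output file name and opset value here are placeholders):

python3 -m tf2onnx.convert --saved-model resnet50_saved_model --output resnet50.onnx --opset 11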

After generating the ONNX model, you can even use the trtexec command-line tool to quickly test it and generate the TRT engine:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
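For example (file names are placeholders; --workspace is given in MB, so a value like 2048 keeps the builder well within the TX2's memory):

./trtexec --onnx=resnet50.onnx --fp16 --workspace=2048 --saveEngine=resnet50_fp16.trt --verbose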

Thanks

So, I tried ONNX for the YOLO model conversion (the default example in TensorRT). That part works OK. But when I try to convert it from ONNX to TensorRT with trtexec, it gives me a memory error:

[06/18/2020-12:01:57] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
Killed

cmd:

./trtexec --onnx=/home/kuskovtx2/yolo_onnx/fp16/yolov3.onnx

I finally converted the model with the sample from /usr/src/tensorrt/samples/python/yolov3_onnx. It works OK.

My general question now is: why does one example work for YOLO while the other does not? Maybe a difference in memory usage?

Can you share the verbose log along with the model so we can help better?

In this case, did you use just the “onnx_to_tensorrt.py” script to convert your ONNX model, or are you referring to a successful run of the complete sample code?
Can you compare the two ONNX models to check whether they are the same?

Thanks

  1. To convert to ONNX I use /usr/src/tensorrt/samples/python/yolov3_onnx.py
  2. To convert to TensorRT I use:
  • /usr/src/tensorrt/samples/python/onnx_to_tensorrt.py (works well; see the sketch below)
  • ./trtexec --onnx=/home/kuskovtx2/yolo_onnx/fp16/yolov3.onnx (fails with “Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.” and is then Killed). Verbose log: verbose_log.txt (344.8 KB)
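For reference, building the engine through the TensorRT Python API with an explicitly capped builder workspace looks roughly like this sketch on TRT 6/7 (the workspace value and FP16 flag are illustrative, not the sample's exact settings):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

def build_engine(onnx_path, workspace_mb=1024):
    # ONNX models require an explicit-batch network in TRT 6/7.
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(explicit_batch) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = workspace_mb << 20  # cap the builder workspace
        builder.fp16_mode = True                         # use FP16 kernels where available
        with open(onnx_path, 'rb') as f:
            if not parser.parse(f.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
        return builder.build_cuda_engine(network)

engine = build_engine('yolov3.onnx')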

Could you please share the ONNX model so we can reproduce the issue?
Meanwhile, could you please try using the latest TRT 7 release (JetPack 4.4)?

Thanks

So, I used ./trtexec and it works. I just ran my Jetson without a display. Sometimes the process uses swap memory, but not more than 2 GB.
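For anyone else hitting this: on a systemd-based L4T image, one way to boot without the desktop (so it does not take up memory) is to switch the default target, e.g.:

sudo systemctl set-default multi-user.target   # switch back with graphical.target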

Onnx model here:
https://drive.google.com/file/d/10iNpiNmQrcC0NlbV1NGWq5jsgVxB2CBq/view?usp=sharing

Does this mean the issue was resolved by running the Jetson without a display?

Thanks


Yes, the issue was resolved!