Hello,
I followed the instructions in section 3.5.5 of https://docs.nvidia.com/deeplearning/dgx/tf-trt-user-guide/index.html#using-metagraph-checkpoint to run image_classification.py and inspect the converted TensorRT graph in TensorBoard.
TensorBoard does not seem to read/display the fused graph as the instructions describe.
Here are the steps:
check out the tensorflow/tensorrt GitHub repository (TensorFlow/TensorRT integration)
convert the dataset into TFRecord format
python image_classification.py --model resnet_v1_50 --data_dir ~/dataset/faceTFR --use_trt --precision fp16 --mode validation
tensorboard --logdir=./data --port 6006
open TensorBoard in a browser and inspect the 'resnet_model' module, hoping to see the node count reduced from 459 to 4.
Here are the issues:
no 'model_dir' is created by 'image_classification.py', so there are no logs when running TensorBoard on 'model_dir'.
when running TensorBoard on 'data', TensorBoard shows only the original, unconverted model.
Is there anything overlooked in the above procedure?
thanks,
steven
Here are the test results:
python image_classification.py --model resnet_v1_50 --data_dir ~/dataset/faceTFR --use_trt --precision fp16
model: resnet_v1_50
model_dir: None
num_calib_inputs: 500
num_iterations: None
num_warmup_iterations: 50
precision: fp16
target_duration: None
use_synthetic: False
use_trt: True
use_trt_dynamic_op: False
url: http://download.tensorflow.org/models/official/20181001_resnet/checkpoints/resnet_imagenet_v1_fp32_20181001.tar.gz
num_nodes(native_tf): 741
num_nodes(tftrt_total): 474
num_nodes(trt_only): 0
graph_size(MB)(native_tf): 97.8
graph_size(MB)(trt): 195.6
time(s)(trt_conversion): 15.9
running inference…
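Note that num_nodes(trt_only) is 0 in the output above, i.e. the conversion apparently created no TRT engine nodes, which would explain why TensorBoard shows the original graph. A quick sanity check, independent of TensorBoard, is to count TRTEngineOp nodes in a GraphDef text dump; the helper and the sample fragment below are purely illustrative, not part of the script:

```python
import re

def count_trt_engine_ops(pbtxt_text):
    """Count TRTEngineOp node definitions in a GraphDef .pbtxt dump.

    A TF-TRT-converted graph should contain at least one TRTEngineOp;
    zero means the conversion produced no fused segments.
    """
    return len(re.findall(r'op:\s*"TRTEngineOp"', pbtxt_text))

# Tiny illustrative GraphDef fragment (not a real dump):
sample = '''
node { name: "input" op: "Placeholder" }
node { name: "trt_segment_0" op: "TRTEngineOp" }
'''
print(count_trt_engine_ops(sample))  # -> 1
```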
Here is the system setup:
TensorRT 5.0.2.6
Bazel 0.21.1
cuda 9.0
cudnn 7.3.5
facenet dataset
GeForce 1050
Tensorflow 1.13.1
Python 3.6
NVESJ
April 4, 2019, 6:26pm
Hello,
I tried to reproduce the bug you encountered in the TensorRT Docker container and found no issue with the script. The "model_dir" folder is generated after inference finishes, and the TensorBoard events file is in the "eval" sub-folder inside the "model_dir" folder.
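Before pointing TensorBoard at it, you can confirm that an events file was actually written under model_dir/eval; here is a minimal sketch, where the model_dir value is a throwaway temporary directory standing in for the real one:

```python
import glob, os, tempfile

def find_event_files(model_dir):
    """Return TensorBoard event files written under model_dir/eval."""
    return sorted(glob.glob(os.path.join(model_dir, "eval", "events.out.tfevents.*")))

# Demo with a temp directory standing in for the real model_dir:
model_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(model_dir, "eval"))
open(os.path.join(model_dir, "eval", "events.out.tfevents.0.demo"), "w").close()
print(find_event_files(model_dir))  # one entry -> run tensorboard --logdir on this folder
```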
If you are running the example in your own TensorFlow environment, some steps may be missing or some packages may be deprecated. We suggest using our Docker containers, since that is slightly easier and everything comes up to date.
Thank you.
NVIDIA ENTERPRISE SUPPORT
Hi,
thanks for the reply.
Which Docker container do you use, the TensorRT one or the TensorFlow one?