Thank you for your reply. I followed the advice to use the TensorRT 3.0.4 build for Ubuntu 14.04 rather than the 16.04 one, and running test_tftrt.py (tensorflow/test_tftrt.py at v1.8.0 · tensorflow/tensorflow · GitHub) now gives a "PASS" result. However, run_all.sh from the tftrt sample code (https://developer.download.nvidia.com/devblogs/tftrt_sample.tar.xz) still fails with the errors below.
System: Ubuntu 18.04
TensorRT version: 3.0.4 (Ubuntu 14.04)
CUDA version: 9.0 (.deb)
cuDNN version: 7.1.3 (.deb)
TensorFlow version: 1.8.0
GPU: GTX 1080
Namespace(FP16=True, FP32=True, INT8=True, batch_size=4, dump_diff=False, native=True, num_loops=10, topN=5, update_graphdef=False, with_timeline=False, workspace_size=3072)
Starting at 2018-05-10 15:42:54.711629
2018-05-10 15:42:54.825034: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-10 15:42:54.825343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.44GiB
2018-05-10 15:42:54.825357: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-10 15:42:55.034163: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-10 15:42:55.034191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-05-10 15:42:55.034199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-05-10 15:42:55.034319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4059 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Starting execution
2018-05-10 15:42:55.627521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-10 15:42:55.627555: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-10 15:42:55.627560: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-05-10 15:42:55.627563: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-05-10 15:42:55.627645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4059 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Starting Warmup cycle
INFO:tensorflow:Warmup done. Starting real timing
iter 0 0.012434930801391601
iter 1 0.012465815544128418
iter 2 0.012448086738586425
iter 3 0.012464394569396972
iter 4 0.012447872161865235
iter 5 0.012483620643615722
iter 6 0.012439064979553223
iter 7 0.0124442720413208
iter 8 0.012428841590881347
iter 9 0.012441315650939942
Comparison= True
INFO:tensorflow:Timing loop done!
images/s : 321.3 +/- 0.4, s/batch: 0.01245 +/- 0.00002
RES, Native, 4, 321.29, 0.41, 0.01245, 0.00002
2018-05-10 15:43:03.037356: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2018-05-10 15:43:03.462730: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2660] Max batch size= 4 max workspace size= 3221225472
2018-05-10 15:43:03.462758: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2666] starting build engine
2018-05-10 15:43:15.239915: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger resources.cpp (199) - Cuda Error in gieCudaMalloc: 2
2018-05-10 15:43:15.247011: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger resources.cpp (199) - Cuda Error in gieCudaMalloc: 2
2018-05-10 15:43:15.247026: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2671] Built network
2018-05-10 15:43:15.247145: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:0 due to: "Internal: Engine building failure" SKIPPING......( 452 nodes)
INFO:tensorflow:Starting execution
2018-05-10 15:43:16.385523: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-10 15:43:16.385560: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-10 15:43:16.385565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-05-10 15:43:16.385568: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-05-10 15:43:16.385649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4059 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Starting Warmup cycle
INFO:tensorflow:Warmup done. Starting real timing
iter 0 0.012447414398193359
iter 1 0.012494730949401855
iter 2 0.01248936653137207
iter 3 0.012472720146179199
iter 4 0.012484326362609863
iter 5 0.012482056617736817
iter 6 0.01248654842376709
iter 7 0.012489418983459472
iter 8 0.012483220100402832
iter 9 0.012479052543640137
Comparison= True
INFO:tensorflow:Timing loop done!
images/s : 320.5 +/- 0.3, s/batch: 0.01248 +/- 0.00001
RES, TRT-FP32, 4, 320.49, 0.32, 0.01248, 0.00001
2018-05-10 15:43:23.911577: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2018-05-10 15:43:24.275253: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2660] Max batch size= 4 max workspace size= 3221225472
2018-05-10 15:43:24.275284: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2664] Using FP16 precision mode
2018-05-10 15:43:24.275288: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2666] starting build engine
2018-05-10 15:43:24.275693: W tensorflow/contrib/tensorrt/log/trt_logger.cc:34] DefaultLogger Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
2018-05-10 15:43:24.351625: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger reformat.cu (591) - Cuda Error in NCHWToNCHW: 2
2018-05-10 15:43:24.368325: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger reformat.cu (591) - Cuda Error in NCHWToNCHW: 2
2018-05-10 15:43:24.368343: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2671] Built network
2018-05-10 15:43:24.368473: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:0 due to: "Internal: Engine building failure" SKIPPING......( 452 nodes)
INFO:tensorflow:Starting execution
2018-05-10 15:43:25.431249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-10 15:43:25.431289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-10 15:43:25.431296: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-05-10 15:43:25.431302: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-05-10 15:43:25.431388: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4059 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Starting Warmup cycle
INFO:tensorflow:Warmup done. Starting real timing
iter 0 0.012494874000549317
iter 1 0.012518620491027832
iter 2 0.012508559226989745
iter 3 0.012507514953613281
iter 4 0.012525267601013183
iter 5 0.012517638206481933
iter 6 0.012493929862976073
iter 7 0.012491002082824706
iter 8 0.012517666816711426
iter 9 0.012507796287536621
Comparison= True
INFO:tensorflow:Timing loop done!
images/s : 319.8 +/- 0.3, s/batch: 0.01251 +/- 0.00001
RES, TRT-FP16, 4, 319.79, 0.29, 0.01251, 0.00001
2018-05-10 15:43:32.833938: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2018-05-10 15:43:33.172603: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2419] Max batch size= 4 max workspace size= 3221225472
2018-05-10 15:43:33.172755: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2446] finished op preparation
2018-05-10 15:43:33.172858: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2454] OK
2018-05-10 15:43:33.172864: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2455] finished op building
Running Calibration
INFO:tensorflow:Starting execution
2018-05-10 15:43:33.988248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-10 15:43:33.988285: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-10 15:43:33.988290: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-05-10 15:43:33.988293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-05-10 15:43:33.988375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4059 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Starting Warmup cycle
Cuda error in file src/winograd.cu at line 715: out of memory
python: customWinogradConvActLayer.cpp:280: virtual void nvinfer1::cudnn::WinogradConvActLayer::allocateResources(const nvinfer1::cudnn::CommonContext&): Assertion `convolutions.back().get()' failed.
./run_all.sh: line 13: 14247 Aborted (core dumped) python tftrt_sample.py --native --FP32 --FP16 --INT8 --num_loops 10 --topN 5 --batch_size 4 --workspace_size 3072 --log_file log.txt --network resnet_v1_50_frozen.pb --input_node input --output_nodes resnet_v1_50/predictions/Reshape_1 --img_size 224 --img_file grace_hopper.jpg
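For context on what I suspect: "Cuda Error ... : 2" is cudaErrorMemoryAllocation, and the log shows TensorFlow has already claimed most of the card (4059 MB of the 8 GiB GTX 1080) before the TensorRT engine build asks for a 3221225472-byte (3 GiB) workspace. A minimal sketch of what I may try next, capping TensorFlow's allocation so the engine builder has headroom (the 0.5 fraction is just an illustrative value, not from the sample):

```python
import tensorflow as tf

# Leave roughly half of the 8 GiB card unallocated so the TensorRT
# engine builder can get its workspace (fraction chosen for illustration).
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # ... run the frozen resnet_v1_50 graph here ...
    pass
```

Alternatively, passing a smaller --workspace_size to tftrt_sample.py (e.g. 1024 instead of 3072) should reduce the memory the builder requests. Does either approach sound right, or is the Ubuntu 14.04 TensorRT build on 18.04 itself the problem?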