Got core dumped error when running tftrt sample code

I installed TensorRT 4.0.0.3 at first and got a core dumped error when running run_all.sh in the tftrt sample. Then I switched to TensorRT 3.0.4-1+cuda9.0 and got the same result. Here is the error output:

/ops/_beam_search_ops.so
./run_all.sh: line 13: 3347 Aborted (core dumped) python tftrt_sample.py --native --FP32 --FP16 --INT8 --num_loops 10 --topN 5 --batch_size 4 --workspace_size 2048 --log_file log.txt --network resnet_v1_50_frozen.pb --input_node input --output_nodes resnet_v1_50/predictions/Reshape_1 --img_size 224 --img_file grace_hopper.jpg

System: Ubuntu 16.04
TensorRT version: 3.0.4-1+cuda9.0
CUDA version: 9.0
cuDNN version: 7.0.5
TensorFlow version: 1.7
GPU: GTX 1080ti

Hi, I am able to run the code, but after the “INFO:tensorflow:Timing loop done!” and “starting build engine” messages I get a core dump error. I did not change anything in the given sample files tftrt_sample.py and run_all.sh. Can anyone help me? I am using TensorRT 4.0.0.3 with tensorflow-gpu 1.7.0 and CUDA V9.0.176. Thanks in advance.

Got the same problem here; I also tried both TensorRT 3 and 4. Anyone got a clue?

System: Ubuntu 16.04
TensorRT version: 3.0.4-1+cuda9.0
CUDA version: 9.0
cuDNN version: 7.0.5
TensorFlow version: 1.8
GPU: Tesla K80

iter  8   0.0492392396927
iter  9   0.0493129205704
Comparison= True
INFO:tensorflow:Timing loop done!
images/s : 81.3 +/- 0.1, s/batch: 0.04922 +/- 0.00006
RES, Native, 4, 81.26, 0.10, 0.04922, 0.00006
2018-05-07 18:59:29.351872: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2018-05-07 18:59:30.462703: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2660] Max batch size= 4 max workspace size= 2147483648
2018-05-07 18:59:30.462973: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2666] starting build engine
*** Error in `python': munmap_chunk(): invalid pointer: 0x00007ffd14d253d0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fd245b3c7e5]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x1a8)[0x7fd245b49698]
/home/li/tf/local/lib/python2.7/site-packages/tensorflow/python/../libtensorflow_framework.so(_ZNSt10_HashtableISsSsSaISsENSt8__detail9_IdentityESt8equal_toISsESt4hashISsENS1_18_Mod_range_hashingENS1_20_Default_ranged_hashENS1_20_Prime_rehash_policyENS1_17_Hashtable_traitsILb1ELb1ELb1EEEE21_M_insert_unique_nodeEmmPNS1_10_Hash_nodeISsLb1EEE+0xfc)[0x7fd21ab31d3c]
/usr/lib/x86_64-linux-gnu/libnvinfer.so.4(_ZNSt10_HashtableISsSsSaISsENSt8__detail9_IdentityESt8equal_toISsESt4hashISsENS1_18_Mod_range_hashingENS1_20_Default_ranged_hashENS1_20_Prime_rehash_policyENS1_17_Hashtable_traitsILb1ELb1ELb1EEEE9_M_insertIRKSsNS1_10_AllocNodeISaINS1_10_Hash_nodeISsLb1EEEEEEEESt4pairINS1_14_Node_iteratorISsLb1ELb1EEEbEOT_RKT0_St17integral_constantIbLb1EE+0x96)[0x7fd1dbb83a26]
/usr/lib/x86_64-linux-gnu/libnvinfer.so.4(_ZNK8nvinfer17Network8validateERKNS_5cudnn15HardwareContextEbbi+0x1a6)[0x7fd1dbb71b36]
/usr/lib/x86_64-linux-gnu/libnvinfer.so.4(_ZN8nvinfer17builder11buildEngineERNS_21CudaEngineBuildConfigERKNS_5cudnn15HardwareContextERKNS_7NetworkE+0x46)[0x7fd1dbb5e156]
/usr/lib/x86_64-linux-gnu/libnvinfer.so.4(_ZN8nvinfer17Builder15buildCudaEngineERNS_18INetworkDefinitionE+0x11)[0x7fd1dbb48e81]
/home/li/tf/local/lib/python2.7/site-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so(_ZN10tensorflow8tensorrt7convert32ConvertSubGraphToTensorRTNodeDefERNS1_14SubGraphParamsE+0x1b2b)[0x7fd1db3de3bb]
/home/li/tf/local/lib/python2.7/site-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so(+0x5a49a)[0x7fd1db3bb49a]

7fd2462ce000-7fd2462cf000 rw-s 00000000 00:06 371                        /dev/nvidiactl
7fd2462cf000-7fd2462d0000 r--s 00000000 00:06 372                        /dev/nvidia0
7fd2462d0000-7fd2462d1000 rwxp 00000000 00:00 0 
7fd2462d1000-7fd2462d2000 r--p 00025000 08:01 1964                       /lib/x86_64-linux-gnu/ld-2.23.so
7fd2462d2000-7fd2462d3000 rw-p 00026000 08:01 1964                       /lib/x86_64-linux-gnu/ld-2.23.so
7fd2462d3000-7fd2462d4000 rw-p 00000000 00:00 0 
7ffd14d09000-7ffd14d2a000 rw-p 00000000 00:00 0                          [stack]
7ffd14db2000-7ffd14db5000 r--p 00000000 00:00 0                          [vvar]
7ffd14db5000-7ffd14db7000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
./run_all.sh: line 13: 17730 Aborted                 (core dumped) python tftrt_sample.py --native --FP32 --FP16 --INT8 --num_loops 10 --topN 5 --batch_size 4 --workspace_size 2048 --log_file log.txt --network resnet_v1_50_frozen.pb --input_node input --output_nodes resnet_v1_50/predictions/Reshape_1 --img_size 224 --img_file grace_hopper.jpg
(tf) root@gpu-test2:/home/li/tftrt#

Got the same problem here; I also tried TensorRT 4. Anyone got a clue?

System: Ubuntu 18.04
TensorRT version: 4.0.0.3
CUDA version: 9.0
cuDNN version: 7.1.3
TensorFlow version: 1.8.0
GPU: GTX 1080

INFO:tensorflow:Starting Warmup cycle
INFO:tensorflow:Warmup done. Starting real timing
iter  0   0.012457432746887208
iter  1   0.012475056648254395
iter  2   0.012449536323547363
iter  3   0.012465896606445313
iter  4   0.012452802658081054
iter  5   0.012475709915161132
iter  6   0.012458248138427734
iter  7   0.012449665069580078
iter  8   0.012447819709777833
iter  9   0.01242135524749756
Comparison= True
INFO:tensorflow:Timing loop done!
images/s : 321.1 +/- 0.4, s/batch: 0.01246 +/- 0.00001
RES, Native, 4, 321.15, 0.38, 0.01246, 0.00001
2018-05-08 10:05:10.743348: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2018-05-08 10:05:11.184598: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2660] Max batch size= 4 max workspace size= 2147483648
2018-05-08 10:05:11.184635: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2666] starting build engine
munmap_chunk(): invalid pointer
Aborted (core dumped)

I’ve reported this bug and am still waiting for a response.

Hello,

If you are using the pip-installed tensorflow-gpu package, you will need to use the Ubuntu 14.04 package for TensorRT 3.0.4, regardless of whether you are running Ubuntu 14.04, 16.04, or 18.04. 18.04 is not tested, so it is likely to fail. This may improve with the TensorRT 4.0 release, but it is not guaranteed.

Could you please try the Ubuntu 14.04 version of TensorRT 3.0.4, making sure that all other versions are completely removed, and let us know?
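For a deb-based install, the removal step might look like the sketch below. This is illustrative, not part of the official instructions; the `libnvinfer*` and `tensorrt*` patterns match the standard TensorRT Debian package names, but verify the actual package names on your system with `dpkg -l` first.

```shell
# List any installed TensorRT / nvinfer packages
dpkg -l | grep -Ei 'tensorrt|nvinfer'

# Purge them before installing the Ubuntu 14.04 TensorRT 3.0.4 package
sudo apt-get purge "libnvinfer*" "tensorrt*"
sudo apt-get autoremove
```

After purging, install the Ubuntu 14.04 TensorRT 3.0.4 deb and re-run the sample.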

Thanks,
Sami

Thank you for your reply. I followed the advice to use TensorRT 3.0.4 for Ubuntu 14.04 instead of the 16.04 package, and I now get a “PASS” result from running test_tftrt.py (https://github.com/tensorflow/tensorflow/blob/v1.8.0/tensorflow/contrib/tensorrt/test/test_tftrt.py). But run_all.sh from the tftrt sample (https://developer.download.nvidia.com/devblogs/tftrt_sample.tar.xz) still returns an error.

System: Ubuntu 18.04
TensorRT version: 3.0.4 (Ubuntu 14.04)
CUDA version: 9.0 (.deb)
cuDNN version: 7.1.3 (.deb)
TensorFlow version: 1.8.0
GPU: GTX 1080

Namespace(FP16=True, FP32=True, INT8=True, batch_size=4, dump_diff=False, native=True, num_loops=10, topN=5, update_graphdef=False, with_timeline=False, workspace_size=3072)
Starting at 2018-05-10 15:42:54.711629
2018-05-10 15:42:54.825034: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-10 15:42:54.825343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.44GiB
2018-05-10 15:42:54.825357: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-10 15:42:55.034163: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-10 15:42:55.034191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-05-10 15:42:55.034199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-05-10 15:42:55.034319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4059 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Starting execution
2018-05-10 15:42:55.627521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-10 15:42:55.627555: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-10 15:42:55.627560: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-05-10 15:42:55.627563: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-05-10 15:42:55.627645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4059 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Starting Warmup cycle
INFO:tensorflow:Warmup done. Starting real timing
iter  0   0.012434930801391601
iter  1   0.012465815544128418
iter  2   0.012448086738586425
iter  3   0.012464394569396972
iter  4   0.012447872161865235
iter  5   0.012483620643615722
iter  6   0.012439064979553223
iter  7   0.0124442720413208
iter  8   0.012428841590881347
iter  9   0.012441315650939942
Comparison= True
INFO:tensorflow:Timing loop done!
images/s : 321.3 +/- 0.4, s/batch: 0.01245 +/- 0.00002
RES, Native, 4, 321.29, 0.41, 0.01245, 0.00002
2018-05-10 15:43:03.037356: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2018-05-10 15:43:03.462730: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2660] Max batch size= 4 max workspace size= 3221225472
2018-05-10 15:43:03.462758: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2666] starting build engine
2018-05-10 15:43:15.239915: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger resources.cpp (199) - Cuda Error in gieCudaMalloc: 2
2018-05-10 15:43:15.247011: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger resources.cpp (199) - Cuda Error in gieCudaMalloc: 2
2018-05-10 15:43:15.247026: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2671] Built network
2018-05-10 15:43:15.247145: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:0 due to: "Internal: Engine building failure" SKIPPING......( 452 nodes)
INFO:tensorflow:Starting execution
2018-05-10 15:43:16.385523: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-10 15:43:16.385560: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-10 15:43:16.385565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-05-10 15:43:16.385568: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-05-10 15:43:16.385649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4059 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Starting Warmup cycle
INFO:tensorflow:Warmup done. Starting real timing
iter  0   0.012447414398193359
iter  1   0.012494730949401855
iter  2   0.01248936653137207
iter  3   0.012472720146179199
iter  4   0.012484326362609863
iter  5   0.012482056617736817
iter  6   0.01248654842376709
iter  7   0.012489418983459472
iter  8   0.012483220100402832
iter  9   0.012479052543640137
Comparison= True
INFO:tensorflow:Timing loop done!
images/s : 320.5 +/- 0.3, s/batch: 0.01248 +/- 0.00001
RES, TRT-FP32, 4, 320.49, 0.32, 0.01248, 0.00001
2018-05-10 15:43:23.911577: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2018-05-10 15:43:24.275253: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2660] Max batch size= 4 max workspace size= 3221225472
2018-05-10 15:43:24.275284: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2664] Using FP16 precision mode
2018-05-10 15:43:24.275288: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2666] starting build engine
2018-05-10 15:43:24.275693: W tensorflow/contrib/tensorrt/log/trt_logger.cc:34] DefaultLogger Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
2018-05-10 15:43:24.351625: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger reformat.cu (591) - Cuda Error in NCHWToNCHW: 2
2018-05-10 15:43:24.368325: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger reformat.cu (591) - Cuda Error in NCHWToNCHW: 2
2018-05-10 15:43:24.368343: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2671] Built network
2018-05-10 15:43:24.368473: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:0 due to: "Internal: Engine building failure" SKIPPING......( 452 nodes)
INFO:tensorflow:Starting execution
2018-05-10 15:43:25.431249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-10 15:43:25.431289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-10 15:43:25.431296: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-05-10 15:43:25.431302: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-05-10 15:43:25.431388: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4059 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Starting Warmup cycle
INFO:tensorflow:Warmup done. Starting real timing
iter  0   0.012494874000549317
iter  1   0.012518620491027832
iter  2   0.012508559226989745
iter  3   0.012507514953613281
iter  4   0.012525267601013183
iter  5   0.012517638206481933
iter  6   0.012493929862976073
iter  7   0.012491002082824706
iter  8   0.012517666816711426
iter  9   0.012507796287536621
Comparison= True
INFO:tensorflow:Timing loop done!
images/s : 319.8 +/- 0.3, s/batch: 0.01251 +/- 0.00001
RES, TRT-FP16, 4, 319.79, 0.29, 0.01251, 0.00001
2018-05-10 15:43:32.833938: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2018-05-10 15:43:33.172603: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2419] Max batch size= 4 max workspace size= 3221225472
2018-05-10 15:43:33.172755: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2446] finished op preparation
2018-05-10 15:43:33.172858: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2454] OK
2018-05-10 15:43:33.172864: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2455] finished op building
Running Calibration
INFO:tensorflow:Starting execution
2018-05-10 15:43:33.988248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-10 15:43:33.988285: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-10 15:43:33.988290: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-05-10 15:43:33.988293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-05-10 15:43:33.988375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4059 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Starting Warmup cycle
Cuda error in file src/winograd.cu at line 715: out of memory
python: customWinogradConvActLayer.cpp:280: virtual void nvinfer1::cudnn::WinogradConvActLayer::allocateResources(const nvinfer1::cudnn::CommonContext&): Assertion `convolutions.back().get()' failed.
./run_all.sh: line 13: 14247 Aborted                 (core dumped) python tftrt_sample.py --native --FP32 --FP16 --INT8 --num_loops 10 --topN 5 --batch_size 4 --workspace_size 3072 --log_file log.txt --network resnet_v1_50_frozen.pb --input_node input --output_nodes resnet_v1_50/predictions/Reshape_1 --img_size 224 --img_file grace_hopper.jpg

My error was just due to a lack of GPU memory, so modifying run_all.sh to run only one precision mode, or reducing the batch size, can solve the problem.
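For example, a sketch of the run_all.sh change described above. The flags are the ones the sample already uses; the reduced batch size and workspace values here are illustrative, and you may need to tune them for your GPU.

```shell
# Run only one precision mode (here FP32) instead of all four at once,
# and use a smaller batch size and workspace so the TensorRT engine
# build does not run out of GPU memory.
python tftrt_sample.py --FP32 \
    --num_loops 10 --topN 5 \
    --batch_size 1 \
    --workspace_size 1024 \
    --log_file log.txt \
    --network resnet_v1_50_frozen.pb \
    --input_node input \
    --output_nodes resnet_v1_50/predictions/Reshape_1 \
    --img_size 224 --img_file grace_hopper.jpg
```

Running each precision mode in a separate process also ensures the previous engine's GPU memory is released before the next build starts.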

Now I can confirm that TensorRT 3.0.4 for Ubuntu 14.04 works fine on Ubuntu 18.04 with TensorFlow 1.8.0!
When will TensorRT 4 RC be able to give the same result?