Hello, I have used TF-TRT to optimize my RetinaNet (backbone: ResNet50) model.
When I run inference with an input of shape (1, 1408, 960, 1), the following error occurs:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,176,120,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node res3d_branch2c/convolution-0-0-TransposeNCHWToNHWC-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[filtered_detections/map/TensorArrayStack/TensorArrayGatherV3/_39]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
The error log:
2019-12-26 18:13:16.565362: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for TRTEngineOp_40 input shapes: [[1,1408,960,1]]
2019-12-26 18:13:16.565457: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.5
2019-12-26 18:13:16.571872: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.5
2019-12-26 18:13:17.598781: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for gn3c_branch2b/TRTEngineOp_99 input shapes: [[1,32,176,120,4]]
2019-12-26 18:14:15.821461: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for gn3c_branch2b/TRTEngineOp_100 input shapes: [[1,32,176,120,4]]
.
.
.
.
.
2019-12-26 18:14:15.838879: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for TRTEngineOp_198 input shapes: [[1,176,120,128]]
2019-12-26 18:14:16.723495: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for gn3c_branch2c/TRTEngineOp_101 input shapes: [[1,32,176,120,16]]
2019-12-26 18:14:27.924888: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 41.25MiB (rounded to 43254272). Current allocation summary follows.
2019-12-26 18:14:27.924984: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (256): Total Chunks: 9, Chunks in use: 8. 2.2KiB allocated for chunks. 2.0KiB in use in bin. 29B client-requested in use in bin.
2019-12-26 18:14:27.925024: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (512): Total Chunks: 21, Chunks in use: 18. 15.8KiB allocated for chunks. 13.5KiB in use in bin. 9.8KiB client-requested in use in bin.
2019-12-26 18:14:27.925056: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1024): Total Chunks: 20, Chunks in use: 18. 25.5KiB allocated for chunks. 23.2KiB in use in bin. 19.6KiB client-requested in use in bin.
2019-12-26 18:14:27.925085: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2048): Total Chunks: 6, Chunks in use: 6. 15.0KiB allocated for chunks. 15.0KiB in use in bin. 14.0KiB client-requested in use in bin.
2019-12-26 18:14:27.925112: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4096): Total Chunks: 3, Chunks in use: 3. 13.5KiB allocated for chunks. 13.5KiB in use in bin. 13.5KiB client-requested in use in bin.
2019-12-26 18:14:27.925136: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8192): Total Chunks: 1, Chunks
.
.
.
Chunks in use: 3. 69.88MiB allocated for chunks. 30.94MiB in use in bin. 30.94MiB client-requested in use in bin.
2019-12-26 18:14:27.925436: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16777216): Total Chunks: 19, Chunks in use: 19. 412.13MiB allocated for chunks. 412.13MiB in use in bin. 352.49MiB client-requested in use in bin.
2019-12-26 18:14:27.925467: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (33554432): Total Chunks: 22, Chunks in use: 21. 980.79MiB allocated for chunks. 939.54MiB in use in bin. 805.67MiB client-requested in use in bin.
2019-12-26 18:14:27.925497: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (67108864): Total Chunks: 11, Chunks in use: 11. 990.00MiB allocated for chunks. 990.00MiB in use in bin. 784.34MiB client-requested in use in bin.
2019-12-26 18:14:27.925524: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (134217728): Total Chunks: 8, Chunks in use: 8. 1.31GiB allocated for chunks. 1.31GiB in use in bin. 1.21GiB client-requested in use in bin.
2019-12-26 18:14:27.925553: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (268435456): Total Chunks: 1, Chunks in use: 1. 256.00MiB allocated for chunks. 256.00MiB in use in bin. 165.00MiB client-requested in use in bin.
2019-12-26 18:14:27.925581: I tensorflow/core/common_runtime/bfc_allocator.cc:885] Bin for 41.25MiB was 32.00MiB, Chunk State:
2019-12-26 18:14:27.925620: I tensorflow/core/common_runtime/bfc_allocator.cc:891] Size: 41.25MiB | Requested Size: 41.25MiB | in_use: 0 | bin_num: 17, prev: Size: 20.62MiB | Requested Size: 20.62MiB | in_use: 1 | bin_num: -1, next: Size: 41.25MiB | Requested Size: 41.25MiB | in_use: 1 | bin_num: -1
2019-12-26 18:14:27.925644: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 2070675456
2019-12-26 18:14:27.925669: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7ef9ec000000 next 64 of size 173016064
2019-12-26 18:14:27.925691: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7ef9f6500400 next 58 of size 86507520
2019-12-26 18:14:27.925712: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7ef9fb780400 next 72 of size 86508032
.
.
.
43253760
2019-12-26 18:14:27.926438: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efa61555800 next 135 of size 43254272
2019-12-26 18:14:27.926460: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efa63e95a00 next 18446744073709551615 of size 58893824
2019-12-26 18:14:27.926480: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 1073741824
2019-12-26 18:14:27.926501: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efa6c000000 next 33 of size 173016064
2019-12-26 18:14:27.926521: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efa76500400 next 52 of size 21626880
2019-12-26 18:14:27.926542: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efa779a0400 next 41 of size 64880640
2019-12-26 18:14:27.926562: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efa7b780400 next 78 of size 21627392
2019-12-26 18:14:27.926583: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efa7cc20600 next 61 of size 86507008
.
.
.
size 148224
2019-12-26 18:14:27.929580: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efbcfa94f00 next 82 of size 148224
2019-12-26 18:14:27.929601: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free at 0x7efbcfab9200 next 18446744073709551615 of size 290304
2019-12-26 18:14:27.929621: I tensorflow/core/common_runtime/bfc_allocator.cc:914] Summary of in-use Chunks by size:
330.00MiB
.
.
.
.
2019-12-26 18:14:27.932365: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 86508544 totalling 165.00MiB
2019-12-26 18:14:27.932403: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 129761280 totalling 247.50MiB
2019-12-26 18:14:27.932427: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 134217728 totalling 128.00MiB
.
.
.
.
.
21627392
2019-12-26 18:14:37.955693: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efae27b4e00 next 18446744073709551615 of size 25473536
2019-12-26 18:14:37.955713: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 134217728
2019-12-26 18:14:37.955734: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efae4000000 next 18446744073709551615 of size 134217728
2019-12-26 18:14:37.955754: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 16777216
2019-12-26 18:14:37.955775: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efaf2000000 next 18446744073709551615 of size 16777216
2019-12-26 18:14:37.955796: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 8388608
2019-12-26 18:14:37.955817: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efb22000000 next 81 of size 573184
2019-12-26 18:14:37.955838: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free at 0x7efb2208bf00 next 98 of size 651776
2019-12-26 18:14:37.955858: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efb2212b100 next 96 of size 651776
2019-12-26 18:14:37.955879: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efb221ca300 next 142 of size 389632
2019-12-26 18:14:37.955900: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efb22229500 next 29 of size 1116416
2019-12-26 18:14:37.955921: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efb22339e00 next 31 of size 525056
2019-12-26 18:14:37.955941: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efb223ba100 next 59 of size 573184
2019-12-26 18:14:37.955962: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efb22446000 next 124 of size 263168
2019-12-26 18:14:37.955982: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efb22486400 next 69 of size 310016
2019-12-26 18:14:37.956002: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efb224d1f00 next 70 of size 573184
2019-12-26 18:14:37.956023: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efb2255de00 next 100 of size 508672
2019-12-26 18:14:37.956094: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efb225da100 next 104 of size 772352
2019-12-26 18:14:37.956117: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efb22696a00 next 112 of size 590848
2019-12-26 18:14:37.956138: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7efb22726e00 next 18446744073709551615 of size 889344
2019-12-26 18:14:37.956158: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 8388608
2019-12-26 18:14:37.956180: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free at 0x7efb23200000 next 184467440737
.
.
.
2019-12-26 18:15:11.114549: I tensorflow/core/common_runtime/bfc_allocator.cc:921] Sum Total of in-use chunks: 3.93GiB
2019-12-26 18:15:11.114553: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 4252762112 memory_limit_: 4252762112 available bytes: 0 curr_region_allocation_bytes_: 4294967296
2019-12-26 18:15:11.114559: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats:
Limit: 4252762112
InUse: 4222795776
MaxInUse: 4233998848
NumAllocs: 5484
MaxAllocSize: 268435456
2019-12-26 18:15:11.114577: W tensorflow/core/common_runtime/bfc_allocator.cc:424] ***********************************************************************************x
2019-12-26 18:15:11.114599: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at transpose_op.cc:198 : Resource exhausted: OOM when allocating tensor with shape[1,176,120,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2019-12-26 18:15:11.114658: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at trt_engine_op.cc:318 : Resource exhausted: OOM when allocating tensor with shape[1,176,120,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
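As a sanity check on the numbers in the log above, here is a minimal sketch in plain Python (no TensorFlow needed). It computes the raw size of the tensor named in the OOM message and compares it with the gap between `Limit` and `InUse` from the allocator stats. The helper name `tensor_bytes` is hypothetical, and the sketch assumes "type float" means 4-byte float32. It ignores fragmentation and allocator rounding, so it is only a rough check.

```python
import math

MIB = 1024 * 1024

def tensor_bytes(shape, bytes_per_element=4):
    """Raw size in bytes of a dense tensor (assumes float32 = 4 bytes per element)."""
    return math.prod(shape) * bytes_per_element

# Shape copied from the OOM message: shape[1,176,120,512], type float.
requested = tensor_bytes([1, 176, 120, 512])

# "Limit" and "InUse" copied from the allocator Stats block in the log.
limit_bytes = 4252762112
in_use_bytes = 4222795776
headroom = limit_bytes - in_use_bytes

print(f"requested: {requested} bytes = {requested / MIB:.2f} MiB")
print(f"headroom:  {headroom} bytes = {headroom / MIB:.2f} MiB")
print("fits under the limit:", requested <= headroom)
```

The requested size works out to 41.25 MiB, matching the allocator warning, while only about 28.58 MiB remains under the memory limit, so by this rough measure the allocation cannot succeed.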
I want to know why this issue happens and what I should do about it.
My GPU is a P4000.
CUDA 10.0, cuDNN 7.6.0
TensorFlow 1.13.1, TensorRT 5.0
Here is my code to build the TF-TRT model:
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,  # frozen model
    outputs=your_outputs,
    max_batch_size=1,
    is_dynamic_op=True,
    precision_mode="FP32",
    minimum_segment_size=1)
# Write the TensorRT model to be used later for inference
with gfile.FastGFile("/home/user/PycharmProjects/release/resident_card_guo_test.trt", "wb") as f:
    f.write(trt_graph.SerializeToString())
print("TensorRT model is successfully stored!")
numb. of all_nodes in frozen graph: 3955
numb. of trt_engine_nodes in TensorRT graph: 204
numb. of all_nodes in TensorRT graph: 2233
Thank you so much!