Description
A clear and concise description of the bug or issue.
Environment
TensorRT Version: 7.1.3
GPU Type: AGX Xavier
Nvidia Driver Version:
CUDA Version: 10.2
CUDNN Version: 8.0
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): 2.4.0
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): JETPACK 4.5
I was using tf-trt to convert my saved_model(ssd_resnet50) trained from the host machine into trt engine in Xavier. I was using the attached script. trtop.py (576 Bytes)
After runing python3 trtop.py
, the command I encountered the following message:
2021-03-19 22:11:20.936508: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-03-19 22:11:30.623547: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libnvinfer.so.7
2021-03-21 21:03:25.999071: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-21 21:03:25.999438: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-03-21 21:03:26.149679: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:03:26.149923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 15.45GiB deviceMemoryBandwidth: 82.08GiB/s
2021-03-21 21:03:26.149992: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-03-21 21:03:26.150112: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-03-21 21:03:26.150184: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-03-21 21:03:26.237558: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-03-21 21:03:26.272863: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-03-21 21:03:26.318791: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-03-21 21:03:26.381611: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-03-21 21:03:26.381943: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-03-21 21:03:26.382198: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:03:26.382506: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:03:26.382651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1888] Adding visible gpu devices: 0
2021-03-21 21:03:26.384943: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-21 21:03:26.385234: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:03:26.385436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 15.45GiB deviceMemoryBandwidth: 82.08GiB/s
2021-03-21 21:03:26.385524: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-03-21 21:03:26.385593: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-03-21 21:03:26.385639: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-03-21 21:03:26.385684: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-03-21 21:03:26.385731: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-03-21 21:03:26.385773: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-03-21 21:03:26.385818: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-03-21 21:03:26.385860: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-03-21 21:03:26.386048: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:03:26.386255: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:03:26.386327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1888] Adding visible gpu devices: 0
2021-03-21 21:03:26.386441: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-03-21 21:03:32.315554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1287] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-21 21:03:32.315651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1293] 0
2021-03-21 21:03:32.315686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306] 0: N
2021-03-21 21:03:32.315977: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:03:32.316317: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:03:32.316476: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:03:32.316612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9990 MB memory) → physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2021-03-21 21:05:19.370619: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:05:19.371168: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2021-03-21 21:05:19.371768: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2021-03-21 21:05:19.372384: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-21 21:05:19.372707: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:05:19.372941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 15.45GiB deviceMemoryBandwidth: 82.08GiB/s
2021-03-21 21:05:19.373201: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-03-21 21:05:19.373373: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-03-21 21:05:19.373459: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-03-21 21:05:19.373564: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-03-21 21:05:19.373665: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-03-21 21:05:19.373738: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-03-21 21:05:19.373802: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-03-21 21:05:19.373860: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-03-21 21:05:19.374068: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:05:19.374212: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:05:19.374316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1888] Adding visible gpu devices: 0
2021-03-21 21:05:19.374470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1287] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-21 21:05:19.374503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1293] 0
2021-03-21 21:05:19.374530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306] 0: N
2021-03-21 21:05:19.374680: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:05:19.374857: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:05:19.375005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9990 MB memory) → physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2021-03-21 21:05:19.376628: W tensorflow/core/platform/profile_utils/cpu_utils.cc:116] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2021-03-21 21:05:22.152016: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:954] Optimization results for grappler item: graph_to_optimize
function_optimizer: Graph size after: 4997 nodes (4509), 10438 edges (9943), time = 538.579ms.
function_optimizer: function_optimizer did nothing. time = 5.175ms.
2021-03-21 21:05:57.064943: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:05:57.087998: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2021-03-21 21:05:57.103567: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2021-03-21 21:05:57.158908: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-21 21:05:57.159736: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:05:57.168780: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 15.45GiB deviceMemoryBandwidth: 82.08GiB/s
2021-03-21 21:05:57.190819: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-03-21 21:05:57.292866: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-03-21 21:05:57.338429: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-03-21 21:05:57.386678: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-03-21 21:05:57.391626: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-03-21 21:05:57.405938: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-03-21 21:05:57.420817: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-03-21 21:05:57.435876: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-03-21 21:05:57.436637: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:05:57.437033: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:05:57.449504: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1888] Adding visible gpu devices: 0
2021-03-21 21:05:57.472591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1287] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-21 21:05:57.472731: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1293] 0
2021-03-21 21:05:57.472847: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306] 0: N
2021-03-21 21:05:57.490212: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:05:57.490994: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-21 21:05:57.491447: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9990 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2021-03-21 21:06:21.153449: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:790] There are 619 ops of 52 different types in the graph that are not converted to TensorRT: Sum, GreaterEqual, Where, Reciprocal, ResizeBilinear, Split, Cast, StopGradient, Range, Less, Merge, TensorListGetItem, Pad, Slice, LogicalAnd, Mul, NextIteration, Switch, Select, Exit, LoopCond, Pack, NoOp, Size, Greater, GatherV2, ExpandDims, Identity, Assert, NonMaxSuppressionV5, Squeeze, Enter, TensorListFromTensor, AddV2, TensorListSetItem, Placeholder, TensorListStack, TensorListReserve, Const, Sub, Reshape, Transpose, Minimum, Shape, Maximum, StridedSlice, Fill, Unpack, ConcatV2, Exp, Equal, TopKV2, (For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops).
2021-03-21 21:06:23.661063: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:757] Number of TensorRT candidate segments: 4
Killed
I think the process is killed because of memory used up (15.4GB). If that’s the issue, how to allocate the memory to solve the problem? Otherwise, any ideas? Thanks.