TF-TRT on Jetson Nano - process killed - conversion running out of memory?

I’m trying to convert my TensorFlow SavedModel with the TF-TRT converter on the Jetson Nano, but the process is killed without any apparent error. The output is below:

2020-08-02 00:44:52.063615: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-02 00:44:57.741578: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-08-02 00:44:57.802627: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:44:57.802870: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2020-08-02 00:44:57.803003: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-02 00:44:58.019247: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-02 00:44:58.108270: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-02 00:44:58.232380: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-02 00:44:58.393156: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-02 00:44:58.485329: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-02 00:44:58.490884: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-02 00:44:58.491686: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:44:58.492479: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:44:58.492717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-08-02 00:44:59.044475: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-08-02 00:45:02.991144: W tensorflow/core/platform/profile_utils/cpu_utils.cc:106] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2020-08-02 00:45:02.992339: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3aaaebb0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-02 00:45:02.992391: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-02 00:45:03.116586: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:03.116872: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x39fa1d50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-02 00:45:03.116927: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2020-08-02 00:45:03.117415: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:03.117526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2020-08-02 00:45:03.117605: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-02 00:45:03.117685: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-02 00:45:03.117739: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-02 00:45:03.117789: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-02 00:45:03.117837: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-02 00:45:03.117885: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-02 00:45:03.117931: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-02 00:45:03.118113: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:03.118318: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:03.118384: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-08-02 00:45:03.118483: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-02 00:45:15.792082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-02 00:45:15.792173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-08-02 00:45:15.792207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-08-02 00:45:15.792656: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:15.793024: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:15.793246: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 500 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2020-08-02 00:45:33.834607: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:33.834749: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2020-08-02 00:45:33.834986: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-08-02 00:45:33.836667: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:33.836801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2020-08-02 00:45:33.836892: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-02 00:45:33.837027: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-02 00:45:33.837143: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-02 00:45:33.837244: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-02 00:45:33.837315: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-02 00:45:33.837362: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-02 00:45:33.837419: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-02 00:45:33.837677: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:33.837956: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:33.838035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-08-02 00:45:33.838123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-02 00:45:33.838156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-08-02 00:45:33.838182: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-08-02 00:45:33.838528: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:33.838777: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:33.838922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 500 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2020-08-02 00:45:34.132239: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] Optimization results for grappler item: graph_to_optimize
2020-08-02 00:45:34.132327: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:826] function_optimizer: Graph size after: 232 nodes (185), 331 edges (284), time = 22.521ms.
2020-08-02 00:45:34.132359: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:826] function_optimizer: function_optimizer did nothing. time = 0.646ms.
2020-08-02 00:45:40.102243: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:40.102542: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2020-08-02 00:45:40.102823: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-08-02 00:45:40.104015: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:40.104190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2020-08-02 00:45:40.104628: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-02 00:45:40.104946: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-02 00:45:40.105013: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-02 00:45:40.105080: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-02 00:45:40.105198: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-02 00:45:40.105244: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-02 00:45:40.105318: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-02 00:45:40.105554: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:40.105818: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:40.105904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-08-02 00:45:40.106042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-02 00:45:40.106107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-08-02 00:45:40.106138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-08-02 00:45:40.106420: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:40.106733: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:40.106873: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 500 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
Killed

This time I tried limiting the memory allocation using

tf.config.experimental.set_virtual_device_configuration(gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=500)])

But to no avail. Other memory_limit values don’t seem to work either. Is there anything I am missing? My file:

pb_to_tensor_rt (copy).txt (2.3 KB)

Hi,

Could you first check the system status with tegrastats while the conversion is running?

$ sudo tegrastats

A "Killed" message is usually caused by the system running out of memory.
If the minimum memory the model requires exceeds the memory available, it may not be possible to deploy this model on the Nano.
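
Alongside tegrastats, the available memory can also be sampled from /proc/meminfo while the conversion runs. The following is only a minimal sketch and not part of the original advice: the helper names parse_meminfo, available_mib and read_available_mib are hypothetical, and it assumes a Linux kernel that reports a MemAvailable field. On the Nano the CPU and GPU share the same physical RAM, so this figure covers both.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages counters) are
    plain counts. Only the leading integer of each line is kept.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_mib(text):
    """Return MemAvailable in MiB, or None if the field is missing."""
    kib = parse_meminfo(text).get("MemAvailable")
    return None if kib is None else kib / 1024


def read_available_mib(path="/proc/meminfo"):
    """Read the live value from the running system (Linux only)."""
    with open(path) as f:
        return available_mib(f.read())
```

Calling read_available_mib() just before and during the conversion gives a rough idea of how much headroom is left before the kernel kills the process.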

Thanks.

Thank you for your response. The system does indeed seem to run out of memory, as you suggested. However, I was able to run the model with TensorFlow, so I assumed a TensorRT model would also be possible. Or is this because the conversion process itself takes up more memory?

Is there any indication of the maximum size (in number of parameters or some other metric) of a CNN that can be used on the Jetson Nano?

Hi,

With TF-TRT, memory usage tends to roughly double, since one share is used for TensorFlow inference and another for TensorRT.

You can control the maximum memory TensorRT uses directly via the workspace size parameter.
Could you first set it to 32 MiB to see if that works?

conversion_params = conversion_params._replace(
    max_workspace_size_bytes=(1<<25))
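
For context, the workspace setting above might fit into a complete conversion roughly as follows. This is a sketch only, assuming a TensorFlow 2.x build in which trt_convert exposes DEFAULT_TRT_CONVERSION_PARAMS and TrtGraphConverterV2. The helper names mib and convert_saved_model and the default of 32 MiB are illustrative and are not taken from the attached script.

```python
def mib(n):
    """Convert a size in MiB to bytes (32 MiB == 1 << 25 bytes)."""
    return n << 20


def convert_saved_model(input_dir, output_dir, workspace_mib=32):
    """Convert a SavedModel with TF-TRT using a capped TensorRT workspace.

    Assumption: a TF 2.x release that provides the names used below.
    """
    # Imported lazily so this module loads even where TF-TRT is absent.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # Start from the defaults and override only the workspace size.
    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        max_workspace_size_bytes=mib(workspace_mib))

    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=input_dir,
        conversion_params=params)
    converter.convert()         # builds the TF-TRT graph
    converter.save(output_dir)  # writes the converted SavedModel
```

A call such as convert_saved_model("saved_model", "saved_model_trt") would then run the conversion; both directory names are placeholders.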

Thanks.

That seems to work, at least for the downscaled model, thank you for your help!