Tf-trt Jetson Nano - process killed - conversion running out of memory?

Woutah · August 1, 2020, 11:10pm

I’m trying to convert my TensorFlow SavedModel using the TF-TRT converter on the Jetson Nano, but the process is being killed without any apparent errors. The output is below:

2020-08-02 00:44:52.063615: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-02 00:44:57.741578: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-08-02 00:44:57.802627: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:44:57.802870: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2020-08-02 00:44:57.803003: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-02 00:44:58.019247: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-02 00:44:58.108270: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-02 00:44:58.232380: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-02 00:44:58.393156: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-02 00:44:58.485329: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-02 00:44:58.490884: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-02 00:44:58.491686: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:44:58.492479: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:44:58.492717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-08-02 00:44:59.044475: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-08-02 00:45:02.991144: W tensorflow/core/platform/profile_utils/cpu_utils.cc:106] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2020-08-02 00:45:02.992339: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3aaaebb0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-02 00:45:02.992391: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-02 00:45:03.116586: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:03.116872: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x39fa1d50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-02 00:45:03.116927: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2020-08-02 00:45:03.117415: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:03.117526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2020-08-02 00:45:03.117605: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-02 00:45:03.117685: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-02 00:45:03.117739: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-02 00:45:03.117789: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-02 00:45:03.117837: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-02 00:45:03.117885: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-02 00:45:03.117931: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-02 00:45:03.118113: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:03.118318: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:03.118384: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-08-02 00:45:03.118483: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-02 00:45:15.792082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-02 00:45:15.792173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-08-02 00:45:15.792207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-08-02 00:45:15.792656: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:15.793024: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:15.793246: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 500 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2020-08-02 00:45:33.834607: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:33.834749: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2020-08-02 00:45:33.834986: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-08-02 00:45:33.836667: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:33.836801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2020-08-02 00:45:33.836892: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-02 00:45:33.837027: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-02 00:45:33.837143: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-02 00:45:33.837244: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-02 00:45:33.837315: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-02 00:45:33.837362: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-02 00:45:33.837419: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-02 00:45:33.837677: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:33.837956: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:33.838035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-08-02 00:45:33.838123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-02 00:45:33.838156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-08-02 00:45:33.838182: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-08-02 00:45:33.838528: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:33.838777: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:33.838922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 500 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2020-08-02 00:45:34.132239: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] Optimization results for grappler item: graph_to_optimize
2020-08-02 00:45:34.132327: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:826] function_optimizer: Graph size after: 232 nodes (185), 331 edges (284), time = 22.521ms.
2020-08-02 00:45:34.132359: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:826] function_optimizer: function_optimizer did nothing. time = 0.646ms.
2020-08-02 00:45:40.102243: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:40.102542: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2020-08-02 00:45:40.102823: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-08-02 00:45:40.104015: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:40.104190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2020-08-02 00:45:40.104628: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-02 00:45:40.104946: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-02 00:45:40.105013: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-02 00:45:40.105080: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-02 00:45:40.105198: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-02 00:45:40.105244: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-02 00:45:40.105318: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-02 00:45:40.105554: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:40.105818: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:40.105904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-08-02 00:45:40.106042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-02 00:45:40.106107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-08-02 00:45:40.106138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-08-02 00:45:40.106420: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:40.106733: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-02 00:45:40.106873: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 500 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
Killed

This time I tried limiting the memory allocation using

tf.config.experimental.set_virtual_device_configuration(gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=500)])

But to no avail… Other memory_limits don’t seem to work either. Is there anything I am missing? My file:

pb_to_tensor_rt (copy).txt (2.3 KB)

AastaLLL · August 3, 2020, 6:06am

Hi,

Would you mind checking the system status with tegrastats at the same time first?

$ sudo tegrastats

Usually, "Killed" is caused by running out of memory.
If the minimum memory the model requires exceeds the available memory, it may not be possible to deploy this model on the Nano.

Thanks.
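
Besides tegrastats, free memory can also be polled from Python in a second terminal while the conversion runs. The following is only a minimal sketch, assuming a Linux system that exposes `/proc/meminfo` with a `MemAvailable` field; the helper names `parse_meminfo`, `available_mib`, and `watch` are hypothetical and not part of any NVIDIA tool. On the Nano the CPU and GPU share the same physical memory, so this figure should drop as the converter allocates.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value} (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_mib(path="/proc/meminfo"):
    """Return the kernel's MemAvailable estimate in MiB (assumes the field exists)."""
    with open(path) as f:
        return parse_meminfo(f.read())["MemAvailable"] / 1024


def watch(interval_s=1.0, samples=10):
    """Print available memory a few times, e.g. while the conversion is running."""
    for _ in range(samples):
        print(f"MemAvailable: {available_mib():.0f} MiB")
        time.sleep(interval_s)
```

Calling `watch()` just before starting the conversion script shows whether available memory falls toward zero right before the process is killed.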

Woutah · August 3, 2020, 11:33pm

Thank you for your response. The system does indeed seem to run out of memory, as you suggested. However, I was able to run the model using TensorFlow, so I assumed a TRT model would also be possible. Or is this due to the conversion process taking up more memory?

Is there any indication of the maximum size (either the number of parameters or some other metric) of a CNN that can be used on the Jetson Nano?

AastaLLL · August 4, 2020, 3:57am

Hi,

In TF-TRT, memory usage tends to be twice as high, since one allocation is for TensorFlow inference and the other is for TensorRT.

You can control the maximum memory usage of TensorRT directly via the workspace variable.
Would you mind setting it to 32 MiB first to see if that works?

conversion_params = conversion_params._replace(
    max_workspace_size_bytes=(1<<25))

Thanks.
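
For context, the snippet above could fit into a full conversion script roughly as follows. This is a sketch under assumptions, not the original poster's file: it assumes the TF 2.x `trt_convert` API of that period (`DEFAULT_TRT_CONVERSION_PARAMS` and `TrtGraphConverterV2`), and the names `mib_to_bytes`, `convert_saved_model`, `input_dir`, and `output_dir` are hypothetical.

```python
def mib_to_bytes(mib):
    """Convert MiB to bytes; 32 MiB equals 1 << 25 bytes."""
    return mib << 20


def convert_saved_model(input_dir, output_dir, workspace_mib=32):
    """Convert a SavedModel with TF-TRT using a capped TensorRT workspace."""
    # TensorFlow is imported lazily so mib_to_bytes works without it installed.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # Start from the default parameters and override only the workspace size.
    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        max_workspace_size_bytes=mib_to_bytes(workspace_mib))

    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=input_dir,
        conversion_params=params)
    converter.convert()          # builds the TF-TRT graph
    converter.save(output_dir)   # writes the converted SavedModel
```

A call such as `convert_saved_model("saved_model", "saved_model_trt")` (both paths are placeholders) would then run the conversion with the 32 MiB workspace suggested above.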

Woutah · August 8, 2020, 9:52pm

That seems to work, at least for the downscaled model, thank you for your help!
