Freeze while executing TensorFlow in a Docker container on the TX2

Hi,

Thanks to Open Horizon, I was able to install Docker with GPU support and run DIGITS in a container.
As a next step, I wanted to run a simple TensorFlow script (thanks furkankalinsaz! https://devtalk.nvidia.com/default/topic/1030603/jetson-tx2/tensorflow-1-6-for-jetson-tx2/) in such a container.

But it looks like the base container from Open Horizon https://github.com/open-horizon/cogwerx-jetson-tx2/blob/master/Dockerfile.cudabase is missing something that the JetPack 3.2 installer provides, as my TensorFlow application freezes at initialization inside the container.

Outside the container:

(tf) nvidia@tegra-ubuntu:~/projects/tensorflow$ python hello.py
2018-03-07 09:39:05.245311: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-03-07 09:39:05.245522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 156.65MiB
2018-03-07 09:39:05.245575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-03-07 09:39:06.814190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-03-07 09:39:06.814309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
2018-03-07 09:39:06.814343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
2018-03-07 09:39:06.814689: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
b'Hello, TensorFlow!'

Inside the container:

nvidia@tegra-ubuntu:~/projects/realift/src/aml$ docker run --privileged --name tf -it tensorflow:tx2 /bin/bash
root@e02ee5b67a48:/app# python3 hello.py
2018-03-07 15:00:11.427965: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-03-07 15:00:11.428192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 1.28GiB
2018-03-07 15:00:11.428266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0

Then the container hangs.

I had the exact same issue when manually installing the CUDA and cuDNN packages on the TX2. Running the JetPack 3.2 installer solved it.

Can somebody tell me what is missing in the base image https://github.com/open-horizon/cogwerx-jetson-tx2/blob/master/Dockerfile.cudabase ?

Thanks !

Here is my test Dockerfile:

FROM openhorizon/aarch64-tx2-cudabase:JetPack3.2-RC
ENV ARCH=aarch64
RUN apt-get update && apt-get install -y --no-install-recommends --no-install-suggests python3-minimal python3-pip libpython3.5-dev
# Custom layers

# install ubuntu python releases
RUN apt-get install -y --no-install-recommends --no-install-suggests build-essential
RUN apt-get install -y --no-install-recommends --no-install-suggests python3-setuptools python3-all-dev python3-dev

# get precompiled TF 1.6 for JetPack 3.2 RC
RUN apt-get install -y --no-install-recommends --no-install-suggests wget
RUN wget https://github.com/openzeka/Tensorflow-for-Jetson-TX2/raw/master/Jetpack-3.2/1.6/tensorflow-1.6.0rc1-cp35-cp35m-linux_aarch64.whl
RUN pip3 install tensorflow-1.6.0rc1-cp35-cp35m-linux_aarch64.whl

WORKDIR /app
# hello.py is the TF validation script
ADD hello.py /app/
CMD ["/usr/bin/python3", "/app/hello.py"]
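For context, hello.py itself is not reproduced in this thread. A typical TF 1.x validation script that would produce the b'Hello, TensorFlow!' output seen in the logs looks roughly like this (an assumption on my side, not the author's exact file; it requires a Jetson TensorFlow 1.x build to run):

```python
# Minimal TensorFlow 1.x sanity check: building a constant op and running
# it in a session forces GPU device initialization, which is exactly the
# step where the container was hanging.
import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
with tf.Session() as sess:
    # On Python 3 this prints the bytes literal b'Hello, TensorFlow!',
    # matching the last line of the logs above.
    print(sess.run(hello))
```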

OK, I inspected what JetPack does, and it looks like it also installs:

libgomp1 libfreeimage-dev libopenmpi-dev openmpi-bin

Maybe that will solve the problem…

After some tests, it turns out it can simply take A LOT of time, the FIRST time:

root@7922e0755c22:/app# python3 hello.py
2018-03-07 16:21:58.038844: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-03-07 16:21:58.039099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 688.65MiB
2018-03-07 16:21:58.039164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-03-07 16:29:16.860720: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-03-07 16:29:16.860824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
2018-03-07 16:29:16.860865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
2018-03-07 16:29:16.861126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 133 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
b'Hello, TensorFlow!'
root@7922e0755c22:/app# python3 hello.py
2018-03-07 16:43:06.814518: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-03-07 16:43:06.814760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 51.09MiB
2018-03-07 16:43:06.814821: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-03-07 16:43:08.441989: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-03-07 16:43:08.442148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
2018-03-07 16:43:08.442201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
2018-03-07 16:43:08.442411: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 41 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
b'Hello, TensorFlow!'
root@7922e0755c22:/app# python3 hello.py
2018-03-07 16:43:27.350149: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-03-07 16:43:27.350347: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 140.93MiB
2018-03-07 16:43:27.350414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-03-07 16:43:28.848243: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-03-07 16:43:28.848583: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
2018-03-07 16:43:28.848648: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
2018-03-07 16:43:28.848884: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 38 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
b'Hello, TensorFlow!'
root@7922e0755c22:/app# python3 hello.py
2018-03-07 16:43:36.037751: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-03-07 16:43:36.038490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 343.00MiB
2018-03-07 16:43:36.038572: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-03-07 16:43:37.462295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-03-07 16:43:37.462490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
2018-03-07 16:43:37.462531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
2018-03-07 16:43:37.462699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 143 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
b'Hello, TensorFlow!'
root@7922e0755c22:/app# python3 hello.py
2018-03-07 16:43:57.689202: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-03-07 16:43:57.689411: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 307.52MiB
2018-03-07 16:43:57.689484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-03-07 16:43:59.125303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-03-07 16:43:59.125421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
2018-03-07 16:43:59.125461: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
2018-03-07 16:43:59.125714: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 208 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
b'Hello, TensorFlow!'

8 minutes between "Adding visible gpu devices: 0" and "Device interconnect StreamExecutor with strength 1 edge matrix". All subsequent executions were immediate.

Any idea why?
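One plausible explanation (my assumption, not confirmed anywhere in this thread): if the wheel ships only PTX and no prebuilt cubins for compute capability 6.2, CUDA JIT-compiles all kernels on first use and caches the result under ~/.nv/ComputeCache. A fresh container starts with an empty cache, so the first run pays the full compilation cost, while later runs in the same container are instant, which matches the behaviour above. Persisting the cache across container runs might then avoid the one-time cost; a sketch, with a hypothetical host path:

```shell
# Persist the CUDA JIT compute cache across container runs so that only
# the very first run ever pays the PTX compilation cost.
# /data/nv-cache is a hypothetical host directory; /root/.nv is the
# default CUDA cache location for the root user inside the container.
# CUDA_CACHE_MAXSIZE optionally enlarges the cache (value in bytes).
docker run --privileged \
  -v /data/nv-cache:/root/.nv \
  -e CUDA_CACHE_MAXSIZE=2147483648 \
  -it tensorflow:tx2 python3 /app/hello.py
```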

I just got a build of TF from NVIDIA and the lag disappeared.
Problem solved!

Cool!! It worked for me as well. Thanks for sharing it @matthieu.boujonnier, but that TF .whl file is an RC version. Is there a production version of TF without this issue? Did you come across such a wheel file?

Hi saikishor,

Here are some public TensorFlow wheels for Jetson for your reference:

https://devtalk.nvidia.com/default/topic/1031300

Thanks.

Could you please provide the exact links to the wheel file that you’ve used? I downloaded the wheel files from https://nvidia.app.box.com/v/TF180-Py27-wTRT , but that doesn’t do the trick for me… My TensorFlow program hangs at “Adding visible gpu devices: 0”. It takes about 8 minutes to start; see the following logs:

2018-06-07 08:11:56.680367: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-06-07 08:11:56.680703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 4.18GiB
2018-06-07 08:11:56.680797: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-06-07 08:19:16.380270: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-07 08:19:16.380341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-06-07 08:19:16.380366: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-06-07 08:19:16.380579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:worker/replica:0/task:0/device:GPU:0 with 2356 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-06-07 08:19:16.930939: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> localhost:8080, 1 -> worker-1.default.svc.cluster.local:8080, 2 -> worker-2.default.svc.cluster.local:8080}
2018-06-07 08:19:16.931884: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:332] Started server with target: grpc://localhost:8080
2018-06-07 08:19:16.932249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-06-07 08:19:16.932327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-07 08:19:16.932356: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-06-07 08:19:16.932382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-06-07 08:19:16.932515: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/device:GPU:0 with 2356 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-06-07 08:19:24.259923: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session 25988990f5aa60ac with config: gpu_options { per_process_gpu_memory_fraction: 0.3 }

Do you have any more insights that could help here? When executing the same code on the TX2 bare metal, these steps complete instantly, and my containers have access to a full CPU and 3 GiB of RAM, so I don’t think resources are the actual bottleneck.
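As an aside, the gpu_options { per_process_gpu_memory_fraction: 0.3 } fragment in the last log line above corresponds to a TF 1.x session config along these lines (a sketch of the standard API, not the poster's actual code, and it requires a TF 1.x build to run):

```python
import tensorflow as tf

# Cap this process at roughly 30% of GPU memory instead of letting
# TensorFlow grab nearly all of it, which matters on a shared 8 GB
# Tegra where several containers compete for the same device.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.3

with tf.Session(config=config) as sess:
    pass  # build and run the graph here
```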

@dr3dd Try installing TensorFlow from this repo: https://github.com/NVIDIA-Jetson/tf_to_trt_image_classification/tree/master#install . It does not create those freezing issues. I don’t know why this keeps happening, but it occurs with almost all of the .whl files available in most of the repos.

The link you provided points to a rather old TensorFlow 1.5 wheel file. Do you happen to know if there’s a 1.8 wheel file that doesn’t cause this lag? Or how was that 1.5 wheel file built? I don’t mind building it myself :) Thanks!

Hello!! @dr3dd,

I am not sure how they built the wheel file. I would suggest building your own TensorFlow wheel using a Bazel build; take a look at their website, under the section on installing from sources.

Hi all, just an update on the topic. I compiled the wheel myself (for TensorFlow 1.8) without TensorRT support (as I was not really using it) and with GDR and VERBS support, and the delay is gone. You can grab it from here.

All the .whl files that I’ve found across the Internet had TensorRT support enabled. Might that be the key difference and the cause of the delay? Would that make sense?
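For anyone wanting to reproduce such a build: the TF 1.8 configure script reads its answers from environment variables, so a non-interactive configuration without TensorRT might look roughly like this (a sketch from memory; verify the exact variables against the official "installing from sources" guide):

```shell
# Configure TensorFlow 1.8 for CUDA on the TX2, with TensorRT disabled
# and GDR/VERBS enabled, matching the build described above.
# Compute capability 6.2 matches the Tegra X2 reported in the logs.
cd tensorflow
export TF_NEED_CUDA=1
export TF_CUDA_COMPUTE_CAPABILITIES=6.2
export TF_NEED_TENSORRT=0
export TF_NEED_GDR=1
export TF_NEED_VERBS=1
./configure

# Build the pip package (this takes many hours on the TX2 itself).
bazel build --config=opt --config=cuda \
    //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip3 install /tmp/tensorflow_pkg/tensorflow-*.whl
```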

Thanks for your kind update @dr3dd. I would like to know one thing regarding this .whl file: what is the advantage of building TensorFlow with TensorRT support?

@saikishor, I’m not a deep learning expert, but using TensorRT for inference enables some optimizations for NVIDIA GPUs. See this for more information.

@dr3dd Thanks for the info and the link; this is exactly the kind of information I was looking for.