Hello,
I recently trained a ResNet-101 with TensorFlow 1.11.0 on a Tesla K80 GPU, and installed TensorFlow 1.11.0 on a Jetson TX2 via the wheel file (tensorflow-1.11.0-cp35-cp35m-linux_aarch64.whl) supplied by NVIDIA. However, whenever I try to run inference with this network, the process is killed with this exception: failed to create cublas handle…
Please find below the session log (frozen model load + inference):
-- Load model --
2019-04-08 14:19:28.388289: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:931] ARM64 does not support NUMA - returning NUMA node zero
2019-04-08 14:19:28.388454: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 3.02GiB
2019-04-08 14:19:28.388508: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2019-04-08 14:19:29.699067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-08 14:19:29.699176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0
2019-04-08 14:19:29.699204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N
2019-04-08 14:19:29.699401: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3141 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
-- Inference --
2019-04-08 14:19:48.314675: W tensorflow/core/framework/allocator.cc:113] Allocation of 18874368 exceeds 10% of system memory.
2019-04-08 14:19:48.332104: W tensorflow/core/framework/allocator.cc:113] Allocation of 8388608 exceeds 10% of system memory.
2019-04-08 14:19:48.335623: W tensorflow/core/framework/allocator.cc:113] Allocation of 9437184 exceeds 10% of system memory.
2019-04-08 14:19:48.338348: W tensorflow/core/framework/allocator.cc:113] Allocation of 4194304 exceeds 10% of system memory.
2019-04-08 14:19:48.340557: W tensorflow/core/framework/allocator.cc:113] Allocation of 4194304 exceeds 10% of system memory.
2019-04-08 14:19:58.966366: E tensorflow/stream_executor/cuda/cuda_blas.cc:464] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-04-08 14:19:59.080731: E tensorflow/stream_executor/cuda/cuda_blas.cc:464] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-04-08 14:19:59.516636: E tensorflow/stream_executor/cuda/cuda_dnn.cc:353] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
I installed the wheel file as follows:
sudo -H pip3 install tensorflow-1.11.0-cp35-cp35m-linux_aarch64.whl
I checked the version to make sure:
nvidia@tegra-ubuntu:~$ python3 -c 'import tensorflow as tf; print(tf.__version__)'
1.11.0
In Python I set the per-process GPU memory fraction to 0.4 (I also tried 0.8, with the same behavior):
self.config = tf.ConfigProto()
self.config.allow_soft_placement = True
self.config.gpu_options.per_process_gpu_memory_fraction = 0.4
and initialized the session as follows:
with tf.device("/gpu:0"):
    self.detection_graph = tf.Graph()
    with tf.Session(config=self.config) as sess:
        ...
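For reference, the frozen graph is imported along these lines before the session is created (a minimal sketch of the standard TF 1.x pattern; the file name and function name here are placeholders, not my exact code):

```python
import tensorflow as tf

def load_frozen_graph(pb_path):
    """Import a serialized GraphDef from disk into a fresh tf.Graph (TF 1.x API)."""
    graph = tf.Graph()
    with graph.as_default():
        # Parse the frozen protobuf and merge its nodes into this graph.
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_path, "rb") as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name="")
    return graph
```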
Please note I have also tried the alternative approach using
allow_growth=True
together with a corresponding swap file, but that does not work either. I can confirm that both the GPU memory usage and the swap file do grow.
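Concretely, the allow_growth variant I tried looks like this (a minimal sketch of the TF 1.x option, not my exact code):

```python
import tensorflow as tf

# Let TensorFlow allocate GPU memory on demand instead of reserving
# a fixed fraction up front (TF 1.x API).
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# sess = tf.Session(config=config)
```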
I read somewhere that TF 1.11.0 expects cuDNN 7.2; however, on my system the cuDNN version is 7.1.5:
nvidia@tegra-ubuntu:~$ dpkg -l | grep cuda
ii cuda-command-line-tools-9-0 9.0.252-1 arm64 CUDA command-line tools
ii cuda-core-9-0 9.0.252-1 arm64 CUDA core tools
ii cuda-cublas-9-0 9.0.252-1 arm64 CUBLAS native runtime libraries
ii cuda-cublas-dev-9-0 9.0.252-1 arm64 CUBLAS native dev links, headers
ii cuda-cudart-9-0 9.0.252-1 arm64 CUDA Runtime native Libraries
ii cuda-cudart-dev-9-0 9.0.252-1 arm64 CUDA Runtime native dev links, headers
ii cuda-cufft-9-0 9.0.252-1 arm64 CUFFT native runtime libraries
ii cuda-cufft-dev-9-0 9.0.252-1 arm64 CUFFT native dev links, headers
ii cuda-curand-9-0 9.0.252-1 arm64 CURAND native runtime libraries
ii cuda-curand-dev-9-0 9.0.252-1 arm64 CURAND native dev links, headers
ii cuda-cusolver-9-0 9.0.252-1 arm64 CUDA solver native runtime libraries
ii cuda-cusolver-dev-9-0 9.0.252-1 arm64 CUDA solver native dev links, headers
ii cuda-cusparse-9-0 9.0.252-1 arm64 CUSPARSE native runtime libraries
ii cuda-cusparse-dev-9-0 9.0.252-1 arm64 CUSPARSE native dev links, headers
ii cuda-documentation-9-0 9.0.252-1 arm64 CUDA documentation
ii cuda-driver-dev-9-0 9.0.252-1 arm64 CUDA Driver native dev stub library
ii cuda-libraries-dev-9-0 9.0.252-1 arm64 CUDA Libraries 9.0 development meta-package
ii cuda-license-9-0 9.0.252-1 arm64 CUDA licenses
ii cuda-misc-headers-9-0 9.0.252-1 arm64 CUDA miscellaneous headers
ii cuda-npp-9-0 9.0.252-1 arm64 NPP native runtime libraries
ii cuda-npp-dev-9-0 9.0.252-1 arm64 NPP native dev links, headers
ii cuda-nvgraph-9-0 9.0.252-1 arm64 NVGRAPH native runtime libraries
ii cuda-nvgraph-dev-9-0 9.0.252-1 arm64 NVGRAPH native dev links, headers
ii cuda-nvml-dev-9-0 9.0.252-1 arm64 NVML native dev links, headers
ii cuda-nvrtc-9-0 9.0.252-1 arm64 NVRTC native runtime libraries
ii cuda-nvrtc-dev-9-0 9.0.252-1 arm64 NVRTC native dev links, headers
ii cuda-repo-l4t-9-0-local 9.0.252-1 arm64 cuda repository configuration files
ii cuda-samples-9-0 9.0.252-1 arm64 CUDA example applications
ii cuda-toolkit-9-0 9.0.252-1 arm64 CUDA Toolkit 9.0 meta-package
ii libcudnn7 7.1.5.14-1+cuda9.0 arm64 cuDNN runtime libraries
ii libcudnn7-dev 7.1.5.14-1+cuda9.0 arm64 cuDNN development libraries and headers
ii libcudnn7-doc 7.1.5.14-1+cuda9.0 arm64 cuDNN documents and samples
ii libgie-dev 4.1.3-1+cuda9.0 arm64 Transitional package
ii libnvinfer-dev 4.1.3-1+cuda9.0 arm64 TensorRT development libraries and headers
ii libnvinfer-samples 4.1.3-1+cuda9.0 arm64 TensorRT samples and documentation
ii libnvinfer4 4.1.3-1+cuda9.0 arm64 TensorRT runtime libraries
ii tensorrt 4.0.2.0-1+cuda9.0 arm64 Meta package of TensorRT
Could someone from NVIDIA help with this?