Hello. I'm a beginner in deep learning, and I have a question.
I want to use Keras-OCR on an NVIDIA Jetson Nano, but the program won't run.
It says: GpuLaunchKernel( params... ) status: Internal: too many resources requested for launch
On my desktop, however, the same example program runs perfectly.
I couldn't find anyone else discussing this problem, and I can't even tell whether it is caused by TensorFlow, CUDA, or something else.
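For context, camera.py is essentially the standard Keras-OCR demo wrapped in an OpenCV camera capture. A simplified sketch of what it does (the real script has more logic, and the details here are approximate):

```python
import cv2
import keras_ocr

# Builds the detector + recognizer pipeline; on first run this downloads
# craft_mlt_25k.h5 and crnn_kurapan.h5 (the "Looking for ..." lines in the log).
pipeline = keras_ocr.pipeline.Pipeline()

cap = cv2.VideoCapture(0)  # the camera is opened through GStreamer on the Nano
ret, frame = cap.read()
if ret:
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # keras-ocr expects RGB
    # recognize() takes a list of images and returns, per image,
    # a list of (word, box) predictions.
    predictions = pipeline.recognize([image])
    print(predictions[0])
cap.release()
```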
Thank you.
My environment:
- Desktop
CPU: Intel Core i7 2600
GPU: NVIDIA GeForce GTX 1060 6GB
OS: Kubuntu 20.04 LTS
RAM: 10GB DDR3 @ 1333MHz
Python version: 3.8.2
CUDA version: 10.1
TensorFlow version: 2.3.0
cuDNN version: 7.6.5
- Jetson Nano
VRAM: Approx. 3.2GB available for GPU
OS: Jetpack 4.4
Python version: 3.6.9
CUDA version: 10.2
TensorFlow version: 2.2.0 + nv20.8
cuDNN version: 8.0.0
- I have 12GB of swap space.
- My input picture resolution is 288 x 162 px, and each file is around 12kB ~ 20kB.
(An OOM error occurs first if the input picture is too big; see the memory-growth sketch below.)
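Since memory is clearly tight on the Nano, this is the kind of GPU-memory setting I can put at the top of the script (standard TensorFlow 2.x calls; I don't know whether it is actually relevant to the kernel-launch failure):

```python
import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing most of it up front.
# Must run before any GPU work starts.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```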
Terminal log:
(kerasOCR) jetson@jetson-desktop:~/kerasOCR/workspace/k_ocr_clone$ python3 camera.py
2020-09-06 22:57:10.655089: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
Looking for /home/jetson/.keras-ocr/craft_mlt_25k.h5
2020-09-06 22:57:19.221045: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-09-06 22:57:19.227423: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-06 22:57:19.227589: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 194.55MiB/s
2020-09-06 22:57:19.227751: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-09-06 22:57:19.236155: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-06 22:57:19.240630: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-06 22:57:19.244814: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-06 22:57:19.254348: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-06 22:57:19.260853: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-09-06 22:57:19.262341: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-09-06 22:57:19.262843: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-06 22:57:19.263260: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-06 22:57:19.263343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-09-06 22:57:19.288305: W tensorflow/core/platform/profile_utils/cpu_utils.cc:106] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2020-09-06 22:57:19.289006: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x25b5a3d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-06 22:57:19.289077: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-09-06 22:57:19.369517: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-06 22:57:19.369827: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x25c99bb0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-06 22:57:19.369914: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2020-09-06 22:57:19.370629: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-06 22:57:19.370770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 194.55MiB/s
2020-09-06 22:57:19.370997: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-09-06 22:57:19.371204: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-06 22:57:19.371356: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-06 22:57:19.371478: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-06 22:57:19.371587: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-06 22:57:19.371695: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-09-06 22:57:19.371809: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-09-06 22:57:19.372233: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-06 22:57:19.372753: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-06 22:57:19.372871: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-09-06 22:57:19.373081: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-09-06 22:57:21.333633: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-06 22:57:21.333721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-09-06 22:57:21.333763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-09-06 22:57:21.334495: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-06 22:57:21.335058: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-06 22:57:21.335266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 707 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
WARNING:tensorflow:From /home/jetson/kerasOCR/lib/python3.6/site-packages/tensorflow/python/keras/backend.py:5871: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
Looking for /home/jetson/.keras-ocr/crnn_kurapan.h5
[ WARN:0] global /tmp/pip-install-0l7q4gf0/opencv-python/opencv/modules/videoio/src/cap_gstreamer.cpp (935) open OpenCV | GStreamer warning: Cannot query video position: status=0, value=-1, duration=-1
2020-09-06 22:57:47.804616: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-09-06 22:57:51.179076: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.69GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-09-06 22:57:51.182113: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-06 22:57:52.769728: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 836.12MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-09-06 22:57:53.799806: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.22GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-09-06 22:57:54.980083: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.23GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-09-06 22:57:55.605321: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 426.06MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-09-06 22:57:56.530118: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.64GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-09-06 22:57:57.880982: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 658.38MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-09-06 22:58:00.158814: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 875.50MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-09-06 22:58:00.792854: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 572.75MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-09-06 22:58:01.852528: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.09GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-09-06 22:58:04.101235: F tensorflow/core/kernels/resize_bilinear_op_gpu.cu.cc:493] Non-OK-status: GpuLaunchKernel(kernel, config.block_count, config.thread_per_block, 0, d.stream(), config.virtual_thread_count, images.data(), height_scale, width_scale, batch, in_height, in_width, channels, out_height, out_width, output.data()) status: Internal: too many resources requested for launch
Aborted (core dumped)