OpenCV, GStreamer, and TensorFlow memory error?

Hi, I'm trying to use OpenCV with a GStreamer pipeline to pass frames through a classifier trained in TensorFlow with Keras.

Everything seems to run OK, but it's really grumbling about memory. Does anyone have any advice here?!

$ sudo "export DISPLAY=:0" python Demo_g.py
[sudo] password for ubuntu:
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Using TensorFlow backend.

** (Demo_g.py:1446): WARNING **: Couldn’t connect to accessibility bus: Failed to connect to socket /tmp/dbus-kVb8LFTHBM: Connection refused

Available Sensor modes :
2592 x 1944 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
2592 x 1458 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1280 x 720 FR=120.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10

NvCameraSrc: Trying To Set Default Camera Resolution. Selected 1280x720 FrameRate = 24.000000 …

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] ARM64 does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GP10B
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 6.12GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) → (device: 0, name: GP10B, pci bus id: 0000:00:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.82G (6246737920 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.24G (5622064128 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.71G (5059857408 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.24G (4553871360 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 3.82G (4098484224 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
packet_write_wait: Connection to 192.168.0.46 port 22: Broken pipe (me killing the process in a very ineloquent manner)

I have the same problem.
It seems to happen because of the shared memory between the CPU and GPU on this board.
Unfortunately there is no way to fix this right now, but it is not really an error: it is just a message telling you that a memory allocation failed and that TensorFlow will retry with a smaller amount.
It always runs fine in the end, although it can result in inconsistent run times for the same network.
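One practical mitigation, for the record: by default TF 1.x tries to grab almost all free GPU memory up front, which is what produces this wall of back-off messages. A sketch against the TF 1.x / standalone Keras APIs visible in the log (the 4 GiB cap below is an arbitrary example value, not a recommendation):

```python
# Sketch for TF 1.x with standalone Keras (the stack shown in the log above).
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
# Grow GPU allocations on demand instead of pre-allocating nearly all free RAM:
config.gpu_options.allow_growth = True
# ...or pin an explicit ceiling, e.g. ~4 GiB of the TX2's 7.67 GiB shared RAM:
# config.gpu_options.per_process_gpu_memory_fraction = 4.0 / 7.67

K.set_session(tf.Session(config=config))
```

Either option stops TensorFlow from trying (and failing) to claim nearly the whole shared CPU/GPU memory pool in a single allocation.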

Swap file?
Do you use swap?

Hi,

Thanks for your question.

Looks like this is an out-of-memory issue.

E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.82G (6246737920 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

Could you add some swap and try it again?

fallocate -l 8G swapfile
ls -lh swapfile
chmod 600 swapfile
ls -lh swapfile
mkswap swapfile
sudo swapon swapfile
swapon -s
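If that helps, you may also want the swap to survive a reboot - a sketch, assuming the file was created as /home/ubuntu/swapfile (adjust the path to wherever you put it):

```shell
# Register the swap file in /etc/fstab so it is enabled at boot
# (the path here is an assumption; use wherever you created the file).
echo "/home/ubuntu/swapfile none swap sw 0 0" | sudo tee -a /etc/fstab

# Confirm it is active and the size looks right:
swapon -s
free -m
```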

Please also let us know the results.
Thanks.

Hi guys - done. Making a swap file has really taken me up to the hilt with storage (my dev repo, TensorFlow, OpenCV, and protobuf all take up a fair bit of space).

Still hitting that error:

$ sudo "export DISPLAY=:0" python Demo_g.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Using TensorFlow backend.

Available Sensor modes :
2592 x 1944 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
2592 x 1458 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1280 x 720 FR=120.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10

NvCameraSrc: Trying To Set Default Camera Resolution. Selected 1280x720 FrameRate = 24.000000 …

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] ARM64 does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GP10B
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 5.71GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) → (device: 0, name: GP10B, pci bus id: 0000:00:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.41G (5811900416 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.87G (5230710272 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.38G (4707639296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 3.95G (4236875264 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

In case this helps… (I'm guessing it won't, but worth mentioning anyway):

$ df -T -H
Filesystem Type Size Used Avail Use% Mounted on
/dev/mmcblk0p1 ext4 30G 29G 0 100% /
none devtmpfs 8.2G 0 8.2G 0% /dev
tmpfs tmpfs 8.3G 357k 8.3G 1% /dev/shm
tmpfs tmpfs 8.3G 14M 8.3G 1% /run
tmpfs tmpfs 5.3M 4.1k 5.3M 1% /run/lock
tmpfs tmpfs 8.3G 0 8.3G 0% /sys/fs/cgroup
tmpfs tmpfs 824M 37k 824M 1% /run/user/106
tmpfs tmpfs 824M 4.1k 824M 1% /run/user/1001
tmpfs tmpfs 824M 4.1k 824M 1% /run/user/1000

and…

$ du -sm * | sort -n
1 AUTHORS
1 BUILD
1 CHANGELOG.md
1 combine_distfiles.sh
1 CONTRIBUTING.md
1 CONTRIBUTORS
1 Desktop
1 Documents
1 Downloads
1 examples
1 installTensorFlowTX2
1 ISSUE_TEMPLATE.md
1 jetson_clocks.sh
1 LICENSE
1 LICENSE.txt
1 Music
1 Pictures
1 Public
1 README.md
1 scripts
1 tegrastats
1 Templates
1 to
1 Videos
1 WORKSPACE
2 tools
3 site
9 derived
15 ENV
27 tegra_multimedia_api
38 src
60 cudnn
61 tensorflow-1.0.1-cp27-cp27mu-linux_aarch64.whl
122 third_party
127 output
170 tensorflow
766 NVIDIA_CUDA-8.0_Samples
1212 cuda-l4t
1619 rawto
1756 protobuf-3.1.0
8193 swapfile

Hi,

Thanks for your feedback.
Based on your error log, you are still hitting the out-of-memory issue.

Could you remove the swap file and increase it to 16G?

rm -rf swapfile
fallocate -l 16G swapfile
... ...

Please also let us know the results.
Thanks.

OK - struggling to build a swap file that size, so it looks like that storage output above is relevant.

Any hints on freeing up more space here? My repo is as small as it can be. I feel like there is something going on below that I'm missing - I don't seem to have any other swap files on the system, but it looks like there are similarly sized chunks of disk that don't seem to be in use?!

$ df -T -H

Filesystem Type Size Used Avail Use% Mounted on
/dev/mmcblk0p1 ext4 30G 30G 0 100% /
none devtmpfs 8.2G 0 8.2G 0% /dev
tmpfs tmpfs 8.3G 336k 8.3G 1% /dev/shm
tmpfs tmpfs 8.3G 14M 8.3G 1% /run
tmpfs tmpfs 5.3M 4.1k 5.3M 1% /run/lock
tmpfs tmpfs 8.3G 0 8.3G 0% /sys/fs/cgroup
tmpfs tmpfs 824M 13k 824M 1% /run/user/106
tmpfs tmpfs 824M 4.1k 824M 1% /run/user/1001
tmpfs tmpfs 824M 4.1k 824M 1% /run/user/1000
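A few generic places to claw back space while debugging this (a sketch - standard Ubuntu locations, nothing specific to your setup):

```shell
# Largest items under home, biggest last:
du -sm ~/* 2>/dev/null | sort -n | tail

# Drop the apt package cache:
sudo apt-get clean

# Space that df counts but du cannot see: deleted files that a process
# still holds open (link count zero). Restarting the owning process
# releases them.
sudo lsof +L1 | head
```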

Plug in a fast USB thumb drive, or an external SATA SSD, and put the swap space on that device?

Will pick up a USB drive this evening and report back on progress ASAP.

OK… so, 16 GB swap file created on the mounted USB 3.0 stick:

df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mmcblk0p1 28768380 19154044 8129948 71% /
none 7969692 0 7969692 0% /dev
tmpfs 8042556 348 8042208 1% /dev/shm
tmpfs 8042556 13336 8029220 1% /run
tmpfs 5120 4 5116 1% /run/lock
tmpfs 8042556 0 8042556 0% /sys/fs/cgroup
tmpfs 804256 32 804224 1% /run/user/106
tmpfs 804256 4 804252 1% /run/user/1001
tmpfs 804256 4 804252 1% /run/user/1000
/dev/sda1 60340372 16830284 40421908 30% /home/ubuntu/UsbStick

STILL hitting that error, though:

$ sudo "export DISPLAY=:0" python Demo_g.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Using TensorFlow backend.

Available Sensor modes :
2592 x 1944 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
2592 x 1458 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1280 x 720 FR=120.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10

NvCameraSrc: Trying To Set Default Camera Resolution. Selected 1280x720 FrameRate = 24.000000 …

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] ARM64 does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GP10B
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 6.07GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) → (device: 0, name: GP10B, pci bus id: 0000:00:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.76G (6190350336 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.19G (5571315200 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.67G (5014183424 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.20G (4512764928 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

Is there more than one allocation happening in CUDA?
Do you know how much memory it will want overall?
Or is this a fragmentation issue? (I don’t know how good the Jetson and CUDA are about mapping allocations – it may be totally automatic, or it may be a limitation.)
I don’t think CUDA can use virtual memory at all; it probably has to be physical.
(The swap file is still useful to push more non-CUDA allocations out.)
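The figures in the logs above are consistent with that: TensorFlow first asks for roughly 95% of the reported free memory, then retries each failed allocation at roughly 90% of the previous request. Checking that pattern with the exact byte counts from the first log:

```python
# Failed CUDA allocation sizes from the first log, in bytes.
attempts = [6246737920, 5622064128, 5059857408, 4553871360, 4098484224]

# Each retry requests ~90% of the previous (failed) amount.
ratios = [b / a for a, b in zip(attempts, attempts[1:])]
assert all(abs(r - 0.9) < 0.001 for r in ratios)

# The first request is ~95% of the 6.12 GiB reported free, i.e. TensorFlow
# tries to grab almost everything up front. Since the TX2's CPU and GPU share
# the same physical RAM, swap space cannot back these device allocations.
free_bytes = 6.12 * 2**30
assert abs(attempts[0] / free_bytes - 0.95) < 0.01
```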

Also, just double-checking:
Did you actually turn on that swap file after creating it?
What does “swapon -s” show?

I thought the swap file was not working, but in retrospect it looks like it is…

$ sudo swapon swapfile
ubuntu@tegra-ubuntu:~/UsbStick$ swapon -s
Filename Type Size Used Priority
/home/ubuntu/UsbStick/swapfile file 16777212 0 -1
OK, so next thoughts: I'm currently using ssh -Y -C to do this, as the USB stick is taking up the USB slot. Note: I did get that memory error when I ran the script locally too. I'm thinking the next step is a powered USB hub, since I won't be able to run both the USB stick and a keyboard to run the script locally - there is only one USB port on the TX2.

Do beware that when you use the ssh option “-Y”, some GPU functionality will be offloaded from the Jetson to the host PC. You might skip the “-Y” and set “export DISPLAY=:0” to display on the Jetson’s monitor.

If you want to see what goes on with memory, try running “htop” (you’d have to install it) while your script runs to see how memory changes.

There is one USB 3 and one USB 2. I put a wireless keyboard/mouse combo adapter in the USB 2 slot, and use USB 3 for high-bandwidth peripherals.
You can also use a USB hub for keyboard + mouse + whatever on the USB 2 port.
(You need to plug in the white USB adapter cable that came with the devkit to get the USB 2 port)

OK - it’s fixed!

  1. Ran locally - fewer errors than via ssh -Y -C. I did actually know ssh would bottleneck everything through my puny MacBook Air, but I wanted to make sure the swap memory worked before taking the plunge on a USB hub. However, as @snarky kindly pointed out, I could use the USB 2 slot and adapter (forgot about this!).

  2. Still saw some memory issues when running locally on the Jetson. Realised I might not have turned all the cores on! Called sudo nvpmodel -m 2 to turn them on and ran again - ran without errors!

  3. I hadn’t run the ./jetson_clocks.sh script to boost power before, so I’ve run it now to give things a bit more of a kick. The fan is pretty loud lol, but it’s good.

It seems to be running much more smoothly now. Pretty pleased with the results. If there are any other optimisations / things I could check, please let me know!
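For anyone landing here later, the power-related steps above can be inspected as follows (a sketch - flag spellings are from the JetPack-era TX2 tools, so double-check against your release):

```shell
# Show the current power model (mode 0 / MAXN unlocks all cores and max
# clocks; the post above used mode 2):
sudo nvpmodel -q --verbose

# Max out the clocks, then display the current clock settings:
sudo ./jetson_clocks.sh
sudo ./jetson_clocks.sh --show

# Live RAM / CPU / GPU utilisation while the script runs (Ctrl-C to stop):
sudo ./tegrastats
```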