OpenCV, GStreamer, and TensorFlow memory error?

Hi, I'm trying to use OpenCV with a GStreamer pipeline to pass frames through a classifier trained in TensorFlow with Keras.

Everything seems to run OK, but it's really grumbling about memory. Does anyone have any advice here?!

$ sudo "export DISPLAY=:0" python Demo_g.py
[sudo] password for ubuntu:
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Using TensorFlow backend.

** (Demo_g.py:1446): WARNING **: Couldn’t connect to accessibility bus: Failed to connect to socket /tmp/dbus-kVb8LFTHBM: Connection refused

Available Sensor modes :
2592 x 1944 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
2592 x 1458 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1280 x 720 FR=120.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10

NvCameraSrc: Trying To Set Default Camera Resolution. Selected 1280x720 FrameRate = 24.000000 …

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] ARM64 does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GP10B
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 6.12GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) → (device: 0, name: GP10B, pci bus id: 0000:00:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.82G (6246737920 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.24G (5622064128 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.71G (5059857408 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.24G (4553871360 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 3.82G (4098484224 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
packet_write_wait: Connection to 192.168.0.46 port 22: Broken pipe (me killing the process in a very ineloquent manner)

I have the same problem.
It seems to happen because of the shared memory between the CPU and GPU on this board.
Unfortunately there is no way to fix this right now, but it is not really an error: it is just a message telling you that a memory allocation failed and that TensorFlow will retry with a smaller amount.
It always runs fine in the end, although it can result in inconsistent run times for the same network.
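One practical mitigation, for the record: by default TF 1.x tries to grab almost all free GPU memory up front, which is what produces this wall of back-off messages. A sketch against the TF 1.x / standalone Keras APIs visible in the log (the 4 GiB cap below is an arbitrary example value, not a recommendation):

```python
# Sketch for TF 1.x with standalone Keras (the stack shown in the log above).
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
# Grow GPU allocations on demand instead of pre-allocating nearly all free RAM:
config.gpu_options.allow_growth = True
# ...or pin an explicit ceiling, e.g. ~4 GiB of the TX2's 7.67 GiB shared RAM:
# config.gpu_options.per_process_gpu_memory_fraction = 4.0 / 7.67

K.set_session(tf.Session(config=config))
```

Either option stops TensorFlow from trying (and failing) to claim nearly the whole shared CPU/GPU memory pool in a single allocation.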

Swap file?
Do you use swap?

Hi,

Thanks for your question.

Looks like this is an out-of-memory issue.

E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.82G (6246737920 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

Could you add some swap and try it again?

fallocate -l 8G swapfile
ls -lh swapfile
chmod 600 swapfile
ls -lh swapfile
mkswap swapfile
sudo swapon swapfile
swapon -s
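If that helps, you may also want the swap to survive a reboot - a sketch, assuming the file was created as /home/ubuntu/swapfile (adjust the path to wherever you put it):

```shell
# Register the swap file in /etc/fstab so it is enabled at boot
# (the path here is an assumption; use wherever you created the file).
echo "/home/ubuntu/swapfile none swap sw 0 0" | sudo tee -a /etc/fstab

# Confirm it is active and the size looks right:
swapon -s
free -m
```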

Please also let us know the results.
Thanks.

Hi guys - done. Making a swap file has really taken me up to the hilt with storage (my dev repo, TensorFlow, OpenCV, and protobuf all take up a fair bit of space).

Still hitting that error:

$ sudo "export DISPLAY=:0" python Demo_g.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Using TensorFlow backend.

Available Sensor modes :
2592 x 1944 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
2592 x 1458 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1280 x 720 FR=120.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10

NvCameraSrc: Trying To Set Default Camera Resolution. Selected 1280x720 FrameRate = 24.000000 …

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] ARM64 does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GP10B
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 5.71GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) → (device: 0, name: GP10B, pci bus id: 0000:00:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.41G (5811900416 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.87G (5230710272 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.38G (4707639296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 3.95G (4236875264 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

In case this helps… (I'm guessing it won't, but worth mentioning anyway):

$ df -T -H
Filesystem Type Size Used Avail Use% Mounted on
/dev/mmcblk0p1 ext4 30G 29G 0 100% /
none devtmpfs 8.2G 0 8.2G 0% /dev
tmpfs tmpfs 8.3G 357k 8.3G 1% /dev/shm
tmpfs tmpfs 8.3G 14M 8.3G 1% /run
tmpfs tmpfs 5.3M 4.1k 5.3M 1% /run/lock
tmpfs tmpfs 8.3G 0 8.3G 0% /sys/fs/cgroup
tmpfs tmpfs 824M 37k 824M 1% /run/user/106
tmpfs tmpfs 824M 4.1k 824M 1% /run/user/1001
tmpfs tmpfs 824M 4.1k 824M 1% /run/user/1000

and…

$ du -sm * | sort -n
1 AUTHORS
1 BUILD
1 CHANGELOG.md
1 combine_distfiles.sh
1 CONTRIBUTING.md
1 CONTRIBUTORS
1 Desktop
1 Documents
1 Downloads
1 examples
1 installTensorFlowTX2
1 ISSUE_TEMPLATE.md
1 jetson_clocks.sh
1 LICENSE
1 LICENSE.txt
1 Music
1 Pictures
1 Public
1 README.md
1 scripts
1 tegrastats
1 Templates
1 to
1 Videos
1 WORKSPACE
2 tools
3 site
9 derived
15 ENV
27 tegra_multimedia_api
38 src
60 cudnn
61 tensorflow-1.0.1-cp27-cp27mu-linux_aarch64.whl
122 third_party
127 output
170 tensorflow
766 NVIDIA_CUDA-8.0_Samples
1212 cuda-l4t
1619 rawto
1756 protobuf-3.1.0
8193 swapfile

Hi,

Thanks for your feedback.
Based on your error log, you are still hitting the out-of-memory issue.

Could you remove the swap file and increase it to 16G?

rm -rf swapfile
fallocate -l 16G swapfile
... ...

Please also let us know the results.
Thanks.

OK - struggling to build a swap file that size, so it looks like that storage output above is relevant.

Any hints on freeing up more space here? My repo is as small as it can be. I feel like there is something going on below that I'm missing - I don't seem to have any other swap files on the system, but it looks like there are similarly sized chunks of disk that don't seem to be in use?!

$ df -T -H

Filesystem Type Size Used Avail Use% Mounted on
/dev/mmcblk0p1 ext4 30G 30G 0 100% /
none devtmpfs 8.2G 0 8.2G 0% /dev
tmpfs tmpfs 8.3G 336k 8.3G 1% /dev/shm
tmpfs tmpfs 8.3G 14M 8.3G 1% /run
tmpfs tmpfs 5.3M 4.1k 5.3M 1% /run/lock
tmpfs tmpfs 8.3G 0 8.3G 0% /sys/fs/cgroup
tmpfs tmpfs 824M 13k 824M 1% /run/user/106
tmpfs tmpfs 824M 4.1k 824M 1% /run/user/1001
tmpfs tmpfs 824M 4.1k 824M 1% /run/user/1000
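A few generic places to claw back space while debugging this (a sketch - standard Ubuntu locations, nothing specific to your setup):

```shell
# Largest items under home, biggest last:
du -sm ~/* 2>/dev/null | sort -n | tail

# Drop the apt package cache:
sudo apt-get clean

# Space that df counts but du cannot see: deleted files that a process
# still holds open (link count zero). Restarting the owning process
# releases them.
sudo lsof +L1 | head
```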

Plug in a fast USB thumb drive, or an external SATA SSD, and put the swap space on that device?

Will pick up a USB drive this evening and report back on progress ASAP.

OK… so, 16 GB swap file created on the mounted USB 3.0 stick:

df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mmcblk0p1 28768380 19154044 8129948 71% /
none 7969692 0 7969692 0% /dev
tmpfs 8042556 348 8042208 1% /dev/shm
tmpfs 8042556 13336 8029220 1% /run
tmpfs 5120 4 5116 1% /run/lock
tmpfs 8042556 0 8042556 0% /sys/fs/cgroup
tmpfs 804256 32 804224 1% /run/user/106
tmpfs 804256 4 804252 1% /run/user/1001
tmpfs 804256 4 804252 1% /run/user/1000
/dev/sda1 60340372 16830284 40421908 30% /home/ubuntu/UsbStick

STILL hitting that error, though:

$ sudo "export DISPLAY=:0" python Demo_g.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Using TensorFlow backend.

Available Sensor modes :
2592 x 1944 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
2592 x 1458 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1280 x 720 FR=120.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10

NvCameraSrc: Trying To Set Default Camera Resolution. Selected 1280x720 FrameRate = 24.000000 …

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] ARM64 does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GP10B
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 6.07GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) → (device: 0, name: GP10B, pci bus id: 0000:00:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.76G (6190350336 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.19G (5571315200 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.67G (5014183424 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.20G (4512764928 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

Is there more than one allocation happening in CUDA?
Do you know how much memory it will want overall?
Or is this a fragmentation issue? (I don’t know how good the Jetson and CUDA are about mapping allocations – it may be totally automatic, or it may be a limitation.)
I don’t think CUDA can use virtual memory at all; it probably has to be physical.
(The swap file is still useful to push more non-CUDA allocations out.)
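The figures in the logs above are consistent with that: TensorFlow first asks for roughly 95% of the reported free memory, then retries each failed allocation at roughly 90% of the previous request. Checking that pattern with the exact byte counts from the first log:

```python
# Failed CUDA allocation sizes from the first log, in bytes.
attempts = [6246737920, 5622064128, 5059857408, 4553871360, 4098484224]

# Each retry requests ~90% of the previous (failed) amount.
ratios = [b / a for a, b in zip(attempts, attempts[1:])]
assert all(abs(r - 0.9) < 0.001 for r in ratios)

# The first request is ~95% of the 6.12 GiB reported free, i.e. TensorFlow
# tries to grab almost everything up front. Since the TX2's CPU and GPU share
# the same physical RAM, swap space cannot back these device allocations.
free_bytes = 6.12 * 2**30
assert abs(attempts[0] / free_bytes - 0.95) < 0.01
```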

Also, just double-checking:
Did you actually turn on that swap file after creating it?
What does “swapon -s” show?

I thought the swap file was not working, but in retrospect it looks like it is…

$ sudo swapon swapfile
ubuntu@tegra-ubuntu:~/UsbStick$ swapon -s
Filename Type Size Used Priority
/home/ubuntu/UsbStick/swapfile file 16777212 0 -1
OK, so next thoughts: I'm currently using ssh -Y -C to do this, as the USB stick is taking up the USB slot. Note: I did get that memory error when I ran the script locally too. I'm thinking the next step is a powered USB hub, since I won't be able to run both the USB stick and a keyboard to run the script locally - there is only one USB port on the TX2.

Do beware that when you use the ssh option “-Y”, some GPU functionality will be offloaded from the Jetson to the host PC. You might skip the “-Y” and set “export DISPLAY=:0” to display on the Jetson’s monitor.

If you want to see what goes on with memory, try running “htop” (you’d have to install it) while your script runs to see how memory changes.

There is one USB 3 and one USB 2. I put a wireless keyboard/mouse combo adapter in the USB 2 slot, and use USB 3 for high-bandwidth peripherals.
You can also use a USB hub for keyboard + mouse + whatever on the USB 2 port.
(You need to plug in the white USB adapter cable that came with the devkit to get the USB 2 port)

OK - it’s fixed!

  1. Ran locally - fewer errors than via ssh -Y -C. I did actually know ssh would bottleneck everything through my puny MacBook Air, but I wanted to make sure the swap memory worked before taking the plunge on a USB hub. However, as @snarky kindly pointed out, I could use the USB 2 slot and adapter (forgot about this!).

  2. Still saw some memory issues when running locally on the Jetson. Realised I might not have turned all the cores on! Called sudo nvpmodel -m 2 to turn them on and ran again - ran without errors!

  3. I hadn’t run the ./jetson_clocks.sh script to boost power before, so I’ve run it now to give things a bit more of a kick. The fan is pretty loud lol, but it’s good.

It seems to be running much more smoothly now. Pretty pleased with the results. If there are any other optimisations / things I could check, please let me know!
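For anyone landing here later, the power-related steps above can be inspected as follows (a sketch - flag spellings are from the JetPack-era TX2 tools, so double-check against your release):

```shell
# Show the current power model (mode 0 / MAXN unlocks all cores and max
# clocks; the post above used mode 2):
sudo nvpmodel -q --verbose

# Max out the clocks, then display the current clock settings:
sudo ./jetson_clocks.sh
sudo ./jetson_clocks.sh --show

# Live RAM / CPU / GPU utilisation while the script runs (Ctrl-C to stop):
sudo ./tegrastats
```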