Hi, I'm trying to use OpenCV with a GStreamer pipeline to pass frames through a classifier that was trained in TensorFlow with Keras.
Everything seems to run OK, but it's really grumbling about memory. Does anyone have any advice?
$ sudo "export DISPLAY=:0" python Demo_g.py
[sudo] password for ubuntu:
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Using TensorFlow backend.
** (Demo_g.py:1446): WARNING **: Couldn’t connect to accessibility bus: Failed to connect to socket /tmp/dbus-kVb8LFTHBM: Connection refused
Available Sensor modes :
2592 x 1944 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
2592 x 1458 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1280 x 720 FR=120.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
NvCameraSrc: Trying To Set Default Camera Resolution. Selected 1280x720 FrameRate = 24.000000 …
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] ARM64 does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GP10B
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 6.12GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GP10B, pci bus id: 0000:00:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.82G (6246737920 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.24G (5622064128 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.71G (5059857408 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.24G (4553871360 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 3.82G (4098484224 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
packet_write_wait: Connection to 192.168.0.46 port 22: Broken pipe (that's me killing the process in a rather inelegant manner)
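As an aside, the failed request sizes in the log above shrink by roughly 10% on each attempt, which is consistent with TensorFlow's allocator backing off and retrying with a smaller request rather than crashing outright. A quick sanity check on the numbers copied from the log:

```python
# Failed allocation sizes copied verbatim from the CUDA_ERROR_OUT_OF_MEMORY lines above
requests = [6246737920, 5622064128, 5059857408, 4553871360, 4098484224]

# Ratio between successive requests: each retry asks for ~90% of the previous one
ratios = [b / a for a, b in zip(requests, requests[1:])]
assert all(abs(r - 0.9) < 1e-3 for r in ratios)
print(["%.6f" % r for r in ratios])
```

So the E lines are the allocator walking down in ~10% steps until a request fits, not five independent failures.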
I have the same problem.
It seems to happen because of the shared memory between the CPU and GPU on this board.
Unfortunately there doesn't seem to be a proper fix for it right now.
But it isn't really an error: the message just tells you an allocation failed, and TensorFlow then retries with a smaller amount of memory.
It usually still runs fine, though it can result in inconsistent run times for the same network.
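One common mitigation is to stop TensorFlow from trying to grab nearly all free memory up front. With the TF 1.x API the log suggests (CUDA 8 / cuDNN 5 era) and the standalone Keras package ("Using TensorFlow backend."), a session-config sketch like this, placed before the model is built, may help; `allow_growth` and the fraction value are knobs to experiment with, not guaranteed fixes:

```python
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of all at once
# Alternatively, cap the fraction of device memory TF may claim:
# config.gpu_options.per_process_gpu_memory_fraction = 0.4
K.set_session(tf.Session(config=config))
```

On a board where CPU and GPU share physical RAM, capping TensorFlow's claim also leaves room for OpenCV and the camera pipeline.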
Hi guys - done. Making a swapfile has really taken me up to the hilt on storage (my dev repo, TensorFlow, OpenCV, and protobuf all take up a fair bit of space), and I'm still hitting that error:
$ sudo "export DISPLAY=:0" python Demo_g.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Using TensorFlow backend.
Available Sensor modes :
2592 x 1944 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
2592 x 1458 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1280 x 720 FR=120.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
NvCameraSrc: Trying To Set Default Camera Resolution. Selected 1280x720 FrameRate = 24.000000 …
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] ARM64 does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GP10B
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 5.71GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GP10B, pci bus id: 0000:00:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.41G (5811900416 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.87G (5230710272 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.38G (4707639296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 3.95G (4236875264 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
In case this helps… (I'm guessing it won't, but worth mentioning anyway.)
OK - I'm struggling to build a swapfile that size, so it looks like that storage output above is relevant.
Any hints on freeing up more space here? My repo is as small as it can be. I feel like there's something going on underneath that I'm missing: I don't seem to have any other swapfiles on the system, yet there appear to be similarly sized chunks of disk that don't seem to be in use.
$ sudo "export DISPLAY=:0" python Demo_g.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Using TensorFlow backend.
Available Sensor modes :
2592 x 1944 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
2592 x 1458 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1280 x 720 FR=120.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
NvCameraSrc: Trying To Set Default Camera Resolution. Selected 1280x720 FrameRate = 24.000000 …
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] ARM64 does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GP10B
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 6.07GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GP10B, pci bus id: 0000:00:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.76G (6190350336 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.19G (5571315200 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.67G (5014183424 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.20G (4512764928 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Is there more than one allocation happening in CUDA?
Do you know how much memory it will want overall?
Or is this a fragmentation issue? (I don't know how good the Jetson and CUDA are about mapping allocations - it may be totally automatic, or it may be a limitation.)
I don’t think CUDA can use the virtual memory at all, it probably has to be physical.
(The swap file is still useful to push more non-CUDA allocations out.)
Also, just double-checking:
Did you actually turn on that swap file after creating it?
What does "swapon -s" show?
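For reference, a typical sequence for creating and enabling a swapfile looks like the following (the path and 4G size are placeholders, not the exact values from this thread; and as noted above, CUDA allocations themselves likely can't be paged to swap, so this only frees up RAM from non-CUDA allocations):

```shell
# Create, protect, format, and enable a 4 GiB swapfile; adjust path and size to taste
sudo fallocate -l 4G /mnt/swapfile
sudo chmod 600 /mnt/swapfile
sudo mkswap /mnt/swapfile
sudo swapon /mnt/swapfile
swapon -s        # confirm it is listed with the expected size and "Used" column
```

Note that swap enabled this way does not survive a reboot unless an entry is added to /etc/fstab.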
I thought the swap file wasn't working, but in retrospect it looks like it is…
$ sudo swapon swapfile
ubuntu@tegra-ubuntu:~/UsbStick$ swapon -s
Filename Type Size Used Priority
/home/ubuntu/UsbStick/swapfile file 16777212 0 -1
OK, so next thoughts: I'm currently using ssh -Y -C to do this, as the USB stick is taking up the USB slot. Note: I did get that memory error when I ran the script locally too. I'm thinking the next step is a powered USB hub, since I won't be able to attach both the USB stick and a keyboard to run the script locally - there's only one USB port on the TX2.
Do beware that when you use the ssh option "-Y", some GPU functionality will be offloaded from the Jetson to the host PC. You might skip "-Y" and set "export DISPLAY=:0" to display on the Jetson's monitor instead.
If you want to see what goes on with memory, try running "htop" (you'd have to install it) while your script runs to see how memory changes.
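If a second terminal for htop isn't convenient, the same information can be polled from inside the script via /proc/meminfo. A rough Linux-only sketch (the `meminfo` helper is just something I'm making up here, not part of any library):

```python
def meminfo():
    """Parse /proc/meminfo into a dict mapping field name -> value in kB."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, _, rest = line.partition(":")
            info[key] = int(rest.split()[0])  # first token after the colon is the kB value
    return info

m = meminfo()
print("MemFree: %d MB, SwapFree: %d MB" % (m["MemFree"] // 1024, m["SwapFree"] // 1024))
```

Calling this before and after model construction shows how much of the shared RAM TensorFlow has claimed, and whether swap is actually being touched.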
There is one USB 3 and one USB 2. I put a wireless keyboard/mouse combo adapter in the USB 2 slot, and use USB 3 for high-bandwidth peripherals.
You can also use a USB hub for keyboard + mouse + whatever on the USB 2 port.
(You need to plug in the white USB adapter cable that came with the devkit to get the USB 2 port)
I did actually know it would bottleneck things through my puny MacBook Air, but I wanted to make sure the swap memory worked before taking the plunge on a USB hub. However, as @snarky kindly pointed out, I can use the USB 2 slot and adapter (I'd forgotten about this!).
I still saw some memory issues when running locally on the Jetson. Then I realised I might not have turned all the cores on! I called sudo nvpmodel -m 2 to enable them and ran again - no errors!
I hadn't run the ./jetson_clocks.sh script to boost power before, so I've run it now to give the board a bit more of a kick. The fan is pretty loud, but it's good.
It all seems to be running much more smoothly now, and I'm pretty pleased with the results. If there are any other optimisations or things I could check, please let me know.
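For anyone landing on this thread later, here are the power-mode commands referenced above, as I understand them; they are Jetson-specific, mode numbering varies between boards and L4T releases, and the jetson_clocks.sh location may differ on your image, so query before setting anything:

```shell
sudo nvpmodel -q           # show the current power mode and its name
sudo nvpmodel -m 0         # on the TX2, mode 0 (MAXN) enables all CPU cores at full clocks
sudo ./jetson_clocks.sh    # pin CPU/GPU/EMC clocks to maximum (script usually in the home directory)
```

Lower-numbered isn't always faster on every board, so check the mode table that `nvpmodel -q` reports rather than assuming.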