When I run a segmentation model with TensorFlow on a Jetson TX2, an error occurs: there is not enough memory to keep running. What should I do?
It depends on whether the required memory is for the GPU or the CPU.
On the TX2, allocatable GPU memory is limited to 8 GB.
But if the required memory is for the CPU, you can try adding some swap memory.
```shell
# Create a swapfile for Ubuntu at the current directory location
fallocate -l 8G swapfile
# List out the file
ls -lh swapfile
# Change permissions so that only root can use it
chmod 600 swapfile
# List out the file
ls -lh swapfile
# Set up the Linux swap area
mkswap swapfile
# Now start using the swapfile
sudo swapon swapfile
# Show that it's now being used
swapon -s
```
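One caveat worth noting: swap enabled with `swapon` does not survive a reboot. A sketch of one way to make it permanent, assuming the swapfile is created at the absolute path `/swapfile` (`/etc/fstab` needs an absolute path, so adjust to wherever you actually put it):

```shell
# Assumption: the swapfile lives at /swapfile (not the relative
# "swapfile" from the steps above). Append an fstab entry so the
# swap is enabled automatically at every boot:
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```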
Which model is it throwing this on? I’ve been doing research on TX2s and TensorFlow, and am hearing murmurs that some models play well and others don’t, but I don’t know the root of that (if it’s valid) yet.
I assume you’re talking just GPU as well. Trying to run Tensorflow on a swap file is… a choice… that I wouldn’t ever recommend.
Thanks for your reply.
If the whole network is allocated on the GPU, swap won’t help; the TX2’s GPU memory is limited to 8 GB.
But if you check the TensorFlow model placement, sometimes several layers are placed on the CPU even though you are in GPU mode.
In that case, swap may help (but it is still network-dependent).
That’s why we always recommend giving swap a try.
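To actually check that placement, a minimal sketch using the TF 1.x API (the API the frozen-graph workflow in this thread uses); the tiny graph here is just a stand-in for your segmentation model:

```python
import tensorflow as tf  # TF 1.x, as used elsewhere in this thread

# log_device_placement=True makes the session print, for every op,
# whether it landed on /device:GPU:0 or /device:CPU:0.
a = tf.constant([1.0, 2.0], name="a")
b = tf.constant([3.0, 4.0], name="b")
c = a + b

config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    sess.run(c)
```

If the log shows layers of your network pinned to `/device:CPU:0`, those are the allocations that swap can absorb.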
I would like to add to AastaLLL’s answer:
When memory is not enough and you don’t have swap, TensorFlow will exit with a “Killed” message.
When swap is on, you should also add this to /etc/sysctl.conf: vm.min_free_kbytes=65536 (to keep at least 6% of total memory divided by the number of cores free)
The reason is that sometimes, when memory is almost full, the system still tries to load pages from swap into memory, which can cause a hang/freeze.
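For reference, a sketch of how that setting can be persisted and applied without a reboot (needs root):

```shell
# Persist the setting across reboots...
echo 'vm.min_free_kbytes=65536' | sudo tee -a /etc/sysctl.conf
# ...apply it immediately...
sudo sysctl -p
# ...and verify the value the kernel is now using
cat /proc/sys/vm/min_free_kbytes
```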
Thanks for the sharing, a_phan.
Using this method, I can optimize a YOLOv3 frozen graph (https://github.com/ardianumam/Tensorflow-TensorRT) on the Jetson TX2 into a TensorRT graph. Without adding swap memory, the TX2 runs out of memory just restoring the YOLOv3 graph. But the TRT-graph result takes a long time (~15 minutes) just to read/load/restore it, while the original TensorFlow frozen model of YOLOv3 needs only 5 seconds to restore the graph. Is there anything we can do to solve this issue?
TensorRT will compile a model into a TensorRT PLAN before launching.
This takes some time since the optimization is pretty complicated.
You can serialize a TensorRT PLAN and launch it without re-compiling next time.
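A minimal sketch of that serialize/reload cycle with the TensorRT Python API; here `engine` is assumed to be an ICudaEngine you already built once (e.g. from the YOLOv3 conversion), and the file name `yolov3.plan` is arbitrary:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# --- after building the engine once (the slow ~15 min step) ---
# Assumption: `engine` is an already-built trt.ICudaEngine.
with open("yolov3.plan", "wb") as f:
    f.write(engine.serialize())

# --- on every later launch (the fast step) ---
with open("yolov3.plan", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
```

Deserializing a saved PLAN skips the optimization passes entirely, which is why subsequent launches are much faster than the first build.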
Check this document for more information:
If your code gets stuck in ParseFromString(), you’re most likely being hit by the slowness of the protobuf Python backend. Check
and start with
before running your code.
Thanks for sharing.