TensorFlow 'Killed' error on the TX1

I use TensorFlow v2.7, kernel-4.4.38, CUDA v8.0.72 and L4T R28.1 on the TX1.
When I train a model with TensorFlow repeatedly, a 'Killed' message appears after one normal training run.
The error occurs during the model building phase.
After rebooting the TX1, I can train the model again without any problem.
So I have to reboot the TX1 before every training run.
I tried forcing the cache and swap memory to be cleared directly, but it still doesn't work normally.

In other words, I can train a model with TensorFlow only once without any error. If I want to train again afterwards, I need to reboot the TX1.
Why does this 'Killed' error occur?

You probably ran out of memory. Keep in mind that most of the RAM the GPU uses in this case has to be contiguous physical memory, not virtual memory. You could, for example, run “htop” while training and watch RAM usage over time.
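
If you would rather log the numbers than watch htop, something like the following rough sketch could run in a second terminal while training. It assumes a Linux system that exposes /proc/meminfo; the function names (parse_meminfo, read_meminfo, log_memory) are made up for this example.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only lines of the form "Name:   12345 kB" (the unit is optional).
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory statistics."""
    with open(path) as f:
        return parse_meminfo(f.read())


def log_memory(samples=5, interval=1.0):
    """Print available RAM and free swap a few times, one line per sample."""
    for _ in range(samples):
        info = read_meminfo()
        # Fields missing on a given kernel are reported as 0 here.
        print("MemAvailable: %d MB, SwapFree: %d MB" % (
            info.get("MemAvailable", 0) // 1024,
            info.get("SwapFree", 0) // 1024,
        ))
        time.sleep(interval)
```

Calling log_memory(samples=600, interval=1.0) before starting the second training run should show whether available memory was already low before the model was built.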


Memory should be released after a process exits or is killed.
Not sure whether there is a resource release issue in TensorFlow.
You can get more information from their GitHub.
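
If the repeated training runs currently happen inside one long-lived Python process, one way to test the point above is to start each run as its own process, so that everything it allocated goes back to the system when it exits. This is only a sketch: the helper names are invented for illustration, and "train.py" in the usage note stands in for whatever your training script is called.

```python
import subprocess
import sys


def run_in_fresh_process(args):
    """Run a command as a child process and return its exit code."""
    return subprocess.call(args)


def train_repeatedly(script, runs):
    """Run the training script several times, one fresh process per run."""
    codes = []
    for _ in range(runs):
        code = run_in_fresh_process([sys.executable, script])
        codes.append(code)
        if code != 0:
            # On POSIX a negative code means the child was killed by a signal,
            # e.g. -9 for SIGKILL, which is what a "Killed" message indicates.
            break
    return codes
```

For example, train_repeatedly("train.py", 3) returns the list of exit codes. If the second run is still killed even though the first process has fully exited, the memory is not being held by your Python process.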

An initial suggestion is to close the session manually as a test.
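
A minimal sketch of what that test could look like, assuming the graph/session style API (tf.Session in 1.x, tf.compat.v1 in newer releases). The tiny linear model and the name train_once are placeholders for your own training code, and allow_growth is an optional setting that may or may not help in your case.

```python
def train_once(steps=5):
    """Build a tiny model in its own graph, train it, and close the session."""
    # Imported here so this file can be loaded on machines without TensorFlow.
    import tensorflow as tf

    # Use the 1.x-style API through tf.compat.v1 when the install provides it.
    tf1 = getattr(tf.compat, "v1", tf)

    graph = tf1.Graph()  # a fresh graph per run, nothing left over from before
    with graph.as_default():
        x = tf1.placeholder(tf.float32, shape=[None, 1])
        w = tf1.Variable([[0.0]])
        loss = tf1.reduce_mean(tf1.square(tf1.matmul(x, w) - 2.0 * x))
        train_op = tf1.train.GradientDescentOptimizer(0.1).minimize(loss)
        init = tf1.global_variables_initializer()

        config = tf1.ConfigProto()
        # Optional (assumption): allocate GPU memory on demand, not all upfront.
        config.gpu_options.allow_growth = True

        # The "with" block calls sess.close() on exit, even after an exception.
        with tf1.Session(graph=graph, config=config) as sess:
            sess.run(init)
            for _ in range(steps):
                _, value = sess.run([train_op, loss],
                                    feed_dict={x: [[1.0], [2.0]]})
    return float(value)
```

Calling train_once() twice in a row in the same Python process is then a quick check of whether closing the session is enough to let a second run start.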