Memory allocation problem

Hello,
I am posting here because I want to move my model to a Jetson NX.
During training, I get an allocation error when validation runs.
Logs:
Producing predictions: 0%| | 9/1999 [00:12<44:31, 1.34s/it]2020-09-18 13:14:42.816672: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 34224472064 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory

2020-09-18 13:14:42.816887: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 34224472064
2020-09-18 13:14:42.823546: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 30802024448 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-09-18 13:14:42.823646: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 30802024448
/usr/local/bin/tlt-train: line 32: 422 Killed tlt-train-g1 ${PYTHON_ARGS[*]}

I do not understand what the problem is. I am training on a Titan RTX with batch size 8 (because with 16 everything broke, even though less than 50% of the memory was in use).
Can you tell me what’s wrong here?

Try using swap.

Can you elaborate more on it?

ref: https://www.jetsonhacks.com/2019/11/28/jetson-nano-even-more-swap/

I am still training on the RTX right now. Is this still relevant?

For the RTX you may want to post in a separate forum section, as it is not Jetson related.
However, swap can be used to effectively extend RAM, which works around memory limits in certain cases.
How many GB of memory do you have?
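
To answer the memory question for the host rather than the GPU (the log says "failed to alloc ... bytes on host"), something like the following could be run on the training machine. This is only a rough sketch: it assumes a Linux host with /proc/meminfo, and the helper names parse_meminfo and exceeds_physical_ram are made up for illustration. The log mentions pinned host memory, which is page-locked, so physical RAM is probably the figure that matters most here; swap would mainly help by freeing RAM used by other pages.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for sized fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def exceeds_physical_ram(request_bytes, meminfo):
    """True if a single request is larger than the total physical RAM (MemTotal)."""
    return request_bytes > meminfo.get("MemTotal", 0) * 1024


if __name__ == "__main__":
    path = Path("/proc/meminfo")  # assumption: Linux host
    if path.exists():
        info = parse_meminfo(path.read_text())
        request = 34224472064  # first failed allocation in the log above
        for key in ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree"):
            # /proc/meminfo reports kB; divide by 1024**2 to get GiB
            print(f"{key}: {info.get(key, 0) / 1024**2:.1f} GiB")
        print("request exceeds physical RAM:", exceeds_physical_ram(request, info))
```

If the request turns out to be larger than MemTotal, adding swap alone may not be enough for this particular allocation.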

24220MiB. It seems weird that the validation set tries to take 30 GB at once with batch size 8.
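
For comparing the numbers, here is a small sketch that converts the two failed request sizes from the log and the 24220MiB figure into the same unit. It assumes 24220MiB is the GPU memory figure, while the log lines refer to host memory, so the two may not be directly comparable. The helper name bytes_to_gib is made up for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def bytes_to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / GIB


# Sizes copied from the "failed to alloc ... bytes on host" log lines
failed_requests = [34224472064, 30802024448]
# Assumption: this is the GPU memory figure quoted in the thread
gpu_memory_bytes = 24220 * MIB

for size in failed_requests:
    print(f"{size} bytes = {bytes_to_gib(size):.2f} GiB requested on host")
print(f"24220 MiB = {bytes_to_gib(gpu_memory_bytes):.2f} GiB")
```

The requests come out at roughly 31.87 GiB and 28.69 GiB, against about 23.65 GiB for the 24220MiB figure.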

I did not specify this, but I am using the Transfer Learning Toolkit for training.