Hello,
I am posting here because I want to move my model to a Jetson NX.
During training, I get an allocation error when validation runs.
Logs:
Producing predictions: 0%| | 9/1999 [00:12<44:31, 1.34s/it]
2020-09-18 13:14:42.816672: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 34224472064 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-09-18 13:14:42.816887: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 34224472064
2020-09-18 13:14:42.823546: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 30802024448 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-09-18 13:14:42.823646: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 30802024448
/usr/local/bin/tlt-train: line 32: 422 Killed tlt-train-g1 ${PYTHON_ARGS[*]}
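For scale, the two failed requests in the log can be converted from bytes to GiB. This is a minimal Python sketch; the helper name `bytes_to_gib` is only illustrative and is not part of TLT or TensorFlow.

```python
# Convert the allocation sizes reported in the log from bytes to GiB.
GIB = 1024 ** 3  # bytes per GiB


def bytes_to_gib(n_bytes: int) -> float:
    """Return the size in GiB for a byte count."""
    return n_bytes / GIB


# Sizes copied from the "failed to alloc ... bytes on host" log lines.
failed_requests = [34224472064, 30802024448]

for size in failed_requests:
    print(f"{size} bytes = {bytes_to_gib(size):.2f} GiB")
# 34224472064 bytes = 31.87 GiB
# 30802024448 bytes = 28.69 GiB
```

Both requests are for pinned host memory ("on host" in the log), so each one asks for roughly 29 to 32 GiB of system RAM rather than GPU memory.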
I do not understand what the problem is. I am training on a Titan RTX with batch size 8 (with 16 everything broke, even though memory usage was below 50%).
Can you tell me what’s wrong here?