I am trying to convert a YOLOv4 model (the full YOLOv4, not the tiny version) from ONNX to a TensorRT engine using trtexec. The device I am using is a Jetson Xavier NX.
Even with the --buildOnly flag, trtexec fails with exit code 137 (out of memory). YOLOv4 is not a large model and should not need that much GPU memory: in my tests it used about 3 GB with PyTorch, and roughly the same with TensorFlow.
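For reference, this is roughly the kind of invocation I mean. The file names are placeholders, and the workspace flag is one thing I have tried to keep memory down (on newer TensorRT releases the equivalent is --memPoolSize=workspace:1024 instead of --workspace):

```shell
# Sketch of the trtexec build command (paths are hypothetical).
# --buildOnly builds and serializes the engine without running inference.
# --workspace caps the builder's scratch memory in MiB; on a Xavier NX
# the CPU and GPU share the same physical memory, so this matters.
trtexec --onnx=yolov4.onnx \
        --saveEngine=yolov4.engine \
        --buildOnly \
        --workspace=1024
```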
So my questions are:
1) Why does this conversion use so much memory on Jetson devices?
2) Is there an alternative, such as building the engine on a V100/RTX 2080 Ti and manually setting the compute capability to 7.2?