Source code is here.
As the title says, I've implemented two optimizations that reduce memory usage in both inference and training, allowing detection and training at higher resolutions.
The link above also contains implementation details and reports (mAP, memory usage, training time, comparisons with FP32, etc.).
Currently, I have only tested YOLOv3 on a Jetson Nano.
The Mixed Precision implementation also includes a loss-scaling scheme dedicated to YOLOv3.
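To illustrate the general idea behind loss scaling (the repo's actual implementation and API are not shown here; all names below are hypothetical), here is a minimal sketch of a dynamic loss scaler: the loss is multiplied by a scale factor before backprop so small FP16 gradients do not underflow to zero, and the scale is halved whenever a gradient overflows and periodically grown otherwise.

```python
import numpy as np

class DynamicLossScaler:
    """Minimal sketch of dynamic loss scaling, not the repo's actual code.

    Usage pattern: backprop on (loss * self.scale), then call unscale()
    on the resulting gradients. If it returns None, skip the optimizer
    step for this batch (an overflow occurred).
    """

    def __init__(self, init_scale=2.0 ** 10, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def unscale(self, grads):
        """Return FP32 gradients divided by the scale, or None on overflow."""
        if any(not np.all(np.isfinite(g)) for g in grads):
            self.scale *= 0.5      # overflow: back off and skip this step
            self._good_steps = 0
            return None
        unscaled = [g.astype(np.float32) / self.scale for g in grads]
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            self.scale *= 2.0      # no recent overflows: try a larger scale
        return unscaled
```

In a real mixed-precision loop the master weights are kept in FP32 and updated with these unscaled gradients, while the forward/backward pass runs in FP16.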
Although Mixed Precision is implemented, the Nano has no Tensor Cores, so processing is actually slower than in FP32. However, since it uses less memory, you can still train at higher resolutions if a Nano is all you have.