I am trying to profile YOLOv8n with NCU and cannot find any kernel for batch normalization. To confirm, I checked the literature and printed the YOLO model, and both show that it contains a couple of BatchNorm2d layers.
The batch normalization layers get fused with convolution during export in Ultralytics.
You can also do that manually by calling model.fuse() in Ultralytics.
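To illustrate why no separate batch normalization kernel shows up after fusion, here is a minimal sketch of the usual inference-time conv + BN folding arithmetic. It is not the Ultralytics implementation. It uses plain Python lists and a 1x1 convolution on a single pixel to keep things small, and the helper names (fuse_conv_bn, conv, batch_norm) and all numeric values are made up for the example. It assumes BN runs in eval mode with fixed running statistics.

```python
import math

def conv(weights, biases, x):
    # 1x1 convolution on one pixel: a dot product per output channel.
    return [sum(w * xi for w, xi in zip(w_c, x)) + b
            for w_c, b in zip(weights, biases)]

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    # Inference-mode BN: normalize with the stored running statistics.
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]

def fuse_conv_bn(weights, biases, gamma, beta, mean, var, eps=1e-5):
    # Fold the per-channel BN scale and shift into the conv parameters.
    fused_w, fused_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weights, biases, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        fused_w.append([w * scale for w in w_c])
        fused_b.append((b_c - m) * scale + bt)
    return fused_w, fused_b

# Hypothetical values: 2 output channels, 3 input channels.
weights = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
biases = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.3]
mean, var = [0.4, -0.1], [2.0, 0.5]
x = [1.0, 2.0, -3.0]

unfused = batch_norm(conv(weights, biases, x), gamma, beta, mean, var)
fused_w, fused_b = fuse_conv_bn(weights, biases, gamma, beta, mean, var)
fused = conv(fused_w, fused_b, x)

# The two paths agree up to floating-point rounding.
print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
          for a, b in zip(unfused, fused)))
```

After folding, the network only executes the convolution with the adjusted weights and bias, so the profiler sees one kernel where the printed model lists two layers.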
Is there some way to preserve the batch normalization layers when exporting to ONNX format?