Duplication in the pooling layer on NVDLA

I’m trying to run YOLOv1 on NVDLA (platform: Zynq UltraScale+ MPSoC ZCU102) and I’m seeing a strange problem: the first layers (conv1, bn1, scale1, relu1) all execute correctly, but after the first pooling layer, “pool1”, part of each feature map is duplicated. I found this by visualizing the feature maps from the SW run and from NVDLA:
[Images: pool1 feature maps from Caffe (SW) and from NVDLA]
Compared to the Caffe implementation in SW, which is correct, every feature map of the pooling layer on NVDLA contains a duplicated region. The duplication starts at around 3/5 of the width and runs horizontally to the right; when it reaches the right boundary, it continues from the left. Since the execution is correct in Caffe, and the correctness was also validated in TensorRT after int8 quantization, this seems to be a problem with either the compilation of the NVDLA loadable or the HW configuration of NVDLA. By the way, I am using the default nv_small configuration and was able to run ResNet20 on it. Does anyone experienced with NVDLA know what the problem could be? Thanks a lot!
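To pin down where the duplication starts, rather than estimating it from the plots, the column mapping between the two feature maps can be computed numerically. This is only a minimal sketch: `column_sources` is a hypothetical helper name, it assumes both feature maps are available as 2-D NumPy arrays (one channel, height x width), and the small random arrays stand in for the real Caffe and NVDLA dumps.

```python
import numpy as np

def column_sources(ref, out):
    """For each column of `out`, return the index of the closest column in `ref`."""
    # Cast to float so int8 data cannot overflow in the subtraction.
    ref = ref.astype(np.float64)
    out = out.astype(np.float64)
    # dist[i, j] = mean absolute difference between out column i and ref column j
    dist = np.abs(out.T[:, None, :] - ref.T[None, :, :]).mean(axis=2)
    return dist.argmin(axis=1)

# Synthetic stand-in data (assumption): columns 6..9 are overwritten with
# columns 0..3, imitating a region that wraps around to the left edge.
rng = np.random.default_rng(0)
ref = rng.random((8, 10))
out = ref.copy()
out[:, 6:] = ref[:, :4]

src = column_sources(ref, out)
print(src)  # a column whose source index differs from its own index is displaced
```

On real data, the first column index where `src[x] != x` gives the exact offset at which the duplication begins, which can then be compared against buffer or line sizes in the HW configuration.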

UPDATE: I just tested a model containing only a pooling layer and found that the pooling engine works on 224x224 images (max pooling, stride 2), but in my case the image resolution is 448x448 and it doesn’t work. Does anyone know why that is?
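For reference, the expected output of such a pooling-only test can be generated in SW at both resolutions. The sketch below assumes a 2x2 kernel (only the stride of 2 is given above), a channels-first (C, H, W) layout, and random int8 input; `max_pool_2x2` is a hypothetical helper, and loading the NVDLA output is left as a placeholder comment.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a (C, H, W) array; H and W must be even."""
    c, h, w = x.shape
    # Split H and W into (blocks, 2) and take the max inside each 2x2 block.
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

for size in (224, 448):
    rng = np.random.default_rng(size)
    x = rng.integers(-128, 128, size=(1, size, size), dtype=np.int8)
    ref = max_pool_2x2(x)
    # nvdla_out = ...  # placeholder: output dumped from NVDLA, reshaped to ref.shape
    # mismatch_cols = np.unique(np.nonzero(nvdla_out != ref)[2])
    print(size, "->", ref.shape)
```

Running the same input through NVDLA at intermediate widths between 224 and 448 and comparing against `ref` would show the width at which the output first diverges.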