Larger batch_size parameter leads to lower performance

I use deepstream-app to run the yolov3 sample, with the number of sources fixed at one. When batch_size is set to 1, the performance is 60 fps. However, when batch_size is set to 32, the performance drops to 40 fps. Is there any way to improve performance at batch_size 32?

