Extending gpu_packet_processing application

I have extended the gpu_packet_processing sample application to detect and count VITA packet headers carried in UDP payloads.
However, performance is now much worse than with the unmodified sample, with roughly 50% packet loss. I suspect the DOCA buffer array is being overwritten. Any suggestions?
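
For readers unfamiliar with the kind of check involved, here is a minimal host-side sketch of a header test on a UDP payload. It is an illustration only, not the code from the application. It assumes the VITA 49 (VRT) layout, where the first big-endian 32-bit word carries the packet type in bits 31..28 and the packet size (in 32-bit words) in bits 15..0, and it assumes the payload starts directly with that word (no VRLP framing). The names `load_be32` and `looks_like_vrt`, and the choice to accept only packet types 0 to 5, are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit word from a byte buffer (no alignment required). */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical helper: returns 1 if the UDP payload plausibly starts with
 * a VRT header, 0 otherwise. This is a plausibility check, not a full
 * validation of the packet.
 */
static int looks_like_vrt(const uint8_t *payload, size_t len)
{
    uint32_t hdr, pkt_type, size_words;

    if (len < 4)
        return 0;                      /* too short to hold the header word */

    hdr        = load_be32(payload);
    pkt_type   = hdr >> 28;            /* bits 31..28: packet type */
    size_words = hdr & 0xFFFFu;        /* bits 15..0: size in 32-bit words */

    if (pkt_type > 5)                  /* assumption: accept types 0..5 only */
        return 0;
    if (size_words == 0)               /* a packet includes at least its header */
        return 0;

    /* The declared packet size must fit inside the UDP payload. */
    return (size_t)size_words * 4 <= len;
}
```

In a GPU kernel the same logic would run per packet on the payload address obtained from the receive buffer, and a match would increment a counter atomically. Keeping the check to a single 32-bit load per packet keeps the added per-packet cost small.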