Failed to allocate cuda output buffer during context initialization

I was able to resolve this by adding an extra dimension to the scores, boxes, and labels tensors in PyTorch.
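
For anyone hitting the same error, here is a rough sketch of what that change might look like. The post above does not say which axis was added, so this assumes a leading (batch-style) dimension of size 1. The helper name `add_batch_dim` and the dummy shapes are made up for illustration, so adjust them to whatever your model actually returns.

```python
import torch


def add_batch_dim(boxes, scores, labels):
    """Prepend a dimension of size 1 to each detection output.

    Assumption: the extra dimension goes at index 0. The original
    fix may have used a different axis.
    """
    return boxes.unsqueeze(0), scores.unsqueeze(0), labels.unsqueeze(0)


# Dummy detections for illustration only: 5 boxes in (x1, y1, x2, y2) form.
boxes = torch.rand(5, 4)
scores = torch.rand(5)
labels = torch.randint(0, 80, (5,))

b, s, l = add_batch_dim(boxes, scores, labels)
print(b.shape, s.shape, l.shape)
```

With these dummy inputs the three outputs go from shapes `(5, 4)`, `(5,)`, `(5,)` to `(1, 5, 4)`, `(1, 5)`, `(1, 5)`. In practice you would probably apply this at the end of the model's `forward` (or in a thin wrapper around it) so the extra dimension is part of the outputs you hand off downstream.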