UFF SSD accuracy on big images

Hello. I have converted the SSD Inception v2 network using the C++ example, and now I am testing it to see how it performs. The results are worse than with the plain TensorFlow model.

My preprocessing steps (a rough code sketch follows the list):

  1. Cut image into slices with some overlap
  2. Normalize the colors to [-1;1]
  3. Build a batch by concatenating all slices into one big vector
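
Roughly, this is what my preprocessing looks like, assuming OpenCV; the tile size, overlap, and NCHW layout here are just the values I picked for illustration, not anything from the sample:

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Cut the image into overlapping tiles, normalize each to [-1;1],
// and append everything into one flat NCHW batch buffer.
std::vector<float> buildBatch(const cv::Mat& image, int tile = 300, int overlap = 50)
{
    std::vector<float> batch;
    const int stride = tile - overlap;

    for (int y = 0; y + tile <= image.rows; y += stride)
    {
        for (int x = 0; x + tile <= image.cols; x += stride)
        {
            cv::Mat slice = image(cv::Rect(x, y, tile, tile));

            cv::Mat rgb;
            cv::cvtColor(slice, rgb, cv::COLOR_BGR2RGB);

            // Map [0;255] -> [-1;1]
            cv::Mat norm;
            rgb.convertTo(norm, CV_32FC3, 2.0 / 255.0, -1.0);

            // Append channel-by-channel so each tile is stored as CHW
            std::vector<cv::Mat> channels(3);
            cv::split(norm, channels);
            for (const cv::Mat& c : channels)
            {
                const float* p = c.ptr<float>(0);
                batch.insert(batch.end(), p, p + tile * tile);
            }
        }
    }
    return batch;
}
```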

I have also tried adding some blur, since the images I am testing on are fairly compressed and the edges are jagged (staircase artifacts), but that did not help much.

I am mainly interested in detecting people, and the converted network does a poor job at detecting small people. Any idea why?

Small update: actually, it performs poorly on all images. Even when there are many people in the image and they take up most of the frame, it only finds one of them, or none at all. What might the reason be?


Is it possible for you to isolate a minimal repro where TRT produces seriously different results from TF?
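
As a sketch of what that could look like (old UFF-era TensorRT C++ API; the engine path, input size, and the assumption that the input is binding 0 are all placeholders): run one fixed input tensor through the deserialized engine, dump the raw outputs to a file, then push the same tensor through the frozen TF graph and diff the two.

```cpp
#include <NvInfer.h>
#include <NvInferPlugin.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Minimal logger required by the TensorRT runtime (TRT 5/6-era signature).
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    initLibNvInferPlugins(&gLogger, "");          // SSD engines use the NMS plugin

    std::ifstream file("sample_ssd.engine", std::ios::binary);   // placeholder path
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                            std::istreambuf_iterator<char>());

    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // One device buffer per binding, sized from the engine itself.
    const int nb = engine->getNbBindings();
    std::vector<void*> bindings(nb);
    std::vector<size_t> counts(nb);
    for (int i = 0; i < nb; ++i)
    {
        nvinfer1::Dims d = engine->getBindingDimensions(i);
        size_t count = 1;
        for (int j = 0; j < d.nbDims; ++j) count *= d.d[j];
        counts[i] = count;
        cudaMalloc(&bindings[i], count * sizeof(float));
    }

    // Fixed, reproducible input so TF and TRT see the exact same tensor.
    // Assumes the input tensor is binding 0, as in the UFF SSD sample.
    std::vector<float> input(counts[0], 0.5f);
    cudaMemcpy(bindings[0], input.data(), input.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    context->execute(1, bindings.data());

    // Dump every output binding to a text file; diff these against the TF run.
    for (int i = 0; i < nb; ++i)
    {
        if (engine->bindingIsInput(i)) continue;
        std::vector<float> out(counts[i]);
        cudaMemcpy(out.data(), bindings[i], out.size() * sizeof(float),
                   cudaMemcpyDeviceToHost);
        std::ofstream dump("trt_binding_" + std::to_string(i) + ".txt");
        for (float v : out) dump << v << "\n";
    }
    return 0;
}
```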