Speed of GIE on TX1 is unexpected.

I am running Inception-BN on ImageNet using GIE and comparing its speed against running the original model under MXNet, and some strange things happened.

On my desktop with a GTX 1080, the result is promising: ~8 ms vs ~3.5 ms, and it is stable no matter whether I run 1 image or 1000 images.

However, when running on TX1, there are three peculiar phenomena:

  • The speeds of both GIE and MXNet are unstable, ranging from ~30 ms to ~200 ms.
  • When running only 1 image, GIE is typically much faster than MXNet. However, when running the whole dataset, say 50K images, the average speed of GIE is always (much) slower than MXNet.
  • When running GIE on TX1, fp16 is slower than fp32, which is confirmed by the official sample:
    • fp16

      ./bin/giexec --model=data/samples/mnist/mnist.caffemodel --deploy=data/samples/mnist/mnist.prototxt --output=prob --half2 --batch=2
      model: data/samples/mnist/mnist.caffemodel
      deploy: data/samples/mnist/mnist.prototxt
      output: prob
      half2
      batch: 2
      Average over 10 runs is 0.746263 ms.
      Average over 10 runs is 0.719342 ms.
      Average over 10 runs is 0.755184 ms.
      Average over 10 runs is 0.785189 ms.
      Average over 10 runs is 0.87802 ms.
      Average over 10 runs is 0.637726 ms.
      Average over 10 runs is 0.666105 ms.
      Average over 10 runs is 0.678716 ms.
      Average over 10 runs is 0.673252 ms.
      Average over 10 runs is 0.713142 ms.

    • fp32

      ./bin/giexec --model=data/samples/mnist/mnist.caffemodel --deploy=data/samples/mnist/mnist.prototxt --output=prob --batch=2
      model: data/samples/mnist/mnist.caffemodel
      deploy: data/samples/mnist/mnist.prototxt
      output: prob
      batch: 2
      Average over 10 runs is 0.623084 ms.
      Average over 10 runs is 0.599357 ms.
      Average over 10 runs is 0.595663 ms.
      Average over 10 runs is 0.598121 ms.
      Average over 10 runs is 0.668126 ms.
      Average over 10 runs is 0.5931 ms.
      Average over 10 runs is 0.469668 ms.
      Average over 10 runs is 0.547794 ms.
      Average over 10 runs is 0.548473 ms.
      Average over 10 runs is 0.552826 ms.

Has anyone experienced the same thing, and/or can anyone provide an explanation?

Hi hx,

The speeds of both GIE and MXNet are unstable, ranging from ~30 ms to ~200 ms.
Please fix the CPU/GPU/memory clocks to their maximum frequencies and try again; a user script is attached.
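The attached script is the authoritative way to do this; as a rough illustration, such scripts typically pin the standard Linux cpufreq governor and then force the Jetson's GPU and EMC clocks through debugfs. The sysfs node names below (everything past the governor loop) are assumptions and vary by L4T release, so treat this as a sketch, not a replacement for the attachment:

```shell
#!/bin/bash
# Sketch of pinning clocks on a Jetson TX1 (must run as root).
# The attached burst_CPU_GPU_EMC.zip is authoritative; the debugfs
# nodes mentioned below are assumptions and differ across L4T releases.

# Pin every online CPU core to the performance governor
# (standard Linux cpufreq interface, not TX1-specific).
for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    [ -w "$gov" ] && echo performance > "$gov"
done

# GPU and EMC (memory controller) clocks live under debugfs on the TX1.
# The exact override nodes depend on the L4T version, e.g. something like:
#   echo 1 > /sys/kernel/debug/clock/override.gbus/state    # (assumed path)
# Consult the attached script for the paths that match your release.
```

Without pinned clocks, DVFS ramps frequencies up and down with load, which alone can explain per-image latencies swinging between ~30 ms and ~200 ms.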

When running only 1 image, GIE is typically much faster than MXNet. However, when running the whole dataset, say 50K images, the average speed of GIE is always (much) slower than MXNet.
I assume you are measuring inference time, correct? Have you excluded the GIE network transition time? Please add a loop that calls the inference API directly with one image; the inference time should be the same no matter which image is fed.
Would you please let us know how the test was conducted?
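The measurement loop described above can be sketched as follows. `infer` is a stand-in for the real GIE inference call; the point is that engine creation happens before timing starts, and a few warm-up iterations are discarded so lazy initialization and clock ramp-up are not counted:

```python
import time

def benchmark(infer, image, runs=100, warmup=10):
    """Average per-call latency of `infer`, excluding setup and warm-up.

    `infer` stands in for the GIE inference API call; the engine must
    already be built before this function runs, so build time is never
    included in the measurement.
    """
    for _ in range(warmup):           # discard: caches, lazy init, DVFS ramp
        infer(image)
    start = time.perf_counter()
    for _ in range(runs):             # time only the inference calls
        infer(image)
    return (time.perf_counter() - start) / runs

# Placeholder for the real inference call, just to make the sketch runnable.
def fake_infer(image):
    return sum(image)

avg = benchmark(fake_infer, [0.0] * 1000)
print(f"average latency: {avg * 1e3:.3f} ms")
```

If the per-image time measured this way is stable but the whole-dataset average is not, the difference is coming from something outside the inference call (data loading, host/device copies, or clock throttling), not from GIE itself.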

When running GIE on TX1, fp16 is slower than fp32, which is confirmed by the official sample.
This is expected. In the MNIST network the convolution layers are small, but there are two IP (fully-connected) layers, and those IP layers are what make fp16 slower than fp32 here. As a network gains more and more convolution layers, fp16 becomes much faster than fp32.

Thanks
burst_CPU_GPU_EMC.zip (494 Bytes)

@kayccc, the script works! Thanks soooo much!!