I am running Inception-BN on ImageNet using GIE and comparing its speed to running the original model with MXNet, and I have run into some strange behaviour.
On my desktop with a GTX 1080 the result is promising: ~8 ms vs ~3.5 ms, and it is stable no matter whether I run 1 image or 1000 images.
However, when running on TX1, there are three peculiar phenomena:
- The speeds of both GIE and MXNet are unstable, ranging from ~30 ms to ~200 ms.
- When running only 1 image, GIE is typically much faster than MXNet. However, when running the whole dataset (say 50K images), GIE is on average always (much) slower than MXNet (see the timing sketch after this list).
- When running GIE on TX1, fp16 is slower than fp32, which the official giexec sample confirms (see the builder sketch after this list):
- fp16:

  ```
  ./bin/giexec --model=data/samples/mnist/mnist.caffemodel --deploy=data/samples/mnist/mnist.prototxt --output=prob --half2 --batch=2
  model: data/samples/mnist/mnist.caffemodel
  deploy: data/samples/mnist/mnist.prototxt
  output: prob
  half2
  batch: 2
  Average over 10 runs is 0.746263 ms.
  Average over 10 runs is 0.719342 ms.
  Average over 10 runs is 0.755184 ms.
  Average over 10 runs is 0.785189 ms.
  Average over 10 runs is 0.87802 ms.
  Average over 10 runs is 0.637726 ms.
  Average over 10 runs is 0.666105 ms.
  Average over 10 runs is 0.678716 ms.
  Average over 10 runs is 0.673252 ms.
  Average over 10 runs is 0.713142 ms.
  ```
- fp32:

  ```
  ./bin/giexec --model=data/samples/mnist/mnist.caffemodel --deploy=data/samples/mnist/mnist.prototxt --output=prob --batch=2
  model: data/samples/mnist/mnist.caffemodel
  deploy: data/samples/mnist/mnist.prototxt
  output: prob
  batch: 2
  Average over 10 runs is 0.623084 ms.
  Average over 10 runs is 0.599357 ms.
  Average over 10 runs is 0.595663 ms.
  Average over 10 runs is 0.598121 ms.
  Average over 10 runs is 0.668126 ms.
  Average over 10 runs is 0.5931 ms.
  Average over 10 runs is 0.469668 ms.
  Average over 10 runs is 0.547794 ms.
  Average over 10 runs is 0.548473 ms.
  Average over 10 runs is 0.552826 ms.
  ```
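For reference on the fp16 side, my understanding is that giexec's `--half2` flag amounts to parsing the weights as fp16 and enabling half2 mode on the builder. Below is a minimal sketch of that path against the GIE / TensorRT 1.x Caffe-parser API; the helper name `buildHalf2Engine` is my own, error handling is omitted, and this is an illustration rather than the exact giexec code:

```cpp
#include "NvInfer.h"
#include "NvCaffeParser.h"

using namespace nvinfer1;
using namespace nvcaffeparser1;

// Illustrative helper (not taken from giexec): builds an engine with paired-fp16
// ("half2") kernels when the platform reports fast fp16 support (e.g. TX1).
ICudaEngine* buildHalf2Engine(ILogger& logger,
                              const char* deployFile, const char* modelFile,
                              const char* outputBlob, int maxBatchSize)
{
    IBuilder* builder = createInferBuilder(logger);
    INetworkDefinition* network = builder->createNetwork();
    ICaffeParser* parser = createCaffeParser();

    // Parse weights as fp16 only if fast fp16 is actually available.
    const bool useFp16 = builder->platformHasFastFp16();
    const IBlobNameToTensor* blobs =
        parser->parse(deployFile, modelFile, *network,
                      useFp16 ? DataType::kHALF : DataType::kFLOAT);
    network->markOutput(*blobs->find(outputBlob));

    builder->setMaxBatchSize(maxBatchSize);
    builder->setMaxWorkspaceSize(16 << 20);
    builder->setHalf2Mode(useFp16);  // presumably what --half2 toggles

    ICudaEngine* engine = builder->buildCudaEngine(*network);
    network->destroy();
    parser->destroy();
    builder->destroy();
    return engine;
}
```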
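And for the timing numbers above, this is roughly the kind of loop I mean: a minimal sketch assuming synchronous execute() calls with a few warm-up iterations, where `averageLatencyMs` is just an illustrative name; my actual harness (and the MXNet side) differ in the details.

```cpp
#include <chrono>
#include "NvInfer.h"

// Illustrative helper: average latency of repeated synchronous execute()
// calls on an already-built engine. `buffers` holds the device pointers
// for all input/output bindings.
double averageLatencyMs(nvinfer1::IExecutionContext& context,
                        void** buffers, int batchSize, int runs)
{
    // Warm up so lazy initialisation and clock ramp-up are excluded.
    for (int i = 0; i < 10; ++i)
        context.execute(batchSize, buffers);

    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < runs; ++i)
        context.execute(batchSize, buffers);   // execute() blocks until done
    auto end = std::chrono::high_resolution_clock::now();

    return std::chrono::duration<double, std::milli>(end - start).count() / runs;
}
```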
Has anyone experienced the same thing, and/or can anyone offer an explanation?