I’ve been migrating our Smart Vehicle CNN inference from a standard PC with a 1080 Ti GPU to a Drive PX 2 AutoChauffeur.
I’ve observed a significant increase in inference time (the Drive PX 2 is about 3x slower than the 1080 Ti).
Further investigation with giexec on the GoogLeNet model from the TensorRT 2.1 package (/usr/src/tensorrt/data/googlenet) seems to confirm these observations:
./giexec --model=…/data/googlenet/googlenet.caffemodel --deploy=…/data/googlenet/googlenet.prototxt --output=prob --batch=16 --device=0
On the PC with the 1080 Ti the average time is 7.84 ms; on the Drive PX 2 it is 31.8 ms.
If I enable --half2 mode and use --device=1, giexec reports an inference time of 60.58 ms.
With batch size 1, the times are 1.22 ms on the 1080 Ti, 3.26 ms on Drive PX 2 device 0, and 10.52 ms on Drive PX 2 device 1 in half2 mode.
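For reference, converting the quoted latencies into throughput makes the gap easier to compare across batch sizes. This is just a quick sketch over the numbers reported above (no TensorRT involved); the device labels are my own shorthand:

```python
# Throughput (images/s) derived from the giexec latencies quoted above.
# Keys are (device label, batch size); values are latency in milliseconds.
latencies_ms = {
    ("GTX 1080 Ti", 16): 7.84,
    ("PX2 device 0", 16): 31.8,
    ("PX2 device 1 half2", 16): 60.58,
    ("GTX 1080 Ti", 1): 1.22,
    ("PX2 device 0", 1): 3.26,
    ("PX2 device 1 half2", 1): 10.52,
}

def throughput(batch, latency_ms):
    """Images per second for one batched inference call."""
    return batch * 1000.0 / latency_ms

for (device, batch), ms in latencies_ms.items():
    print(f"{device:>20}  batch={batch:<2} -> {throughput(batch, ms):7.1f} img/s")
```

At batch 16 this works out to roughly 2040 img/s on the 1080 Ti versus roughly 500 img/s on PX 2 device 0, which is where my ~3-4x figure comes from.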
I have several questions:
- Are these the expected results?
- Where can I find information about how many CUDA cores each Tegra GPU and each additional dGPU have?
- Can I directly compare the CUDA cores of the Drive PX 2's Tegra to the 1080 Ti's CUDA cores?
- Which metric (core count, GFLOPS?) should be used to compare the Tegra dGPU to the 1080 Ti?
- Is it possible to use multiple Tegras, or a Tegra+dGPU combination, to speed up inference?