I’m evaluating the TX2 for face detection in high-resolution photos, and am wondering if someone can confirm that my results sound reasonable.
For a photo of about 4000x2600 resolution, and using dlib to do DNN face detection:
- ~4000ms on TX2
- ~450ms on GTX 1070 (same code, same photo)
Does this sound right? Is the TX2's GPU really about 10x slower than the GTX 1070?
(For reference, doing HOG face detection on the same photo I get ~4500ms on the TX2 and 2400ms on my (fast) desktop machine.)
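In case it helps, the way I'm timing this is roughly the sketch below. The `time_call` helper is my own; the actual dlib calls are shown commented out, since they need dlib built with CUDA and the `mmod_human_face_detector.dat` model file:

```python
import time

def time_call(fn, repeats=3):
    """Run fn several times and return the best wall-clock time in ms."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, (time.perf_counter() - t0) * 1000.0)
    return best

# The dlib CNN detection itself would look roughly like:
# import dlib
# detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
# img = dlib.load_rgb_image("photo_4000x2600.jpg")
# print(time_call(lambda: detector(img, 0)), "ms")
```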
Please maximize the clocks and try again:
You can find our benchmark data here:
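For anyone else reading: "maximize the clocks" on a TX2 usually means something like the sketch below (this assumes the stock JetPack `nvpmodel` and `jetson_clocks` tools and must be run on the Jetson itself; on older JetPack releases the script may live at `~/jetson_clocks.sh` instead):

```shell
#!/bin/sh
# Put the TX2 in its maximum-performance state before benchmarking.
# Guarded so it is a harmless no-op on a non-Jetson machine.
if command -v nvpmodel >/dev/null 2>&1; then
    sudo nvpmodel -m 0   # MAXN power mode: all CPU cores on, top limits
    sudo jetson_clocks   # pin CPU/GPU/EMC clocks at their maximums
else
    echo "nvpmodel not found; run this on the Jetson itself"
fi
```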
Thanks for the response. I had already run that script before posting these results.
As for the benchmarks you pointed me to, I'm not sure how to correlate them with my dlib test to determine whether my results are reasonable or whether something else is going on.
What library/method would you recommend for the fastest possible face detection in high-res photos on the TX2?
Yes, a desktop GPU is going to be several times faster than the TX2 GPU.
The exact multiple varies based on workload.
There are several reasons for this:
- number of CUDA cores/warps
- speed of CPU driving the GPU
- speed and contention for RAM
The last one is especially important, because the TX2 shares its RAM between the GPU and CPU, and that RAM is a lot slower than the dedicated GDDR5 you'll find on a desktop GTX card. On the other hand, if you're bound by CPU-to-GPU transfer speed, the TX2 can actually come out ahead because of that sharing, depending somewhat on the specifics.
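To put rough numbers on the RAM point (the figures below are the published peak-bandwidth specs as I recall them, so treat them as approximate):

```python
# Approximate peak memory bandwidth in GB/s (published specs, from memory):
TX2_LPDDR4_GBPS = 59.7      # 128-bit LPDDR4 on the TX2
GTX1070_GDDR5_GBPS = 256.0  # 256-bit GDDR5 on a desktop GTX 1070

ratio = GTX1070_GDDR5_GBPS / TX2_LPDDR4_GBPS
print(f"Desktop card has ~{ratio:.1f}x the memory bandwidth")
# A memory-bound DNN layer could therefore plausibly run ~4x slower on
# the TX2 from bandwidth alone, before counting the CUDA-core deficit.
```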
I'm not sure whether dlib uses cuDNN or its own CUDA kernels.
Have you checked our object detection example:
We can get above 10 fps at a resolution of 640x640.
@snarky: Thanks for the response, this clears things up a lot.
@AastaLLL: Yes, dlib uses cuDNN. 10fps at 640x640 is not so surprising when compared to my example since, as I mentioned, I’m testing on a 4000x2600 image.
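For scale, here is the back-of-the-envelope pixel arithmetic behind that comparison (detection cost isn't perfectly linear in pixel count, so this is only a sanity check):

```python
full = 4000 * 2600   # my test photo
demo = 640 * 640     # resolution in the object detection example
ratio = full / demo
print(f"{ratio:.1f}x more pixels")            # ~25.4x

per_frame_ms = 1000 / 10                      # 10 fps -> ~100 ms/frame
estimate_ms = per_frame_ms * ratio
print(f"scaled estimate: ~{estimate_ms:.0f} ms")
# ...which is in the same ballpark as the ~4000 ms I measured on the TX2.
```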