Thanks for the response. I have run that script already before posting these results.
As for the benchmarks you pointed me to, I’m not sure how to correlate them with my dlib test to determine whether my results are reasonable or whether something else is going on.
What library/method would you recommend for the fastest possible face detection in high-res photos on the TX2?
Yes, a desktop GPU is going to be several times faster than the TX2 GPU.
The exact multiple varies based on workload.
There are several reasons for this:
number of CUDA cores/warps
speed of CPU driving the GPU
speed and contention for RAM
The last point is especially important because the TX2 shares its RAM between the GPU and CPU. That shared memory is a lot slower than the dedicated GDDR5 you’ll find on a desktop GTX card. On the other hand, if you’re bound by CPU-to-GPU transfer speed, the TX2 can actually be more efficient because of that sharing (zero-copy access), depending on the specifics of your workload.
@snarky: Thanks for the response, this clears things up a lot.
@AastaLLL: Yes, dlib uses cuDNN. 10 fps at 640x640 is not so surprising compared with my numbers since, as I mentioned, I’m testing on a 4000x2600 image, which has roughly 25x as many pixels.
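For what it’s worth, here is the back-of-envelope math I’m using. It assumes detection time scales roughly linearly with pixel count, which is only an approximation (dlib’s CNN detector also scans an image pyramid, so real scaling can be worse):

```python
# Back-of-envelope estimate: assume detection time scales ~linearly with
# pixel count. This is an approximation only; dlib's CNN detector also
# scans an image pyramid, so real-world scaling may be worse.
full_res = 4000 * 2600   # my test image: 10,400,000 pixels
bench_res = 640 * 640    # the quoted benchmark: 409,600 pixels

ratio = full_res / bench_res
print(f"pixel ratio: {ratio:.1f}x")       # ~25.4x more pixels
print(f"expected rate: {10 / ratio:.2f} fps")  # ~0.39 fps at full resolution
```

So a benchmark of 10 fps at 640x640 would predict well under 1 fps on my images, which is in the same ballpark as what I’m measuring.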