I’m doing image processing with Caffe on an NVIDIA GTX 960M and getting about 10 frames per second. I want to deploy on a Jetson TX2 but was wondering what sort of speed difference I might expect. I see the TX2 has 256 cores vs. 640 cores for the GTX 960M. Does this mean I should expect about a third of the speed with a TX2?
I don’t believe this is as simple as the ratio of core counts; it will likely depend on what is bottlenecking your current application at 10 fps. The number of CUDA cores isn’t always the bottleneck. The TX2 can clock its 256 CUDA cores at a higher rate than the GTX 960M (1.3 GHz vs. 1.0 GHz by default). Also, they have different CUDA core architectures (Pascal on the TX2 vs. Maxwell on the 960M). If you can use FP16, I think the TX2 can achieve double the throughput per core per clock vs. the GTX 960M, which doesn’t have hardware support for dual FP16 and is stuck at FP32 speeds. Finally, the two also have very different memory subsystems: the TX2 GPU shares the same memory and controller with the CPUs, while the 960M has its own external GDDR5 (58 GB/s vs. 80 GB/s).
All this being said, 1/3 + 30% (i.e. about 4.3 fps) might be a good initial estimate if you are truly compute limited, as this would take into account the decreased core count but increased clock rate. But as I stated, it can be difficult to predict performance unless you have a strong understanding of your current performance bottleneck.
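To make the arithmetic explicit, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted in this thread (core counts, default clocks, memory bandwidth, 10 fps baseline) and assumes performance scales linearly with whichever resource is the bottleneck, which real workloads rarely do exactly. The function name `scale_fps` is just something I made up for illustration. Note that the exact core ratio is 256/640 = 0.4 rather than 1/3, so the compute-limited figure comes out a bit higher than 4.3 fps.

```python
# Back-of-the-envelope FPS scaling, assuming purely linear scaling with the
# bottleneck resource. All hardware figures are the ones quoted in the thread.

def scale_fps(base_fps, *ratios):
    """Multiply a baseline frame rate by one or more scaling ratios."""
    fps = base_fps
    for ratio in ratios:
        fps *= ratio
    return fps

BASE_FPS = 10.0                      # measured on the GTX 960M

CORE_RATIO = 256 / 640               # TX2 cores / 960M cores = 0.4
CLOCK_RATIO = 1.3 / 1.0              # TX2 GHz / 960M GHz (default clocks)
BANDWIDTH_RATIO = 58 / 80            # TX2 GB/s / 960M GB/s

# "About a third" of the cores, plus 30% for the higher clock.
rough_compute_fps = scale_fps(BASE_FPS, 1 / 3, CLOCK_RATIO)

# Same idea with the exact core ratio.
exact_compute_fps = scale_fps(BASE_FPS, CORE_RATIO, CLOCK_RATIO)

# If the application is memory-bandwidth limited instead.
memory_bound_fps = scale_fps(BASE_FPS, BANDWIDTH_RATIO)

print(f"compute limited (1/3 cores):   {rough_compute_fps:.1f} fps")
print(f"compute limited (exact ratio): {exact_compute_fps:.1f} fps")
print(f"memory bandwidth limited:      {memory_bound_fps:.1f} fps")
```

The spread between these numbers is the point: which one is closest depends entirely on where the current 10 fps bottleneck is, so profiling the existing application first is worth the effort.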