How much better is the Jetson TX2’s CUDA score?

I am using YOLO v2 to train on 434 images of size 480x270 on both a TX2 and my Lenovo T460p.
The T460p specs are:
Intel Core i7-6820HQ
Nvidia GeForce 940MX
CUDA version: 9.0
cuDNN 7.0
Ubuntu Linux 17.10 64-bit

TX2:
JetPack 3.1 (L4T R28.1)

I found that, even after running nvpmodel -m 0 on the TX2, the T460p trains faster: it takes less time to finish training.
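To compare training speed fairly, it helps to run the identical training command for the same number of iterations on both machines and time it the same way. Below is a minimal Python sketch of that idea. `time_command` is a hypothetical helper, and the command in it is only a placeholder so the sketch runs anywhere; you would substitute your own darknet training invocation, for example with `max_batches` in the .cfg lowered so both runs stop at the same iteration.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run `cmd` (a list of argv strings) `repeats` times and return the
    wall-clock duration of each run in seconds.

    Output is discarded so terminal printing does not skew the timing.
    Raises subprocess.CalledProcessError if the command fails.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder command (assumption): replace with your own darknet
    # training command, identical on the TX2 and on the T460p.
    cmd = [sys.executable, "-c", "pass"]
    runs = time_command(cmd)
    print(f"median {statistics.median(runs):.3f} s over {len(runs)} runs")
```

Using the median of a few runs reduces the influence of one-off effects such as a cold disk cache on the first run.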

I expected the TX2 to train faster than the T460p, because almost everybody tells me the TX2 is awesome.

I tried to find the TX2’s CUDA score at the links below, but could not:
[url]https://browser.geekbench.com/cuda-benchmarks[/url]
[url]https://gist.github.com/cavinsmith/ed92fee35d44ef91e09eaa8775e3284e[/url]

Also, when I use darknet to run the same video file with the same model (YOLO v2) and the same weights:
the TX2 plays at around 1.3 fps,
the T460p with the GeForce 940MX plays at up to 5.5 fps,
the T460p without the GeForce 940MX (no-GPU mode) plays at only around 0.3 fps.
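For scale, converting the frame rates reported above into per-frame latency and ratios is plain arithmetic on those three numbers; it shows the 940MX run is roughly 4x faster than the TX2 on this workload. A small Python sketch (`frame_time_ms` is just an illustrative helper):

```python
def frame_time_ms(fps):
    """Convert frames per second into milliseconds spent per frame."""
    return 1000.0 / fps


# Frame rates reported in the post for the same video, model and weights.
fps = {"TX2": 1.3, "T460p + 940MX": 5.5, "T460p CPU only": 0.3}

for name, rate in fps.items():
    print(f"{name:15s} {rate:4.1f} fps  {frame_time_ms(rate):7.1f} ms/frame")

# How many times faster each configuration is than the other.
speedup = fps["T460p + 940MX"] / fps["TX2"]
gpu_vs_cpu = fps["T460p + 940MX"] / fps["T460p CPU only"]
print(f"940MX vs TX2: {speedup:.1f}x")
print(f"940MX vs laptop CPU: {gpu_vs_cpu:.1f}x")
```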

Did I miss some configuration step? Or is the TX2’s CUDA score really lower than that of the GeForce 940MX?

The GPU in your laptop has more CUDA cores: the GeForce 940MX has 384, while a TX2 has only 256. Jetsons are almost never used for training; they are optimized for edge computing with pre-trained models. What makes the TX2 so good is that it delivers this performance on a very small power budget. Desktop video cards use far more electrical power to do the same work and are generally not suitable for a mobile environment… imagine powering your drone from a car battery :P
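A rough way to compare GPUs on paper is the theoretical peak FP32 rate: each CUDA core can issue one fused multiply-add (2 FLOPs) per clock, so peak = cores × clock × 2. The sketch below assumes the TX2 GPU runs at its roughly 1.3 GHz maximum clock; for the 940MX you would plug in its 384 cores and the clock that `nvidia-smi -q -d CLOCK` reports on your machine (no clock is assumed here). This is an upper bound, not a benchmark.

```python
def peak_fp32_gflops(cuda_cores, clock_ghz):
    """Theoretical peak FP32 throughput in GFLOPS.

    Each CUDA core can issue one fused multiply-add (counted as 2 FLOPs)
    per clock, so peak = cores * clock (GHz) * 2. Real workloads such as
    YOLO reach only a fraction of this, and memory bandwidth often
    matters as much as the core count.
    """
    return cuda_cores * clock_ghz * 2


if __name__ == "__main__":
    # TX2: 256 CUDA cores, assuming the ~1.3 GHz maximum GPU clock.
    tx2 = peak_fp32_gflops(256, 1.3)
    print(f"TX2 peak FP32: {tx2:.0f} GFLOPS")
    # 940MX: call peak_fp32_gflops(384, clock_ghz) with the clock
    # measured on your own laptop.
```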

Consider training elsewhere and then deploying the trained model in the field on the Jetson.

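Also be aware that some applications must be built with CUDA enabled; darknet, for example, only uses the GPU when compiled with GPU=1 (and CUDNN=1 for cuDNN) in its Makefile.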
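One quick sanity check before benchmarking is to confirm that the build flags are actually set. The sketch below assumes the Makefile layout of the original darknet repository (simple `GPU=`, `CUDNN=` and `ARCH=` lines near the top); `check_darknet_build` is a hypothetical helper, not part of darknet. It also looks for a `compute_62` entry, since 6.2 is the TX2’s compute capability; without it the binary has to rely on JIT-compiling whatever PTX was embedded for other architectures, if any.

```python
import re


def makefile_flags(text):
    """Return {NAME: value} for single-token lines such as 'GPU=1'."""
    flags = {}
    for line in text.splitlines():
        m = re.match(r"^([A-Z_]+)\s*=\s*(\S+)\s*$", line)
        if m:
            flags[m.group(1)] = m.group(2)
    return flags


def check_darknet_build(makefile_text, sm="62"):
    """Hypothetical helper: list settings that would keep darknet off the
    GPU or without code for the target architecture (sm_62 = TX2)."""
    flags = makefile_flags(makefile_text)
    problems = []
    for name in ("GPU", "CUDNN"):
        if flags.get(name) != "1":
            problems.append(f"{name} is not set to 1")
    if f"compute_{sm}" not in makefile_text:
        problems.append(f"no -gencode entry for compute_{sm}")
    return problems


if __name__ == "__main__":
    sample = """GPU=0
CUDNN=0
OPENCV=0
ARCH= -gencode arch=compute_30,code=sm_30 \\
      -gencode arch=compute_52,code=[sm_52,compute_52]
"""
    for problem in check_darknet_build(sample):
        print(problem)
```

After changing the flags, darknet has to be rebuilt (`make clean` and then `make`) for them to take effect.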

The integrated GPUs in the Jetson TX1/TX2 support FP16 (16-bit half-precision floating point) arithmetic at 2x the FP32 throughput; the GeForce 940MX does not. For more info, see these Parallel ForAll articles about the Jetson’s performance:
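FP16 also halves the storage and memory traffic per value, at the cost of precision. Python’s `struct` module supports the IEEE 754 half-precision format (`'e'`), which gives a quick way to see both effects. This sketch illustrates the number format only; it does not measure GPU throughput, and the parameter count below is a hypothetical figure for illustration.

```python
import struct


def bytes_per_value(fmt):
    """Size in bytes of one value in struct format `fmt`
    ('e' = IEEE 754 half precision, 'f' = single precision)."""
    return struct.calcsize(fmt)


def round_trip_half(x):
    """Store x as FP16 and read it back, exposing the rounding error."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


if __name__ == "__main__":
    n_params = 50_000_000  # hypothetical model size, for illustration only
    for fmt, name in (("f", "FP32"), ("e", "FP16")):
        mb = n_params * bytes_per_value(fmt) / 1e6
        print(f"{name}: {bytes_per_value(fmt)} bytes/value, {mb:.0f} MB of weights")
    print(f"0.1 stored as FP16 reads back as {round_trip_half(0.1)!r}")
```

Halving the bytes per value means twice as many values per unit of memory bandwidth; the doubled arithmetic rate mentioned above is a separate hardware feature, and an application only benefits from it if it actually computes in FP16.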