Coarse comparison of Xavier with desktop GTX10XX series performance


The Jetson AGX Xavier is quoted at 32 TOPS. The desktop GTX1070 and GTX1080 have 5.1 and 9.2 TFLOPS of single-precision (32-bit) processing power respectively. Is it correct to say the peak performance of the Xavier is 3x that of a GTX1080?


Also, the Xavier “is capable of up to 11 TFLOPS FP16”. Is this the same quantity as the half-precision column on Wikipedia? If so, is it really about 100 times what the 1070 and 1080 can do (0.11 and 0.14 TFLOPS respectively)?

What am I missing? This is amazing at 30 watts thermal.

Just to confirm, my real questions are: can the Xavier, with an appropriate carrier board,

  1. comfortably drive a VR headset, say the Vive Pro?
  2. take four 4-lane MIPI cameras at, say, 8MP@60FPS, stitch them in, say, OpenCV, and stream the result to the VR headset?
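
For a sense of scale on the camera question, the raw input rate can be worked out from the numbers above. This is only a rough sketch: the 10 bits per pixel is an assumption (raw sensor output varies by sensor and mode), and the function names are made up for illustration.

```python
def pixel_rate(cameras: int, megapixels: float, fps: float) -> float:
    """Total pixels per second arriving from all cameras."""
    return cameras * megapixels * 1e6 * fps


def raw_gbit_per_s(pixels_per_s: float, bits_per_pixel: int) -> float:
    """Uncompressed data rate in gigabits per second."""
    return pixels_per_s * bits_per_pixel / 1e9


# Figures from the question: four cameras, 8 MP each, 60 frames per second.
rate = pixel_rate(cameras=4, megapixels=8, fps=60)

# Assumption, not from the thread: 10-bit raw pixels.
BITS_PER_PIXEL = 10

print(f"{rate / 1e9:.2f} Gpixel/s")
print(f"{raw_gbit_per_s(rate, BITS_PER_PIXEL):.1f} Gbit/s uncompressed")
```

Whatever does the stitching has to touch every one of those pixels at least once per frame, so this is a lower bound on the memory traffic, not an estimate of the compute needed.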

No. The 32 TOPS figure depends heavily on what you’re doing. It’s mostly INT8 and FP16 work, and might also include the video processors and maybe even the 8-core CPU.

The 1080 is faster in lower precision, too.

If you can live with the low precision implementations, then the Jetson AGX Xavier does, indeed, provide pretty good bang for the Watt, but you do get lower precision than you’ll get with the 1080 and other desktop/workstation/server class GPUs.
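
To make the precision point concrete, here is a small sketch that only compares figures quoted at the same precision. The numbers are the ones quoted in this thread, not independently checked, and the `"int8_mixed"` label is just a made-up name for the 32 TOPS headline figure.

```python
# Peak figures as quoted in this thread, in tera-operations per second.
# "int8_mixed" is a hypothetical label for the 32 TOPS headline number.
quoted = {
    "Xavier": {"int8_mixed": 32.0, "fp16": 11.0},
    "GTX1070": {"fp32": 5.1, "fp16": 0.11},
    "GTX1080": {"fp32": 9.2, "fp16": 0.14},
}


def ratio(a: str, b: str, precision: str):
    """Ratio of a to b at one precision, or None if a figure is missing."""
    x = quoted[a].get(precision)
    y = quoted[b].get(precision)
    if x is None or y is None:
        return None
    return x / y


fp16_ratio = ratio("Xavier", "GTX1080", "fp16")  # like for like
fp32_ratio = ratio("Xavier", "GTX1080", "fp32")  # no Xavier FP32 figure quoted
naive = quoted["Xavier"]["int8_mixed"] / quoted["GTX1080"]["fp32"]

print(f"FP16 vs FP16: {fp16_ratio:.1f}x")
print(f"FP32 vs FP32: {fp32_ratio}")
print(f"32 TOPS vs 9.2 TFLOPS FP32 (mixed units): {naive:.2f}x")
```

The last ratio is the "3x" from the question. It divides a low-precision number by a single-precision one, so it says little about how the two compare on the same workload.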

Thanks for the reply.

“and might also include the video processors and maybe even the 8-core CPU.”
Yes, it is harder to compare a straight desktop/workstation/server class GPU with the Xavier, which has the video processors and CPU included.

“If you can live with the low precision implementations,”
Maybe. We are looking at a potential mobile telerobotics/teleoperation use case, so low power and high throughput are more important to us.

What defines a “core” in these GPU specs is not clear to a novice potential customer either. How does the GTX 1070’s “1920:120:64 (15)” core count, listed as “Shader Processors : Texture Mapping Units : Render Output Units : Tensor Cores (Streaming Multiprocessors) (Graphics Processing Clusters)”, compare with the “512-core Volta GPU with Tensor Cores”?

Also, is there an actual technical definition of “VR ready” used by Nvidia marketing? Transistor count/clock speed + memory/processing power?

The number you should compare is the 1920 versus the 512, if you use CUDA for the network.
If you use TensorRT, I believe it can also use the DLA units, which the Jetson has and the GTX doesn’t have at all.
The Xavier has two DLA workers with 5.7 TFLOPS FP16 / 11.4 TOPS INT8. Who knows what this works out to in equivalent CUDA cores?
But it gets harder to compare: The Jetson Xavier cores are Volta, which is a newer generation than the GTX which is Pascal. Presumably the Volta gets more done per core than the Pascal.
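
As a rough way to put the 1920 and 512 on the same scale, the usual back-of-envelope formula for peak FP32 throughput is cores × 2 × clock, counting one fused multiply-add per core per cycle as two operations. The clock values below are assumptions and not from this thread (roughly the published boost clocks), so treat the output as a sketch, not a spec.

```python
def peak_fp32_tflops(cuda_cores: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 TFLOPS: one FMA (2 ops) per core per cycle."""
    return cuda_cores * 2 * clock_ghz / 1000.0


# Core counts are from the thread; clocks are ASSUMED for illustration.
gtx1070 = peak_fp32_tflops(cuda_cores=1920, clock_ghz=1.683)
xavier = peak_fp32_tflops(cuda_cores=512, clock_ghz=1.377)

print(f"GTX 1070: {gtx1070:.2f} TFLOPS FP32 (at assumed clock)")
print(f"Xavier:   {xavier:.2f} TFLOPS FP32 (at assumed clock)")
print(f"ratio:    {gtx1070 / xavier:.1f}x")
```

This only covers plain CUDA cores in FP32. It ignores the Tensor Cores and the DLA units, which is where most of the Xavier's quoted low-precision throughput comes from, and it moves with whichever clock you plug in.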

The only way to know for SURE is to implement on both platforms, optimize for both platforms, and benchmark your actual load.
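
A minimal timing harness for that kind of comparison could look like the sketch below. `placeholder_workload` is a stand-in with a made-up name; you would replace it with your real pipeline and run the same script on both machines.

```python
import statistics
import time


def benchmark(workload, warmup: int = 3, runs: int = 10) -> float:
    """Return the median wall-clock time of workload() in seconds."""
    # Warm-up runs let caches, clocks, and lazy initialisation settle.
    for _ in range(warmup):
        workload()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        # Note: for GPU work, wait for the device to finish before
        # reading the clock, since kernel launches are asynchronous.
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def placeholder_workload():
    """Hypothetical stand-in for the real load (e.g. one stitched frame)."""
    return sum(i * i for i in range(100_000))


median_s = benchmark(placeholder_workload)
print(f"median: {median_s * 1e3:.2f} ms per run")
```

The median is used rather than the mean so that a single slow run (a background task, a thermal throttle) does not skew the result.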

Or, if power draw and compactness are important, and you believe the Xavier is good enough, then just go with that. Note that the 1070 needs another CPU to drive it, with an OS, and drivers, and motherboard, and power supply, and so on. All of that comes built-in on the Xavier, in a pretty compact space. (Even though the heat sink / cooler on the devkit is quite impressively heavy!)

As for “VR ready”: I think that came out of some mainline requirements for the Oculus VR headset. Last I checked, “VR ready” hardware is approximately equivalent to at least an Intel Core i5 four-core CPU plus a GTX 970. How that translates to modern embedded systems is, again, likely a matter of markitechtural discretion.

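TOPS and FLOPS are different units:

  • TOPS are tera-operations per second (10^12 operations per second); in specs like this one they usually count integer (INT8) operations, mostly on tensors
  • FLOPS are floating point operations per second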

A tensor is like a more universal vector (better to say a multidimensional array: a single value, a vector, or a multidimensional matrix)

  • value : 53
  • vector : [3,8,125,-15]
  • matrix (two dimensions):
  • multidimensional matrix: wherever your imagination takes you to :D
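
The list above can be sketched with plain nested Python lists. The `shape` helper is made up for illustration, and the matrix and 3-D values are arbitrary examples, not taken from the post.

```python
def shape(t) -> tuple:
    """Shape of a regular nested list; a bare number has shape ()."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        if not t:  # empty list: nothing further to descend into
            break
        t = t[0]
    return tuple(dims)


value = 53                                 # rank 0
vector = [3, 8, 125, -15]                  # rank 1
matrix = [[1, 2, 3], [4, 5, 6]]            # rank 2 (arbitrary values)
cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # rank 3 (arbitrary values)

for name, t in [("value", value), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(f"{name}: rank {len(shape(t))}, shape {shape(t)}")
```

The rank is just the number of nesting levels, which is why a single number and a many-dimensional array can both be called tensors.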

Floating point is probably more famous and better known :D It's just a number that is not whole

  • floating point number :

For a tensor operation to happen, at least two tensors have to interact. If both tensors are plain numbers, it is a simple math operation (e.g. T1 + T2), but as the tensors get more complex it gets really complicated :D Is it still the same TOPS? That seems strange to me. As I understand it (correct me if I am wrong, please), the number of TOPS differs according to the complexity of the problem being solved.
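
One way to look at that question: the quoted TOPS is a peak rate, while the number of operations a problem needs depends on the tensor shapes. The sketch below counts operations for two common cases and divides by an assumed peak rate. The function names are made up, and the resulting time is an ideal lower bound that real hardware will not reach.

```python
import math


def elementwise_ops(shape: tuple) -> int:
    """T1 + T2 needs one operation per element (1 for a scalar)."""
    return math.prod(shape)


def matmul_ops(m: int, k: int, n: int) -> int:
    """(m x k) times (k x n): one multiply and one add per term."""
    return 2 * m * k * n


def ideal_seconds(ops: int, tera_ops_per_s: float) -> float:
    """Time at peak rate, ignoring memory traffic and overheads."""
    return ops / (tera_ops_per_s * 1e12)


PEAK_TOPS = 32.0  # headline figure quoted in this thread

scalar_add = elementwise_ops(())
vector_add = elementwise_ops((4,))
big_matmul = matmul_ops(1024, 1024, 1024)

print(f"scalar add: {scalar_add} op")
print(f"vector add: {vector_add} ops")
print(f"1024^3 matmul: {big_matmul} ops, "
      f"{ideal_seconds(big_matmul, PEAK_TOPS) * 1e6:.1f} us at peak")
```

So the rate stays the same and the operation count grows with the size of the tensors, which is why a bigger problem takes longer on the same chip.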

Sometimes it's better not to try to “imagine” it :D It just works, it's measured, so yeah, why not have multidimensional data storage (for example) :D We are so fixed on 3D :D

That is the great thing about programming: you don't have to have real objects. They can be abstract, non-existent, impossible; you just tell the computer that they are real ;D