I was studying the L40 and RTX 6000 Ada technical data sheets and wondering about the different FP16/FP8 TFLOP specs.
According to the data sheets, both GPUs are Ada-based, feature 4th Gen Tensor Cores, and have 18,176 CUDA Cores as well as 568 Tensor Cores. Apart from minor GPU frequency and VRAM differences, shouldn’t the GPUs then have roughly equal FP32, FP16, and FP8 throughput?
However, the L40 datasheet says:
FP16: 181 / 362 (sparse) TFLOPs
FP8: 362 / 724 (sparse) TFLOPs
Whereas the RTX 6000 Ada datasheet does not specifically list FP16/FP8 performance, but rather just “1457 TFLOPs Tensor Core performance”, with a footnote stating that this figure is FP8 with sparsity.
So why does the RTX 6000 Ada have double the peak FP8 performance compared to the L40? I would have expected the L40 (datacenter) to be at least similar in speed.
Furthermore, what would the non-sparsity FP8 performance of the RTX 6000 Ada be? Half, as in all the other cases? It is not stated in the datasheet.
Last but not least: the RTX 6000 Ada datasheet does not state FP16 performance at all. Does it not support FP16 ops?
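For reference, here is the back-of-envelope math behind my expectation and my reading of the quoted numbers, as a quick Python sketch (the ~2.5 GHz boost clock and the FP8/sparsity scaling factors are my own assumptions, not datasheet values):

```python
# Back-of-envelope math using the core count from the datasheets and assuming
# the usual Ada Tensor Core scaling (FP8 dense = 2x FP16 dense, 2:4 sparsity = 2x dense).
# The ~2.5 GHz boost clock is an assumption on my part, not a datasheet value.

cuda_cores = 18176
boost_clock_ghz = 2.5  # roughly the same for both cards

# Peak FP32: 2 FLOPs per core per cycle (FMA)
fp32_tflops = 2 * cuda_cores * boost_clock_ghz / 1000
print(f"Estimated peak FP32: {fp32_tflops:.1f} TFLOPS")  # ~90.9 for both cards

# L40 datasheet figures
l40_fp16_dense = 181.0
print("L40 FP8 dense :", 2 * l40_fp16_dense)   # 362  -> matches datasheet
print("L40 FP8 sparse:", 4 * l40_fp16_dense)   # 724  -> matches datasheet

# RTX 6000 Ada datasheet figure (FP8 with sparsity, per the footnote)
rtx6000_fp8_sparse = 1457.0
print("RTX 6000 Ada FP8 dense (if half of sparse):", rtx6000_fp8_sparse / 2)  # ~728.5
print("Ratio vs. L40 sparse FP8:", round(rtx6000_fp8_sparse / (4 * l40_fp16_dense), 2))  # ~2.01
```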
Hello @daniel.speck and welcome to the NVIDIA developer community!
You are right, the underlying architecture is very similar apart from a small difference in bandwidth, and most performance numbers are the same, even on the datasheets.
But the overall board design is very different, since the two cards target different use cases with different system constraints regarding cooling, multi-GPU and vGPU setups, etc. For those reasons alone a certain difference in specs is to be expected.
That said, the doubled FP8 peak performance that you infer can also be related to other factors, one of which might be as simple as a typo in the datasheet, which I am trying to find out now.
And in case you are considering buying one or the other, you should get in contact with our or our partners’ sales support to clarify your questions. They will be much better qualified to identify your needs and determine what solution might fit your requirements.
Sure, but does this really extend to FP16, for example, not being available on the RTX 6000 Ada?
Yes, I am considering buying the RTX 6000 Ada or the L40 at the moment. Multiple, actually. We are in contact with an official NVIDIA partner for that, but they just linked to the datasheets, hence my question here.
To be precise: for a research project we are currently looking at different 4x/8x GPU server setups, of which we want to buy several, and we are considering the RTX 6000 Ada and the L40 at the moment.
Oh, I didn’t say that, sorry if it sounded that way. To my knowledge there is no reason why the RTX 6000 Ada would not support FP16, for example through CUDA. It is simply not pointed out specifically in the data sheet for some reason.
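If you want a quick way to convince yourself that FP16 works through CUDA on the card, a minimal sketch like the following should run on any Ada GPU (it assumes a PyTorch build with CUDA support and is only a functional check, not any kind of official benchmark):

```python
import torch

# Minimal sanity check that FP16 (half precision) runs through CUDA on the
# installed GPU. Assumes a PyTorch build with CUDA support; functional check
# only, not a performance measurement.

assert torch.cuda.is_available(), "No CUDA device visible"
dev = torch.device("cuda:0")
print("Device:", torch.cuda.get_device_name(dev))
print("Compute capability:", torch.cuda.get_device_capability(dev))  # Ada reports (8, 9)

a = torch.randn(4096, 4096, device=dev, dtype=torch.float16)
b = torch.randn(4096, 4096, device=dev, dtype=torch.float16)
c = a @ b  # FP16 GEMM, typically dispatched to Tensor Core kernels via cuBLAS
torch.cuda.synchronize()
print("FP16 matmul OK, result dtype:", c.dtype)
```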
May I ask where you are situated? It might be a good idea if you ask the NVIDIA partner if they can get you in contact with a local/regional Solutions Architect directly from NVIDIA.
However, I think it would be good for everyone stumbling upon this thread to know whether there was a typo on the datasheets and what the correct FP8/FP16 speeds for both GPUs (L40 and RTX 6000 Ada) are.
The short answer is that the spec details in the respective PDF files are correct.
They are not written in a way that makes them directly comparable, because those GPUs serve completely different needs.
To learn more about which target workload of the different product areas fits your specific requirements best, you need to get in contact with a regional sales representative.
I am sorry if this does not exactly answer your question.