I was studying the L40 and RTX 6000 Ada technical data sheets and wondering about the different FP16/FP8 TFLOP specs.
According to the data sheets, both GPUs are Ada-based, feature 4th Gen Tensor Cores, and have 18,176 CUDA Cores as well as 568 Tensor Cores. Apart from minor GPU frequency and VRAM differences, shouldn’t the GPUs then have roughly equal FP32, FP16, and FP8 throughput?
However, the L40 datasheet says:
FP16: 181 / 362 (sparse) TFLOPs
FP8: 362 / 724 (sparse) TFLOPs
Whereas the RTX 6000 Ada datasheet does not specifically list FP16/FP8 performance, but rather just “1457 TFLOPs Tensor Core performance”, with a footnote stating that this figure is FP8 with sparsity.
So why does the RTX 6000 Ada have double the peak FP8 performance compared to the L40? I would have expected the L40 (datacenter) to be at least similar in speed.
Furthermore, what would the non-sparsity FP8 performance of the RTX 6000 Ada be? Half, as in all the other cases? It is not stated in the datasheet.
Last but not least: the RTX 6000 Ada datasheet does not state FP16 performance at all. Does it not support FP16 ops?
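For reference, here is the back-of-envelope math behind my expectation and my reading of the quoted numbers, as a quick Python sketch (the ~2.5 GHz boost clock and the FP8/sparsity scaling factors are my own assumptions, not datasheet values):

```python
# Back-of-envelope math using the core count from the datasheets and assuming
# the usual Ada Tensor Core scaling (FP8 dense = 2x FP16 dense, 2:4 sparsity = 2x dense).
# The ~2.5 GHz boost clock is an assumption on my part, not a datasheet value.

cuda_cores = 18176
boost_clock_ghz = 2.5  # roughly the same for both cards

# Peak FP32: 2 FLOPs per core per cycle (FMA)
fp32_tflops = 2 * cuda_cores * boost_clock_ghz / 1000
print(f"Estimated peak FP32: {fp32_tflops:.1f} TFLOPS")  # ~90.9 for both cards

# L40 datasheet figures
l40_fp16_dense = 181.0
print("L40 FP8 dense :", 2 * l40_fp16_dense)   # 362  -> matches datasheet
print("L40 FP8 sparse:", 4 * l40_fp16_dense)   # 724  -> matches datasheet

# RTX 6000 Ada datasheet figure (FP8 with sparsity, per the footnote)
rtx6000_fp8_sparse = 1457.0
print("RTX 6000 Ada FP8 dense (if half of sparse):", rtx6000_fp8_sparse / 2)  # ~728.5
print("Ratio vs. L40 sparse FP8:", round(rtx6000_fp8_sparse / (4 * l40_fp16_dense), 2))  # ~2.01
```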
Hello @daniel.speck and welcome to the NVIDIA developer community!
You are right, the underlying architecture is very similar apart from a small difference in bandwidth, and most performance numbers are the same, even on the datasheets.
But the overall board design is very different, since the two cards target different use cases with different system constraints regarding cooling, multi-GPU and vGPU setups, etc. For those reasons alone a certain difference in specs is to be expected.
That said, the doubled FP8 peak performance that you infer can also be related to other factors, one of which might be as simple as a typo in the datasheet, which I am trying to find out now.
And in case you are considering buying one or the other, you should get in contact with our or our partners’ sales support to clarify your questions. They will be much better qualified to identify your needs and determine what solution might fit your requirements.
Sure, but does this really extend to FP16, for example, not being available on the RTX 6000 Ada?
Yes, I am considering buying the RTX 6000 Ada or the L40 at the moment. Multiple, actually. We are in contact with an official NVIDIA partner for that, but they just linked to the datasheets, hence my question here.
To be precise: for a research project we are currently looking at different 4x/8x GPU server setups, of which we want to buy several, and we are considering the RTX 6000 Ada and the L40 at the moment.
Oh, I didn’t say that, sorry if it sounded that way. To my knowledge there is no reason why the RTX 6000 Ada would not support FP16, for example through CUDA. It is simply not pointed out specifically in the data sheet for some reason.
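If you want a quick way to convince yourself that FP16 works through CUDA on the card, a minimal sketch like the following should run on any Ada GPU (it assumes a PyTorch build with CUDA support and is only a functional check, not any kind of official benchmark):

```python
import torch

# Minimal sanity check that FP16 (half precision) runs through CUDA on the
# installed GPU. Assumes a PyTorch build with CUDA support; functional check
# only, not a performance measurement.

assert torch.cuda.is_available(), "No CUDA device visible"
dev = torch.device("cuda:0")
print("Device:", torch.cuda.get_device_name(dev))
print("Compute capability:", torch.cuda.get_device_capability(dev))  # Ada reports (8, 9)

a = torch.randn(4096, 4096, device=dev, dtype=torch.float16)
b = torch.randn(4096, 4096, device=dev, dtype=torch.float16)
c = a @ b  # FP16 GEMM, typically dispatched to Tensor Core kernels via cuBLAS
torch.cuda.synchronize()
print("FP16 matmul OK, result dtype:", c.dtype)
```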
May I ask where you are situated? It might be a good idea if you ask the NVIDIA partner if they can get you in contact with a local/regional Solutions Architect directly from NVIDIA.
However, I think it would be good for everyone stumbling upon this thread to know whether there was a typo on the datasheets and what the correct FP8/FP16 speeds for both GPUs (L40 and RTX 6000 Ada) are.
The short answer is that the spec details in the respective PDF files are correct.
They are not written in a way that makes them directly comparable, because those GPUs serve completely different needs.
To learn more about which target workload of the different product areas fits your specific requirements best, you need to get in contact with a regional sales representative.
I am sorry if this does not exactly answer your question.