A question about core size and speed like FP64, FP32, INT32

1024476687 · April 23, 2020, 2:28pm

Recently, I have been trying to put together a brief list of core sizes and computation speeds, for example the size and speed of cores such as FP32, INT32, INT16, INT8 and INT4. But I can’t find this type of information. When I searched, I always found introductions to the overall architecture of GPU products, but little description of hardware details. If you know where I can find this information, please help me. Thanks a lot.

luis.leon · April 23, 2020, 5:53pm

Hi @1024476687

Getting such detailed information is quite hard. A possible workaround is to extrapolate once you have figures for the two basic kinds of data: int and float. Generally, the vector units are symmetric, in the sense that if we get a speed S for int32, we can expect roughly 2S for int16 and 0.5S for int64.

Now, one way to get the speed is empirically: measure it for a given data type and then apply the extrapolation that I mentioned above.
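As a rough sketch of that extrapolation: the function name and the sample figure below are hypothetical, and the inverse-width scaling is only the rule of thumb described above, which real hardware does not always follow.

```python
def extrapolate_throughput(measured, measured_bits, target_bits):
    """Scale a measured throughput to another operand width.

    Assumes throughput is inversely proportional to operand width,
    i.e. the rule of thumb S(int32) -> 2S(int16), 0.5S(int64).
    """
    if measured_bits <= 0 or target_bits <= 0:
        raise ValueError("bit widths must be positive")
    return measured * measured_bits / target_bits


# Hypothetical measurement: 100 Gop/s for int32 (made-up number).
s_int32 = 100.0
for bits in (64, 32, 16, 8):
    estimate = extrapolate_throughput(s_int32, 32, bits)
    print(f"int{bits}: ~{estimate:.1f} Gop/s")
```

Treat the result as a first guess to be checked against a real measurement for each type.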

As an observation, I mentioned “vector units”. There are some hardware specialisations inside each SM. Let’s take the Pascal architecture whitepaper as an example.

Looking at page 11, you can find the number of FP32 units per SM and per GPU. Let’s take FP32 and the K40 (a Kepler-generation card) for our brief analysis. According to the whitepaper, it has 2880 FP32 CUDA Cores. A little lower in the table, we can find the Peak Performance, which is 5040 GFLOP/s. Keep in mind that this is the ideal case.

With that data, we can get a rough approximation of the speed of each core by dividing the Peak Performance by the number of FP32 CUDA Cores:

Pcore = 5040 GFLOP/s / 2880 = 1.75 GFLOP/s per core.
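The same arithmetic as a small script, in case you want to repeat it for other cards. The two input figures are the ones quoted above; the helper names are made up for illustration, and the clock check assumes that the peak figure counts one fused multiply-add per core per cycle as 2 FLOP.

```python
PEAK_FP32_GFLOPS = 5040.0  # K40 peak FP32 figure quoted above
FP32_CUDA_CORES = 2880     # K40 FP32 CUDA core count quoted above
FLOP_PER_CYCLE = 2         # assumption: one FMA per cycle, counted as 2 FLOP


def per_core_gflops(peak_gflops, cores):
    """Peak throughput of a single core, in GFLOP/s."""
    return peak_gflops / cores


def implied_clock_mhz(core_gflops, flop_per_cycle=FLOP_PER_CYCLE):
    """Clock frequency implied by a per-core throughput, in MHz."""
    return core_gflops / flop_per_cycle * 1000.0


p_core = per_core_gflops(PEAK_FP32_GFLOPS, FP32_CUDA_CORES)
print(f"Pcore = {p_core:.2f} GFLOP/s per core")
print(f"Implied clock = {implied_clock_mhz(p_core):.0f} MHz")
```

The implied clock is only a sanity check: if it is far from the clock listed for the card, the peak figure was probably computed under different assumptions.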

That covers the speed of a core, but I am not sure about your second question. Do you mean the area footprint of the core?

Hope this helps you, or at least guides you. Finding this kind of document (whitepapers) and processing the numbers a bit is an easy way to get what you are looking for (speed).

Regards,
Leon

1024476687 · April 24, 2020, 2:14am

Thank you very much for your answer. It is very detailed and solves my question well. Yes, my second question is about the area footprint of the core. I’ve seen documents like volta-architecture-whitepaper.pdf before. On page 13, Figure 5 shows that the INT and FP32 units have different footprint areas, but the document gives no detailed information about those areas. And I can hardly find such information for INT8, INT16 and so on.

luis.leon · April 24, 2020, 9:09am

Hi,

About those data types, you can assume that they are going to use the same cores as int. You can pack two int16_t or four int8_t values together and process them in parallel in an int core. You will hardly find a dedicated unit for that kind of data, for reasons of cost and relevance. Usually you would rather reuse one wide unit to process smaller data.
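To illustrate the packing idea, here is a minimal software sketch that adds two int16 values at once inside one 32-bit word. The helper names are hypothetical, and this only shows the principle; it is not a description of how the GPU's integer units are actually wired.

```python
LANE_HIGH = 0x80008000  # top bit of each 16-bit lane
LANE_LOW = 0x7FFF7FFF   # lower 15 bits of each 16-bit lane


def pack2(lo, hi):
    """Pack two signed 16-bit values into one 32-bit word."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def unpack2(word):
    """Split a 32-bit word back into two signed 16-bit values (lo, hi)."""
    def signed(v):
        return v - 0x10000 if v & 0x8000 else v
    return signed(word & 0xFFFF), signed((word >> 16) & 0xFFFF)


def add2(a, b):
    """Add both 16-bit lanes at once, wrapping per lane.

    The lower 15 bits of each lane are added normally, so a carry can
    never cross into the neighbouring lane; the top bit of each lane is
    then fixed up with XOR.
    """
    low_sum = (a & LANE_LOW) + (b & LANE_LOW)
    return (low_sum ^ ((a ^ b) & LANE_HIGH)) & 0xFFFFFFFF


x = pack2(1000, -2000)
y = pack2(234, 500)
print(unpack2(add2(x, y)))
```

On the GPU you would not write this by hand: CUDA provides packed intrinsics such as `__vadd2` and `__dp4a` for this kind of operation.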

Hope this can help you somehow.

Regards,
Leon.
