A question about the size and speed of cores such as FP64, FP32 and INT32

Recently, I wanted to put together a brief list of the size and computation speed of different cores, such as FP32, INT32, INT16, INT8 and INT4 cores. However, I can't find this kind of information. When I search, I always find introductions to the overall architecture of a GPU product, but little description of the hardware details. If you know how I can find this information, please help me. Thanks a lot.

Hi @1024476687

Getting such detailed information is very hard. A possible workaround is to extrapolate from the information you have for two types of data: int and float. Generally, the vector units are symmetric, in the sense that if we get a speed S for int32, we can expect roughly 2S for int16 and 0.5S for int64.
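The extrapolation above is a rule of thumb: if the same vector units are reused for narrower or wider types, throughput in elements per second scales inversely with element width. Below is a minimal Python sketch of that rule. The function name `extrapolate_rate` and the example value of S are hypothetical, and real GPUs can deviate substantially (FP64 and INT64 throughput in particular varies widely between product lines), so treat the result as a rough estimate only.

```python
def extrapolate_rate(measured_rate: float, measured_bits: int, target_bits: int) -> float:
    """Estimate the rate for `target_bits` from a rate measured at `measured_bits`.

    Assumes (rule of thumb only) that the same units process proportionally
    more narrow elements or fewer wide elements per unit of time.
    """
    if measured_bits <= 0 or target_bits <= 0:
        raise ValueError("bit widths must be positive")
    return measured_rate * measured_bits / target_bits


if __name__ == "__main__":
    s = 100.0  # hypothetical measured int32 rate S, arbitrary units
    for bits in (8, 16, 32, 64):
        print(f"int{bits}: {extrapolate_rate(s, 32, bits):g}")
```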

One way to get the speed is to measure it empirically for a given data type and then apply the extrapolation I mentioned above.
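A minimal sketch of the measuring step, assuming you supply a callable that performs a known number of operations of the type under test. The helper `measure_rate` is hypothetical; it only wraps Python's `time.perf_counter` and keeps the best of several runs. Used this way it measures whatever the callable does on the host; for GPU kernels the same idea is usually implemented with CUDA events (`cudaEventRecord` / `cudaEventElapsedTime`) around the kernel launch.

```python
import time
from typing import Callable


def measure_rate(kernel: Callable[[], object], ops_per_call: int, repeats: int = 5) -> float:
    """Return the best observed operations per second for `kernel`.

    `ops_per_call` is the number of operations one call performs; counting it
    correctly for the data type under test is the caller's responsibility.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    kernel()  # warm-up run, not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel()
        best = min(best, time.perf_counter() - start)
    if best <= 0.0:
        raise ValueError("kernel too fast to time; increase the work per call")
    return ops_per_call / best


if __name__ == "__main__":
    n = 1_000_000

    def int_adds() -> int:
        # n integer additions in a pure-Python loop (hypothetical workload)
        acc = 0
        for i in range(n):
            acc += i
        return acc

    print(f"{measure_rate(int_adds, n) / 1e6:.1f} M int adds/s")
```

Dividing the measured rate by the number of units doing the work gives a per-unit figure, which can then be extrapolated to other data widths.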

Note that I said “vector units”: there are some hardware specialisations inside each SM. Let’s take the Pascal architecture as an example.

On page 11 of the Pascal whitepaper, you can find the number of FP32 units per SM and per GPU. For a brief analysis, let’s take FP32 and the K40 (a Kepler GPU listed in that comparison table). According to the whitepaper, the K40 has 2880 FP32 CUDA cores. A little lower in the table, we can find the peak performance, 5040 GFLOP/s, which is really the ideal case.

With that data, we can get a rough approximation of the speed of each core by dividing the peak performance by the number of FP32 CUDA cores:

Pcore = 5040 GFLOP/s / 2880 = 1.75 GFLOP/s per core.
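The same arithmetic as a small Python sketch. Peak FP32 figures are usually computed as cores × 2 FLOPs (one fused multiply-add per clock) × clock; that counting convention is an assumption here. Under it, 1.75 GFLOP/s per core corresponds to one FMA per cycle at 875 MHz, which is the K40's maximum boost clock. The function names are hypothetical.

```python
def per_core_rate(peak_gflops: float, cores: int) -> float:
    """Peak throughput divided evenly over the cores (GFLOP/s per core)."""
    return peak_gflops / cores


def implied_clock_ghz(core_gflops: float, flops_per_cycle: int = 2) -> float:
    """Clock implied by a per-core rate, assuming one FMA (2 FLOPs) per cycle."""
    return core_gflops / flops_per_cycle


if __name__ == "__main__":
    # K40 figures quoted from the whitepaper table
    p_core = per_core_rate(5040.0, 2880)
    print(p_core, implied_clock_ghz(p_core))  # 1.75 0.875
```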

That answers the question about core speed, but I am unsure about your second question. Do you mean the area footprint of the core?

Hope this helps you, or at least guides you. Finding this kind of information (whitepapers) and processing it a bit is an easy way to get what you are looking for (speed).



Thank you very much for your answer. It is very detailed and answers my question well. Yes, my second question is about the area footprint of the core. I’ve seen documents like volta-architecture-whitepaper.pdf before. On page 13, in Figure 5, I found that the INT and FP32 units have different footprints, but the document gives no detailed information about their area. And I can hardly find such information for INT8, INT16 and so on.


As for those data types, you can assume that they use the same int cores. You can pack two int16_t or four int8_t values together and process them in parallel in an int core. You will rarely find a dedicated unit for those kinds of data, for reasons of cost and relevance. Usually you would rather reuse one large unit to process smaller data.
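To illustrate the packing idea, here is a Python emulation of SIMD within a register: four unsigned 8-bit lanes live in one 32-bit word and are added with one ordinary addition plus a few bitwise masks, so carries never cross lane boundaries. The helper names are hypothetical, and this sketches the general technique, not how any particular GPU implements it. On NVIDIA hardware, CUDA exposes comparable packed operations as intrinsics such as `__vadd4` (per-byte add) and `__dp4a` (four-way int8 dot product with 32-bit accumulation, compute capability 6.1 and later).

```python
LANE_MASK = 0xFF
HIGH_BITS = 0x80808080  # top bit of each byte lane
LOW_BITS = 0x7F7F7F7F   # lower 7 bits of each byte lane


def pack4_u8(values):
    """Pack four unsigned 8-bit values (lane 0 = lowest byte) into one 32-bit int."""
    if len(values) != 4 or any(not 0 <= v <= 0xFF for v in values):
        raise ValueError("expected four values in [0, 255]")
    word = 0
    for lane, v in enumerate(values):
        word |= v << (8 * lane)
    return word


def unpack4_u8(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * lane)) & LANE_MASK for lane in range(4)]


def add4_u8(a, b):
    """Lane-wise addition modulo 256 of two packed words."""
    # Adding only the low 7 bits gives at most 0xFE per lane: no carry into the next lane.
    low = (a & LOW_BITS) + (b & LOW_BITS)
    # Bit 7 of each lane is a7 XOR b7 XOR carry-in; the carry-out is dropped (wrap-around).
    return low ^ ((a ^ b) & HIGH_BITS)


if __name__ == "__main__":
    a = pack4_u8([1, 2, 250, 127])
    b = pack4_u8([1, 3, 10, 1])
    print(unpack4_u8(add4_u8(a, b)))  # [2, 5, 4, 128]
```

The same construction with 16-bit lanes (masks 0x7FFF7FFF and 0x80008000) covers the two-int16_t case.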

Hope this helps somehow.