Many NVIDIA GPU datasheets, for example those for the P40 and P100, use the term “INT8” without defining it, leaving developers to guess what exactly it means. Well, yes, I can guess that it means support for 8-bit arithmetic, but could you please provide YOUR definition so I am not reduced to guessing?
My other question in this area is: what about 16-bit arithmetic operations? The application I am working on requires 16-bit integer operations. I see terms like “FP32” and “FP16” and even the aforementioned “INT8” on several of the data sheets, but nowhere do I see anything like “INT16”. Is 16-bit integer arithmetic supported by these machines or not?
Yes, you can operate on 16-bit data with a GPU. Use the appropriate type in your CUDA and C++ code, e.g. ‘short’, ‘unsigned short’, ‘int16_t’, ‘uint16_t’. CUDA also offers predefined packed types like ‘ushort2’ but not operations on those types, i.e. these are mostly useful for optimizing memory accesses.
Note that for languages in the C++ family (this includes CUDA) it is specified that during the evaluation of expressions integer data with types narrower than ‘int’ is converted to ‘int’ first before entering into the computation. On all platforms supported by CUDA, ‘int’ is a 32-bit type. So the use of 16-bit types mostly has the benefit of reducing storage requirements in memory, but as a trade-off can require additional conversions.
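To make the promotion rule concrete, here is a minimal host-side C++ sketch (plain C++, not CUDA-specific). The helper names `mul_widened`, `mul_wrapped_u16` and `demo_promotion` are made up for illustration, and the sketch assumes a 32-bit ‘int’ as described above.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Assumption taken from the post above: 'int' is 32 bits wide.
static_assert(sizeof(int) == 4, "this sketch assumes a 32-bit int");

// Both 16-bit operands are promoted to 'int' before the addition happens,
// so the type of the expression is 'int', not 'int16_t'.
static_assert(std::is_same<decltype(std::int16_t{} + std::int16_t{}), int>::value,
              "int16_t + int16_t yields int");

// Hypothetical helper: the product is computed and returned at 32-bit width,
// so 300 * 300 is not truncated to 16 bits.
int mul_widened(std::int16_t a, std::int16_t b) {
    return a * b;
}

// Hypothetical helper: a product that wraps around at 16 bits. The operands are
// widened to an unsigned 32-bit type first, because promoting two large
// uint16_t values to signed 'int' and multiplying them could overflow.
std::uint16_t mul_wrapped_u16(std::uint16_t a, std::uint16_t b) {
    const std::uint32_t wide = static_cast<std::uint32_t>(a) * b;
    return static_cast<std::uint16_t>(wide);  // explicit narrowing conversion
}

// Small usage example.
void demo_promotion() {
    assert(mul_widened(300, 300) == 90000);      // full 32-bit result
    assert(mul_wrapped_u16(300, 300) == 24464);  // 90000 modulo 65536
}
```

The explicit casts in `mul_wrapped_u16` are an example of the additional conversions mentioned above: storage is 16 bits wide, but the arithmetic itself runs at 32 bits.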
Thanks, njuffa. I am familiar with the C automatic data conversions, yes. But, come to think of it, the data sheet makes no mention of any “INT” or “INT32” at all, so it kind of leaves the impression that only 8-bit ints are supported (as wildly unlikely as that would seem).
I have in fact already developed the first prototype of my application in CUDA C and C++, and it is right now running on an M60 GPU. I am looking to productize it and thinking of using a P40 GPU for that; but I was a little put off by seeing “FP32”, “FP16”, and “INT8” on the data sheet with no mention of any “INT16”. I was also a little annoyed that such acronyms are thrown around while nowhere on the NVIDIA CUDA website is any definition of them provided. These may be standard acronyms in somebody’s world, but they are not standard in my world. uint16, int16, etc. are fairly standard in my experience, but the precise meaning of “INT8” is anyone’s guess to an old C and C++ guy like me.
INT8, INT16 and FP16 are usually mentioned specifically in datasheets when there are native hardware arithmetic instructions for these data types, such as the DP2A and DP4A instructions on Pascal (exposed in CUDA as the __dp2a_lo(), __dp2a_hi() and __dp4a() intrinsics), which can accelerate AI inference.
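As a rough illustration of what a 4-way 8-bit dot product with 32-bit accumulation computes, here is a portable C++ reference sketch. It is not the CUDA __dp4a() intrinsic itself; `signed_byte`, `dp4a_reference` and `demo_dp4a` are hypothetical names, and the sketch assumes the signed interpretation of the four packed bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: extract byte 'i' (0 = least significant) of a packed
// 32-bit word and interpret it as a signed 8-bit value in [-128, 127].
static int signed_byte(std::uint32_t word, int i) {
    const int v = static_cast<int>((word >> (8 * i)) & 0xFFu);
    return v >= 128 ? v - 256 : v;
}

// Hypothetical reference function: multiply the four signed bytes of 'a' with
// the corresponding bytes of 'b' and add the four products to the 32-bit
// accumulator 'acc'. Hardware with an INT8 dot-product instruction can do this
// in one operation; this loop only models the arithmetic.
int dp4a_reference(std::uint32_t a, std::uint32_t b, int acc) {
    for (int i = 0; i < 4; ++i) {
        acc += signed_byte(a, i) * signed_byte(b, i);
    }
    return acc;
}

// Small usage example: bytes (1,2,3,4) dot (5,6,7,8) = 70, plus accumulator 10.
void demo_dp4a() {
    assert(dp4a_reference(0x04030201u, 0x08070605u, 10) == 80);
}
```

The point of the sketch is that the inputs are 8 bits wide while the running sum is kept at 32 bits, which is why “INT8” on a data sheet refers to a specific accelerated operation rather than to the only integer width the GPU supports.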
The data sheet (not sure what you are looking at) presumably talks about hardware capabilities. That is what data sheets tend to do. Does the data sheet for a 64-bit CPU call out the fact that it can also process 16-bit data? Probably not. Because some of the most recent GPUs have special instructions for operating on INT8 (8-bit integer) data, it is called out in the data sheet. That does not mean GPUs were incapable of operating on 8-bit integer data previously.
GPUs are essentially 32-bit architectures, with some extensions to allow 64-bit addressing, for example. But C++ (and thus CUDA, which is currently based on C++11) abstracts from the machine, and you can use all the usual integer types.
Become a globetrotter :-) FP32 is one way of referring to single-precision floating-point (typically implied: of the IEEE-754 kind). In the lingo of IEEE-754 – the relevant standard – one would call that ‘binary32’, C/C++ folks would usually call that ‘float’ (although the language standard specifies no such equivalency!), and in older Fortran code it may be referred to as ‘REAL*4’. FP16 is half-precision floating point, defined as a ‘binary16’ storage format in the IEEE-754 (2008) standard. FP16 support is only slowly making its way into various programming languages.
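If you want to check on a given host compiler whether ‘float’ really is the IEEE-754 ‘binary32’ format, a minimal C++ sketch along these lines should do. `float_is_binary32` and `demo_fp32` are hypothetical names, and the check only describes the compiler it is built with, not any particular GPU.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Hypothetical helper: returns true if 'float' on this platform looks like
// IEEE-754 binary32, i.e. 32 bits of storage, a 24-bit significand
// (23 stored bits plus the implicit leading bit), and IEEE-754 conformance
// as reported by the standard library.
bool float_is_binary32() {
    using lim = std::numeric_limits<float>;
    return lim::is_iec559
        && sizeof(float) * CHAR_BIT == 32
        && lim::digits == 24
        && lim::max_exponent == 128;
}

// Small usage example; this holds on common desktop and server toolchains.
void demo_fp32() {
    assert(float_is_binary32());
}
```

The checks are done at run time rather than with static_assert so that the sketch still compiles on a platform where ‘float’ is something else, which the language standard permits.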
Totally cool. Thanks, guys. You’ve answered all my questions.