How to tell if GPU cores are actually 32/64-bit processors

Is there a way to tell if the internal GPU core is a 32/64 bit processor?
In my case, the GPU is an Nvidia Tesla K80, but I am also interested in knowing this for other GPUs.
Where to find such information?
Is there a command or program to find that?

The register size (word width) is definitely 32 bits. Double-precision and 64-bit integer variables occupy two registers. Most ALUs (except for the double-precision arithmetic unit) are therefore 32 bits wide.

The address space the GPU can handle can certainly be >= 4 GB.


You seem very sure about this, cbuchner1.

Is there a reference for that?

Also, is there a command, datasheet, or other resource that shows this information?

Cross reference:

I posted that question actually :)
I also gave an answer but I am not sure of it.

GPUs contain a mix of 32-bit and 64-bit capabilities.

I would suggest you read any of the GPU architecture whitepapers to learn which operations are natively supported at 32 bits and which can be handled at 64-bit width.

The processors since (and including) Fermi are generally capable of behaving as a 64-bit machine (from the high-level-language programmer’s perspective). However, not all operations (e.g. integer multiply) are “natively” supported at 64 bits. Most GPUs have a native 32-bit integer multiply machine language instruction, because the functional unit that services this instruction type can handle it directly, at the hardware level. No GPU has a “native” instruction (i.e. a single machine language instruction) that performs 64-bit integer multiply, because there is no hardware functional unit that directly supports that operation. Instead, code written in CUDA C/C++ that needs a 64-bit integer multiply will be decomposed by the CUDA compiler into an instruction sequence that constructs the 64-bit integer multiply out of a sequence of instructions (e.g. using 32-bit integer multiply and other operations).

One example of a “native” 64-bit operation is 64-bit floating point multiply (or multiply-add). This can be done/represented by a single machine language instruction because there is a hardware functional unit that supports this directly.

There are other related whitepapers available. Just Google the architecture name (e.g. “Maxwell”) and “whitepaper”.

The question whether some processor is an N-bit processor is not answerable without establishing a set of criteria first. Different people have used different criteria over the years, sometimes to support a specific (non-technical) agenda. What prompted this question? What specific problem are you trying to solve?

In common usage (as far as known to me), support for IEEE-754 double-precision floating-point operations does not qualify a processor as a 64-bit processor. For example, the Intel 486 processor supported operations on 64-bit and 80-bit floating-point operands, but was never considered anything but a 32-bit CPU. Similarly, addressing capability is not commonly used to define “bitness”: the Intel 8086 processor could address one megabyte of memory, but was never considered anything but a 16-bit processor.

The most commonly used criterion for an N-bit processor is the width of integer registers. Based on that, all “recent” NVIDIA GPUs (i.e. Tesla through Pascal architectures, 2007 through today), are 32-bit processors.

A round of standing ovations for Norbert Juffa’s excellent explanation!

I have a very large problem to code on GPU.
I can divide my problem into smaller pieces as small as I wish.

My ultimate objective is to get the lowest execution time for solving my big problem.

I need to decide whether to go for 32/64-bit datatypes, memory transfer … and so on.

That’s why I need a way to tell whether my hardware is a 32-bit or 64-bit kind of machine.

Is this a floating-point-intensive application, or an integer-intensive one? If the former, be aware that consumer GPUs have very low throughput for double-precision operations. If the latter, now that you know all 64-bit integer operations are emulated, it will be best to stick to 32-bit integer types if you can. As you noted, the size of each data item also impacts memory bandwidth, and other things like shared memory usage may be important to the application.

My suggestion (on any platform) would be to start running some basic experiments to get a feel for how the performance scales with problem size, rather than theorizing about performance.

My application is integer-intensive. I am pretty sure now that 64-bit instructions (addition/multiplication) are just emulated by my GPU.

I’ll target 32-bit datatypes and memory access per thread.

Thanks much njuffa.

Emulation of 64-bit integer addition and multiplication is an absolute certainty. You can look at the generated machine code with cuobjdump --dump-sass to convince yourself.

If you target a Maxwell or Pascal GPU, you will find that even 32-bit integer multiplication is emulated via 16-bit multiplies. Also, 32-bit integer division is emulated on all GPUs.