How to tell if GPU cores are actually 32/64-bit processors

caesaretos · February 15, 2017, 9:48am

Is there a way to tell whether the cores inside a GPU are 32-bit or 64-bit processors?
In my case, the GPU is an Nvidia Tesla K80, but I am also interested in other GPUs.
Where can I find such information?
Is there a command or program that reports it?

cbuchner1 · February 15, 2017, 9:53am

The register size (word width) is definitely 32 bits. Double-precision and 64-bit integer variables each take two registers. Most ALUs (except the double-precision arithmetic units) are therefore 32 bits wide.
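To make the register-pair point concrete, here is a minimal host-side C sketch (not GPU code) of a 64-bit integer held as two 32-bit words and reassembled. The names `reg_pair`, `split_u64` and `join_u64` are hypothetical; they only model the idea of one value spanning two 32-bit registers, not how the CUDA compiler actually allocates registers.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit value occupying two 32-bit registers. */
typedef struct {
    uint32_t lo; /* low 32 bits  */
    uint32_t hi; /* high 32 bits */
} reg_pair;

/* Split a 64-bit value into its low and high 32-bit halves. */
static reg_pair split_u64(uint64_t v)
{
    reg_pair r;
    r.lo = (uint32_t)v;
    r.hi = (uint32_t)(v >> 32);
    return r;
}

/* Reassemble the 64-bit value from the two 32-bit halves. */
static uint64_t join_u64(reg_pair r)
{
    return ((uint64_t)r.hi << 32) | r.lo;
}
```

For example, `split_u64(0x0123456789ABCDEF)` yields `lo = 0x89ABCDEF` and `hi = 0x01234567`, and `join_u64` restores the original value.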

The address space the GPU can handle, however, can certainly be 4 GB or larger.

Christian

caesaretos · February 15, 2017, 10:08am

You seem very sure about this, cbuchner1.

Is there a reference for that?

Also, is there a command, datasheet, or anything else that shows this information?

njuffa · February 15, 2017, 10:27am

Cross-reference: Stack Overflow question "cpu architecture - How to tell if Nvidia GPU cores are 32/64 bit processors"

caesaretos · February 15, 2017, 10:31am

I posted that question actually :)
I also posted an answer, but I am not sure it is correct.

Robert_Crovella · February 15, 2017, 2:20pm

GPUs contain a mix of 32-bit and 64-bit capabilities.

I would suggest reading any of the GPU architecture whitepapers to learn which things are natively supported at 32-bit width and which can be handled at 64-bit width.

The processors since (and including) Fermi are generally capable of behaving as 64-bit machines (from the high-level-language programmer’s perspective). However, not all operations (e.g. integer multiply) are “natively” supported at 64 bits. Most GPUs have a native 32-bit integer multiply machine-language instruction, because the functional unit that services this instruction type can handle it directly, at the hardware level. None of the GPUs has a “native” 64-bit integer multiply (i.e. a single machine-language instruction that performs it), because there is no hardware functional unit that directly supports that operation. Instead, CUDA C/C++ code that effectively needs a 64-bit integer multiply is decomposed by the CUDA compiler into a sequence of instructions (e.g. 32-bit integer multiplies and other operations) that together produce the 64-bit result.
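As an illustration of that kind of decomposition, here is a host-side C sketch that computes the low 64 bits of a 64x64-bit product using only 32-bit multiplies and adds. It is not the exact instruction sequence the CUDA compiler emits; `mul_lo32` and `mul_hi32` are hypothetical stand-ins for 32-bit multiply instructions that return the low and high halves of the product (similar in spirit to PTX `mul.lo.u32` / `mul.hi.u32`), and they are modeled with a C 64-bit multiply purely for convenience.

```c
#include <stdint.h>

/* Stand-in for a 32-bit multiply returning the low 32 bits of the product. */
static uint32_t mul_lo32(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* Stand-in for a 32-bit multiply returning the high 32 bits of the product. */
static uint32_t mul_hi32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Low 64 bits of a 64x64-bit product, built from 32-bit multiplies and adds. */
static uint64_t mul64_from32(uint64_t a, uint64_t b)
{
    uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);
    uint32_t bl = (uint32_t)b, bh = (uint32_t)(b >> 32);

    uint32_t lo = mul_lo32(al, bl);  /* result bits 0..31 */
    uint32_t hi = mul_hi32(al, bl);  /* high half of al*bl feeds bits 32..63 */
    hi += mul_lo32(al, bh);          /* cross terms: only their low halves */
    hi += mul_lo32(ah, bl);          /* fall below bit 64 (adds wrap mod 2^32) */
    /* ah*bh only affects bits 64 and up, so it is not needed. */

    return ((uint64_t)hi << 32) | lo;
}
```

Four 32-bit multiply operations plus adds replace what would be a single instruction if a native 64-bit integer multiplier existed, which is why 64-bit integer multiplies are noticeably more expensive.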

One example of a “native” 64-bit operation is the 64-bit floating-point multiply (or multiply-add). It can be performed by a single machine-language instruction because a hardware functional unit supports it directly.

Whitepapers:

Fermi:
[url]http://www.nvidia.com/content/pdf/fermi_white_papers/nvidia_fermi_compute_architecture_whitepaper.pdf[/url]

Kepler:
[url]https://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf[/url]
[url]http://www.nvidia.com/content/PDF/product-specifications/GeForce_GTX_680_Whitepaper_FINAL.pdf[/url]

Maxwell:
[url]http://international.download.nvidia.com/geforce-com/international/pdfs/GeForce_GTX_980_Whitepaper_FINAL.PDF[/url]

Pascal:
[url]https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf[/url]

There are other related whitepapers available. Just search for the architecture name (e.g. Maxwell) plus “whitepaper”.

njuffa · February 15, 2017, 4:23pm

The question whether some processor is an N-bit processor is not answerable without establishing a set of criteria first. Different people have used different criteria over the years, sometimes to support a specific (non-technical) agenda. What prompted this question? What specific problem are you trying to solve?

In common usage (as far as I know), support for IEEE-754 double-precision floating-point operations does not qualify a processor as a 64-bit processor. For example, the Intel 486 processor supported operations on 64-bit and 80-bit floating-point operands, but was never considered anything but a 32-bit CPU. Similarly, addressing capability is not commonly used to define “bitness”: the Intel 8086 processor could address one MB of memory (20-bit physical addresses), but was never considered anything but a 16-bit processor.

The most commonly used criterion for an N-bit processor is the width of its integer registers. By that criterion, all “recent” NVIDIA GPUs (i.e. the Tesla through Pascal architectures, 2007 through today) are 32-bit processors.

cbuchner1 · February 15, 2017, 5:09pm

A round of standing ovations for Norbert Juffa’s excellent explanation!

caesaretos · February 16, 2017, 3:55am

I have a very large problem to code on a GPU.
I can divide it into pieces as small as I wish.

My ultimate objective is the lowest possible execution time for the whole problem.

I need to decide whether to use 32-bit or 64-bit data types, how to handle memory transfers, and so on.

That’s why I need a way to tell whether my hardware is a 32-bit or a 64-bit machine.

njuffa · February 16, 2017, 5:43am

Is this a floating-point-intensive application, or an integer-intensive one? If the former, be aware that consumer GPUs have very low double-precision throughput. If the latter, now that you know all 64-bit integer operations are emulated, it is best to stick to 32-bit integer types where you can. As you noted, the size of each data item also affects memory bandwidth and other things, such as shared memory usage, that may be important to the application.

My suggestion (on any platform) would be to start by running some basic experiments to get a feel for how performance scales with problem size, rather than theorizing about it.

caesaretos · February 16, 2017, 6:49am

My application is integer-intensive. I am now pretty sure that 64-bit integer operations (addition/multiplication) are emulated by my GPU.

I’ll target 32-bit data types and 32-bit memory accesses per thread.

Thanks very much, njuffa.

njuffa · February 16, 2017, 6:02pm

Emulation of 64-bit integer addition and multiplication is an absolute certainty. You can look at the generated machine code with cuobjdump --dump-sass to convince yourself.
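For addition, the usual emulation is two 32-bit additions, the second of which consumes the carry out of the first. The host-side C sketch below models that idea; `add64_from32` is a hypothetical name, and the carry is computed explicitly in C, whereas the GPU propagates it through a carry flag (compare PTX `add.cc.u32` / `addc.u32`). Compiling a kernel that adds two `unsigned long long` values and inspecting it with cuobjdump --dump-sass should typically show a similar pair of instructions.

```c
#include <stdint.h>

/* 64-bit addition built from two 32-bit additions plus a carry. */
static uint64_t add64_from32(uint64_t a, uint64_t b)
{
    uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);
    uint32_t bl = (uint32_t)b, bh = (uint32_t)(b >> 32);

    uint32_t lo = al + bl;                 /* low halves, wraps mod 2^32 */
    uint32_t carry = (lo < al) ? 1u : 0u;  /* wrap-around means a carry out */
    uint32_t hi = ah + bh + carry;         /* high halves plus incoming carry */

    return ((uint64_t)hi << 32) | lo;
}
```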

If you target a Maxwell or Pascal GPU, you will find that even 32-bit integer multiplication is emulated via 16-bit multiplies. Also, 32-bit integer division is emulated on all GPUs.
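The arithmetic behind the 16-bit emulation can be sketched on the host in C: the low 32 bits of a 32x32-bit product can be assembled from three 16x16-bit partial products. This only models the math; the actual Maxwell/Pascal SASS sequence (built around the XMAD 16-bit multiply-add instruction) differs in detail, and `mul16x16` / `mul32_from16` are hypothetical names.

```c
#include <stdint.h>

/* Stand-in for a 16x16-bit multiply producing a full 32-bit result. */
static uint32_t mul16x16(uint16_t x, uint16_t y)
{
    return (uint32_t)x * (uint32_t)y;  /* at most 0xFFFE0001, always fits */
}

/* Low 32 bits of a 32x32-bit product, assembled from 16x16-bit products. */
static uint32_t mul32_from16(uint32_t a, uint32_t b)
{
    uint16_t al = (uint16_t)a, ah = (uint16_t)(a >> 16);
    uint16_t bl = (uint16_t)b, bh = (uint16_t)(b >> 16);

    uint32_t lo = mul16x16(al, bl);                        /* full low partial product */
    uint32_t cross = mul16x16(al, bh) + mul16x16(ah, bl);  /* wraps; only low 16 bits survive the shift */
    /* ah*bh only affects bits 32 and up, so it is not needed. */

    return lo + (cross << 16);
}
```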
