New compute capability: sm_37

allanmac · February 6, 2015, 9:16pm

I haven’t seen any comments on the new sm_37 arch described in the CUDA 7.0 RC docs.

The summary is that it appears to be an sm_35 with 2x the registers and 2.33x the shared memory.

This implies you can launch 64 warps, each with 64 registers per thread. sm_3x has a limit of 16 resident blocks per multiprocessor, so you would need at least 4 warps per block to reach 64 warps.
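To make the arithmetic explicit, here is a minimal sketch in Python. The per-SM limits are assumptions taken from the figures in this thread (128K 32-bit registers on sm_37, 64 resident warps and 16 resident blocks on sm_3x), and the sketch ignores register allocation granularity.

```python
# Assumed per-multiprocessor limits, taken from the numbers quoted in this thread.
REGS_PER_SM_SM37 = 128 * 1024   # 2x the 64K 32-bit registers of sm_35
REGS_PER_SM_SM35 = 64 * 1024
MAX_WARPS_PER_SM = 64           # resident warp limit on sm_3x
MAX_BLOCKS_PER_SM = 16          # resident block limit on sm_3x
WARP_SIZE = 32


def regs_per_thread_at_full_occupancy(regs_per_sm):
    """Registers each thread can use while all 64 warps stay resident."""
    return regs_per_sm // (MAX_WARPS_PER_SM * WARP_SIZE)


def min_warps_per_block_for_full_occupancy():
    """Smallest block size (in warps) that reaches 64 warps within 16 blocks."""
    return -(-MAX_WARPS_PER_SM // MAX_BLOCKS_PER_SM)  # ceiling division


print(regs_per_thread_at_full_occupancy(REGS_PER_SM_SM37))
print(regs_per_thread_at_full_occupancy(REGS_PER_SM_SM35))
print(min_warps_per_block_for_full_occupancy())
```

Under these assumptions, the per-thread register budget at full occupancy is twice what sm_35 allows.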

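Another example would be launching 32 warps with 128 registers and 28 words of shared memory per thread. Notice the limit of 64K registers per block: a single block cannot hold all 32 warps at 128 registers per thread, so this configuration needs at least two blocks.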
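The second configuration can be checked the same way. This is a rough sketch under assumed limits (128K registers and 112 KB of shared memory per SM, 64K registers per block, 4-byte words). The variable names are purely illustrative.

```python
# Assumed sm_37 limits, based on the figures discussed in this thread.
WARP_SIZE = 32
REGS_PER_SM = 128 * 1024
MAX_REGS_PER_BLOCK = 64 * 1024
SHARED_BYTES_PER_SM = 112 * 1024
BYTES_PER_WORD = 4

warps = 32
regs_per_thread = 128

threads = warps * WARP_SIZE                      # resident threads per SM
regs_needed = threads * regs_per_thread          # registers for all resident threads
shared_words_per_thread = SHARED_BYTES_PER_SM // BYTES_PER_WORD // threads

# The per-block register limit caps the block size at this register count.
max_threads_per_block = MAX_REGS_PER_BLOCK // regs_per_thread
min_blocks = -(-regs_needed // MAX_REGS_PER_BLOCK)  # ceiling division

print(threads, regs_needed, shared_words_per_thread)
print(max_threads_per_block, min_blocks)
```

With these numbers the register file is exactly filled, and the 32 warps have to be split across at least two blocks of at most 16 warps each.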

That’s a lot of resources… :)

njuffa · February 6, 2015, 10:02pm

This appears to be the official confirmation about the specs of the GK210 which features in K80, which I first spotted here:

http://www.techpowerup.com/207265/nvidia-breathes-life-into-kepler-with-the-gk210-silicon.html
“While both chips are based on the “Kepler” architecture, GK210 features double the shader cache amount. Each of the 15 streaming multiprocessors (SMXs) features 128 KB of shader cache, compared to 64 KB per SMX on the GK110. The GK210 also has a 512 KB register file per SMX, double the size of the 256 KB register file size, of the GK110.”

As far as shared memory is concerned, just as sm_35 has 64 KB - 16 KB = 48 KB available to user programs, sm_37 seems to sport 128 KB - 16 KB = 112 KB.
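As a quick sanity check, the sizes quoted above line up with the "2x registers, 2.33x shared" summary from the first post. The sketch below only restates the quoted numbers. It assumes 4-byte registers and treats the 16 KB as simply unavailable for shared memory, as in the subtraction above.

```python
KB = 1024
BYTES_PER_REGISTER = 4  # assumption: 32-bit registers

# Per-SMX sizes as quoted from the article.
on_chip_bytes = {"sm_35": 64 * KB, "sm_37": 128 * KB}
reg_file_bytes = {"sm_35": 256 * KB, "sm_37": 512 * KB}
RESERVED_BYTES = 16 * KB  # portion not usable as shared memory

shared_kb = {arch: (size - RESERVED_BYTES) // KB for arch, size in on_chip_bytes.items()}
registers = {arch: size // BYTES_PER_REGISTER for arch, size in reg_file_bytes.items()}

shared_ratio = shared_kb["sm_37"] / shared_kb["sm_35"]
register_ratio = registers["sm_37"] / registers["sm_35"]

print(shared_kb, registers)
print(round(shared_ratio, 2), register_ratio)
```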

Skybuck · February 7, 2015, 11:53pm

I haven’t downloaded the CUDA 7 SDK RC. I did take a look at the online CUDA 7 SDK docs, but nothing in there mentioned this new 3.7 compute capability.

I was kinda wondering if the online doc is a “release/production” version or also a “release candidate”.

I guess it’s only a “release/production” doc (?), so I guess that answers my curiosity about that.
(It would be nice to have a release candidate version of the online documentation as well. It would save me from having to download and install a new SDK; 800 MB is quite large when running low on free space. I know we have terabyte hard drives these days, but still ;) The online version could save some time too, so I do hope to see release candidate documentation online some time in the future ;)

Skybuck · February 7, 2015, 11:54pm

Actually, I have another question on my mind: are we even allowed to discuss release candidates on this forum? Perhaps this violates non-disclosure agreements?

Skybuck · February 9, 2015, 11:33am

I read the agreement that has to be signed, and I think this may violate it. But I can understand that developers using the preview SDKs may have questions and want to ask NVIDIA about them.
So maybe this forum needs a private section (closed to the public) where registered developers can ask “private” / “non-public” questions.

NVD · February 25, 2015, 10:43am

http://international.download.nvidia.com/pdf/kepler/NVIDIA-Kepler-GK110-GK210-Architecture-Whitepaper.pdf

GK210 whitepaper confirms GK210 is Compute Capability 3.7 and since this is available for anyone to download, it’s not under any NDA.
