I haven’t seen any comments on the new sm_37 arch described in the CUDA 7.0 RC docs.
The summary is that it appears to be an sm_35 with 2x the registers and 2.33x the shared memory.
This implies you can launch 64 warps with 64 registers per thread (64 warps × 32 threads × 64 registers = 128K registers). sm_3x has a 16 block limit, so you would need at least 4 warps per block to reach 64 warps.
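Another example would be launching 32 warps with 128 registers and 28 words of shared memory per thread. Notice the 64K registers/block limit, though: those 32 warps need 128K registers in total, so they have to be split across at least two blocks.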
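For anyone who wants to check the arithmetic in the two examples above, here is a rough sketch in plain host-side C++. The `SmLimits` struct, `kSm37` and `fits` are names I made up for illustration, and the limits are just the numbers discussed in this thread, not values queried from a device. It ignores register allocation granularity and any other per-block limits, so treat it as a sanity check rather than an occupancy calculator.

```cpp
#include <cstdint>

// Per-SM limits for sm_37 as discussed in this thread (assumed, not queried).
struct SmLimits {
    std::uint32_t regs_per_sm;          // 32-bit registers per SM
    std::uint32_t regs_per_block;       // 32-bit registers per block
    std::uint32_t shared_bytes_per_sm;  // user-visible shared memory per SM
    std::uint32_t max_warps_per_sm;     // resident warps per SM
    std::uint32_t max_blocks_per_sm;    // resident blocks per SM
};

constexpr SmLimits kSm37 = {128 * 1024, 64 * 1024, 112 * 1024, 64, 16};
constexpr std::uint32_t kWarpSize = 32;

// True if `blocks` blocks of `warps_per_block` warps fit on one SM when every
// thread uses `regs_per_thread` registers and `shared_words_per_thread`
// 32-bit words of shared memory. Allocation granularity is not modeled.
constexpr bool fits(const SmLimits& sm,
                    std::uint32_t blocks,
                    std::uint32_t warps_per_block,
                    std::uint32_t regs_per_thread,
                    std::uint32_t shared_words_per_thread) {
    return blocks <= sm.max_blocks_per_sm
        && blocks * warps_per_block <= sm.max_warps_per_sm
        && warps_per_block * kWarpSize * regs_per_thread <= sm.regs_per_block
        && blocks * warps_per_block * kWarpSize * regs_per_thread <= sm.regs_per_sm
        && blocks * warps_per_block * kWarpSize * shared_words_per_thread * 4
               <= sm.shared_bytes_per_sm;
}

// 16 blocks x 4 warps = 64 warps at 64 registers per thread.
static_assert(fits(kSm37, 16, 4, 64, 0), "64 warps x 64 regs should fit");
// 4 blocks x 8 warps = 32 warps at 128 registers and 28 shared words per thread.
static_assert(fits(kSm37, 4, 8, 128, 28), "32 warps x 128 regs should fit");
// A single 32-warp block at 128 registers per thread breaks the 64K/block limit.
static_assert(!fits(kSm37, 1, 32, 128, 28), "one block exceeds 64K registers");
```

Both examples land exactly on the 128K register budget, and the second one also uses all 112 KB of shared memory, which is why bumping either number by one no longer fits.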
That’s a lot of resources… :)
This appears to be the official confirmation of the specs of the GK210 used in the K80, which I first spotted here:
“While both chips are based on the “Kepler” architecture, GK210 features double the shader cache amount. Each of the 15 streaming multiprocessors (SMXs) features 128 KB of shader cache, compared to 64 KB per SMX on the GK110. The GK210 also has a 512 KB register file per SMX, double the size of the 256 KB register file size, of the GK110.”
As far as shared memory is concerned, just as sm_35 has 64 KB - 16 KB = 48 KB available to user programs, sm_37 seems to sport 128 KB - 16 KB = 112 KB.
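The same subtraction as a tiny compile-time check, in case it helps. This is only a sketch of the reasoning above: `max_user_shared_bytes` and the constants are hypothetical names, and the 16 KB left over for L1 is the assumption made in this post, not something read back from the hardware.

```cpp
#include <cstdint>

constexpr std::uint32_t kKiB = 1024;
// Assumption from the post above: 16 KB of the on-chip memory stays as L1.
constexpr std::uint32_t kMinL1Bytes = 16 * kKiB;

// Shared memory left for user programs out of the per-SM on-chip memory.
constexpr std::uint32_t max_user_shared_bytes(std::uint32_t on_chip_bytes) {
    return on_chip_bytes - kMinL1Bytes;
}

static_assert(max_user_shared_bytes(64 * kKiB) == 48 * kKiB, "sm_35: 48 KB");
static_assert(max_user_shared_bytes(128 * kKiB) == 112 * kKiB, "sm_37: 112 KB");
// 112 / 48 = 7 / 3, which is where the 2.33x figure comes from.
static_assert(3 * max_user_shared_bytes(128 * kKiB) ==
              7 * max_user_shared_bytes(64 * kKiB), "ratio is 7:3");
```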
I haven’t downloaded the CUDA 7 SDK RC. However, I did take a look at the online CUDA 7 SDK docs, and nothing in there mentions this new 3.7 compute capability.
I was kinda wondering if the online doc is a “release/production” version or also a “release candidate”.
I guess it’s only a “release/production” doc (?), so that answers my curiosity about that.
(It could be nice to have a release candidate version of the online documentation as well. It would save me from having to download and install a new SDK; 800 MB for the SDK is quite large when running low on free space. I know we get terabyte hard drives these days, but still ;) The online version could save some time too, so I do hope to see a release candidate documentation version some time in the future ;)
Actually, I have another question on my mind: are we even allowed to discuss release candidates on this forum? Perhaps this violates non-disclosure agreements?
I read the agreement that has to be signed, and I think this may violate it. But I can understand that developers using the preview SDKs may have questions and want to ask NVIDIA about them.
So maybe this forum would need a private section (closed to the public) so that registered developers can ask “private” / “non-public” questions there.
The GK210 whitepaper confirms that GK210 is Compute Capability 3.7, and since the whitepaper is available for anyone to download, it’s not under any NDA.