A dev blog article
https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/
nVidia Turing White Paper
Various tech news sites have also begun to report on some aspects of this newly released information.
Interesting tidbits from the release are:
- We get 96 KB of combined shared memory/L1 per multiprocessor (the split is configurable as 32 KB shared / 64 KB L1 or 64 KB shared / 32 KB L1)
- FP and integer instructions can be executed in parallel in Turing. Supposedly this was not possible in Pascal, which matches my earlier finding: I was never able to get DP4A/DP2A instructions dual-issued with floating point operations. Back then I simply did not know it was impossible.
- Turing's Tensor cores now support INT8 and INT4 math at even higher throughput.
- No word about user programmability of the RT cores.
- OMG - that beautiful die shot in the whitepaper. Enhance! Enhance!
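
For anyone who has not used DP4A from the tidbits above: as I understand it, the signed variant takes two 32-bit words, treats each as four packed signed 8-bit values, multiplies them pairwise and adds the four products to a 32-bit accumulator. Below is a minimal host-side C++ sketch of that arithmetic, handy for checking kernel results on the CPU. The names `pack4`, `dp4a_ref` and `dp4a_selfcheck` are my own hypothetical helpers, not part of CUDA, and this says nothing about how the hardware issues the instruction.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack four signed 8-bit values into one 32-bit word,
// with x0 in the lowest byte.
inline std::uint32_t pack4(std::int8_t x0, std::int8_t x1,
                           std::int8_t x2, std::int8_t x3) {
    return  static_cast<std::uint32_t>(static_cast<std::uint8_t>(x0))
         | (static_cast<std::uint32_t>(static_cast<std::uint8_t>(x1)) << 8)
         | (static_cast<std::uint32_t>(static_cast<std::uint8_t>(x2)) << 16)
         | (static_cast<std::uint32_t>(static_cast<std::uint8_t>(x3)) << 24);
}

// Hypothetical CPU reference for a signed 4-way 8-bit dot product with
// 32-bit accumulate: c + a0*b0 + a1*b1 + a2*b2 + a3*b3.
inline std::int32_t dp4a_ref(std::uint32_t a, std::uint32_t b, std::int32_t c) {
    std::int32_t acc = c;
    for (int i = 0; i < 4; ++i) {
        // Extract byte i and reinterpret it as a signed 8-bit value.
        const auto ai = static_cast<std::int8_t>(
            static_cast<std::uint8_t>((a >> (8 * i)) & 0xFFu));
        const auto bi = static_cast<std::int8_t>(
            static_cast<std::uint8_t>((b >> (8 * i)) & 0xFFu));
        acc += static_cast<std::int32_t>(ai) * static_cast<std::int32_t>(bi);
    }
    return acc;
}

// Small sanity check: 1*5 + 2*6 + 3*7 + 4*8 = 70, plus an accumulator of 10.
inline void dp4a_selfcheck() {
    assert(dp4a_ref(pack4(1, 2, 3, 4), pack4(5, 6, 7, 8), 10) == 80);
}
```

On the device side the equivalent would be the `__dp4a` intrinsic; the sketch above is only meant as a reference to compare against.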