I have not seen many threads on this forum addressing specific issues with Volta V100 (compute capability 7.0) programming, optimizing for Volta, or porting an existing CUDA code base to Volta.
This may be because the barrier to entry (the price) effectively excludes hobbyist and enthusiast users, and maybe because those working on Volta projects cannot disclose what they are working on.
Would anyone be willing to share experiences, stories of success or failure, or simple programming tips regarding Volta?
(I am currently checking whether the tensor cores can be shoehorned into performing bigint math. I will contribute as soon as I know more…)
Volta instances are available in the public cloud (AWS) for as little as ~$3/hr on the spot market.
Ah, the joy of having your spot instance pulled out from under your feet because someone just put in a higher bid ;)
I am a bit miffed that there is no emulation of the warp matrix multiply-accumulate (WMMA) feature for older devices. With such an emulation we could spend time developing and verifying code on Pascal, and only pay for a spot instance to test and benchmark our mostly completed code.
No experience to share, but I am also curious about the internal precision and rounding behavior of the tensor cores. The best I could find in the documentation is that “accumulation of the intermediate values is performed with at least single precision”. Which does not mean much when the evaluation order is unspecified… I suspect it might be using some wide fixed-point or block floating-point format internally. If so, that may make it quite interesting for bigint math indeed.
If you happen to find anything about the actual internal number representation and precision in your experiments, please let us know!