Tensor cores in Volta GPUs are a really big step (up to 120 TFLOPS). However, many developers have no way to test or tune performance on them, because the V100 is too expensive.
Does NVIDIA have plans to produce a cheaper version of the V100? Any ideas?
For example, a smaller chip (half of a V100) with only 320 tensor cores (about 400 mm² on 16 nm), using cheaper GDDR5X memory (e.g. 547 GB/s, like the Titan Xp) and priced in the $2000-$3000 range, would be a really great buying option!
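To put a rough number on what such a half-size chip could deliver, here is a back-of-the-envelope sketch. It uses NVIDIA's published figure of one 4x4x4 matrix FMA (64 FMAs) per tensor core per clock. The clock speed is my assumption, chosen so that 640 tensor cores land near the 120 TFLOPS quoted above, and `peak_tflops` is just an illustrative helper name. This is theoretical peak only and ignores memory bandwidth limits.

```python
# Rough peak tensor-core throughput estimate (theoretical, not measured).
FMA_PER_CORE_PER_CLOCK = 64  # one 4x4x4 matrix FMA per tensor core per clock
FLOPS_PER_FMA = 2            # a fused multiply-add counts as two floating-point ops

# Assumption: a clock that roughly reproduces the ~120 TFLOPS figure for 640 cores.
ASSUMED_CLOCK_GHZ = 1.455


def peak_tflops(tensor_cores: int, clock_ghz: float) -> float:
    """Theoretical peak in TFLOPS for a given tensor core count and clock."""
    flops_per_clock = tensor_cores * FMA_PER_CORE_PER_CLOCK * FLOPS_PER_FMA
    return flops_per_clock * clock_ghz * 1e9 / 1e12


full_chip = peak_tflops(640, ASSUMED_CLOCK_GHZ)  # full V100 tensor core count
half_chip = peak_tflops(320, ASSUMED_CLOCK_GHZ)  # the hypothetical half-size chip

print(f"640 tensor cores: {full_chip:.1f} TFLOPS")
print(f"320 tensor cores: {half_chip:.1f} TFLOPS")
```

At the same clock, the 320-core chip would peak at about half of the full chip, so roughly 60 TFLOPS, which would still be plenty for tuning tensor core code.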
Any ideas on where to test and tune tensor core performance are really welcome. CUDA 9 supports tensor cores, but without access to the hardware there is no way to test or tune performance or to write optimal code that uses them :(.