How can I detect within CUDA (using the CUDA runtime or the cuDNN library) whether a device has Tensor Cores?
I checked the `cudaGetDeviceProperties` function, but it does not expose such a flag.
Furthermore, I suppose I cannot infer it from the compute capability either, since the smaller Turing GPUs might not have Tensor Cores (or not enough of them to usefully accelerate convolutions in cuDNN).
The compute capability basically specifies which instruction set your GPU can run. So far, both 7.0 (Volta) and 7.5 (Turing) include Tensor Core support in their instruction set.
I would therefore expect any future Turing SKU (even the lowest-end ones) with compute capability 7.5 to support Tensor Core arithmetic, though possibly at a heavily throttled throughput.
If NVIDIA decides to add a new compute capability such as 7.6 for future Turing-based low-end GPUs, that might indicate they've cut the Tensor Cores out to save cost. But honestly, I don't think that will happen.
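Following that reasoning, a minimal sketch of the check: there is no Tensor Core flag in `cudaDeviceProp`, so the compute capability reported by `cudaGetDeviceProperties` is used as a heuristic proxy (assuming, as above, that 7.0+ implies Tensor Cores — this is an inference, not something the API guarantees). The helper name `likelyHasTensorCores` is my own.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Heuristic sketch: treat compute capability >= 7.0 (Volta and newer)
// as "has Tensor Cores". cudaDeviceProp has no dedicated flag for this,
// and low Tensor Core counts on small SKUs are not detectable this way.
bool likelyHasTensorCores(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return false;
    return prop.major >= 7;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0)
        return 1;
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d (%s): sm_%d%d, Tensor Cores likely: %s\n",
               d, prop.name, prop.major, prop.minor,
               likelyHasTensorCores(d) ? "yes" : "no");
    }
    return 0;
}
```

Compile with `nvcc` and run on the target machine; the output is only as trustworthy as the capability-implies-Tensor-Cores assumption above.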
Thanks. For me the question really boils down to this: for a specific GPU, is it worth enabling Tensor Core operations in cuDNN (i.e., will the network actually run faster), or not?
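Since capability alone can't answer "will it be faster", one pragmatic approach is to opt in and measure: a sketch, assuming cuDNN 7+, using `cudnnSetConvolutionMathType` to permit Tensor Core math on a convolution descriptor. Note that `CUDNN_TENSOR_OP_MATH` only *allows* Tensor Core algorithms; cuDNN falls back to regular kernels when the device or data type doesn't support them, so timing both settings on your actual layer shapes is the real test.

```cuda
#include <cudnn.h>

// Sketch: toggle Tensor Core math on an already-configured convolution
// descriptor, so the two settings can be benchmarked against each other.
// CUDNN_TENSOR_OP_MATH permits (does not force) Tensor Core algorithms;
// CUDNN_DEFAULT_MATH disallows them.
cudnnStatus_t setTensorOps(cudnnConvolutionDescriptor_t convDesc,
                           bool enable) {
    return cudnnSetConvolutionMathType(
        convDesc, enable ? CUDNN_TENSOR_OP_MATH : CUDNN_DEFAULT_MATH);
}
```

To decide per GPU, run the same convolution once with each math type (e.g. timing with `cudaEvent_t`, or letting `cudnnFindConvolutionForwardAlgorithmEx` report per-algorithm times) and keep whichever is faster; on devices without usable Tensor Cores the two should perform identically.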