Efficient use of TensorCore and cudaCore data on Xavier platform, not limited to API

How can I make efficient use of TensorCore and cudaCore data on the Xavier platform, not limited to the API?

Hi,

Here is a tutorial for your reference:
https://devblogs.nvidia.com/tensor-core-programming-using-cuda-fortran/

Thanks.

These are generic approaches. Are there any specialized optimizations?

Hi denvend,

I'm not sure what “specialized optimizations” means here. Please elaborate on your implementation so we can see whether any specific suggestions can be provided.