Thanks for your reply.
I still have two questions.
- Do you mean that Thor doesn’t support MXFP4?
- I am reading the PTX docs, and the entry for the instruction tcgen05.mma.cta_group.kind.block_scale{.scale_vectorsize} indicates that .scale_vectorsize can only be used with sm_100a, sm_100f, and sm_110f, but Thor is sm_110a. However, when the data type is .kind::mxf4nvf4, K is at least 64. Could you confirm whether the .scale_vectorsize qualifier is available on Thor? Thank you again.
Here is the relevant section of the docs:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#tcgen05-mma-instructions-mma