Using store_matrix_sync with SMEM: bank conflict?

  • Can the target address of store_matrix_sync be Shared Memory (SMEM), or does it have to be Global Memory?
  • If the target address is SMEM, is there a possibility of bank conflicts?
  • Does store_matrix_sync have built-in support for swizzling to avoid bank conflicts, or does it need to be manually implemented?

From the programming guide:

individual matrix elements must be accessed from memory (shared or global) after calling store_matrix_sync.

Since the pattern is unspecified (again, quoting from the programming guide; please read the entire section I linked), it's not really sensible to answer the bank conflict question, in my opinion. At least, I wouldn’t be able to answer it. (Even if I could offer an answer, it might vary by CUDA version, by GPU architecture, or by other unknown factors.) Or, if you prefer, the answer is “yes”: bank conflicts are possible. You could use Nsight Compute to test a particular case, if that is of interest.
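If you want a concrete case to profile, here is a minimal sketch (not a definitive implementation) of a single warp storing a 16x16 float accumulator to shared memory and then reading individual elements back from it. The names `wmma_store_to_smem`, `run_demo`, and the `LDM` template parameter are made up for illustration. It assumes a GPU of compute capability 7.0 or newer and a build such as `nvcc -arch=sm_70`. The leading dimension is the only layout knob the WMMA API exposes here; whether padding it changes the conflict count is something to measure, not something this sketch claims.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <mma.h>
#include <vector>

using namespace nvcuda;

// LDM is the leading dimension (in floats) of the shared-memory tile.
// For float it must be a multiple of 4; LDM > 16 pads each row.
template <int LDM>
__global__ void wmma_store_to_smem(const half* a, const half* b, float* out) {
    // store_matrix_sync requires a 256-bit (32-byte) aligned destination.
    __shared__ __align__(32) float tile[16 * LDM];

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;

    wmma::fill_fragment(fc, 0.0f);
    wmma::load_matrix_sync(fa, a, 16);
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);

    // Destination is shared memory; how each thread writes it is unspecified.
    wmma::store_matrix_sync(tile, fc, LDM, wmma::mem_row_major);
    __syncwarp();

    // Individual elements are read from memory, not from the fragment.
    for (int i = threadIdx.x; i < 16 * 16; i += 32) {
        out[i] = tile[(i / 16) * LDM + (i % 16)];
    }
}

// Hypothetical host helper: returns the number of wrong elements,
// or -1 if any CUDA call failed.
template <int LDM>
int run_demo() {
    const int n = 16 * 16;
    std::vector<half> ones(n, __float2half(1.0f));
    std::vector<float> result(n, 0.0f);
    half* a = nullptr;
    half* b = nullptr;
    float* out = nullptr;

    cudaError_t err = cudaMalloc(&a, n * sizeof(half));
    if (err == cudaSuccess) err = cudaMalloc(&b, n * sizeof(half));
    if (err == cudaSuccess) err = cudaMalloc(&out, n * sizeof(float));
    if (err == cudaSuccess)
        err = cudaMemcpy(a, ones.data(), n * sizeof(half), cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(b, ones.data(), n * sizeof(half), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        wmma_store_to_smem<LDM><<<1, 32>>>(a, b, out);  // exactly one warp
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(result.data(), out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(a);
    cudaFree(b);
    cudaFree(out);
    if (err != cudaSuccess) return -1;

    // All-ones inputs give 16.0 in every element of the 16x16 product.
    int bad = 0;
    for (float v : result) {
        if (v != 16.0f) ++bad;
    }
    return bad;
}
```

Calling `run_demo<16>()` and `run_demo<24>()` from `main` and profiling the binary under Nsight Compute lets you compare the shared-memory bank conflict counters for the two layouts. Any result applies only to the CUDA version and GPU you tested.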

Swizzling is also unspecified, and AFAIK there would be no way for you to implement it manually, since the function is intentionally (by design, and by specification) opaque in its behavior.

There is the option to switch to a PTX mma op, which could expose the behavior. There are numerous forum posts with examples.


Very good answer! Very helpful, thanks!!!
