Differences and Compatibility Between mbarrier and barrier in PTX

Hi everyone,

Is there any significant difference between mbarrier and barrier in PTX? They seem interchangeable to me. Are they actually mixable?

Also, according to the PTX documentation, mbarrier and bulk_group appear to be distinct concepts. However, I noticed that using bulk_group followed by an arrive on an mbarrier still works as expected. Does this mean these two can also be used together without issues?

Looking forward to your insights!

They are certainly different and not mixable. An mbarrier is a 64-bit object stored in shared memory, while barrier uses dedicated “barrier resources”, which are much more limited in number (16 named barriers per CTA) but may be faster to access. They also differ in how they can be used. For example, a thread can arrive at an mbarrier, do some other work, and wait on it later. With barrier, a thread can only arrive (without waiting) or sync (arrive and wait), so you can’t use it to hide waiting time. barrier also seems to work at warp level rather than thread level; for instance, its thread count must be a multiple of the warp size. There are many more differences if you read through the docs.
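To make the arrive/wait difference concrete, here is a minimal CUDA sketch with inline PTX, under stated assumptions: a GPU of compute capability 8.0 or newer (compiled with something like `nvcc -arch=sm_80`), a single block of 64 threads, and helper names (`mbar_init`, `mbar_arrive`, `mbar_test_wait`, `run_demo`) that are hypothetical wrappers, not part of any library. In the mbarrier kernel each thread arrives, does unrelated work, and only then polls `mbarrier.test_wait` with the token from its own arrive. In the named-barrier kernel the split can only be made between different threads: warp 0 does `barrier.arrive` and moves on, warp 1 does `barrier.sync`, and no single thread can arrive now and wait later.

```cuda
#include <cuda_runtime.h>

constexpr int kThreads = 64;  // two warps; named-barrier thread counts must be multiples of 32

// Hypothetical thin wrappers around the PTX mbarrier instructions (sm_80+).
__device__ void mbar_init(unsigned addr, unsigned count) {
  asm volatile("mbarrier.init.shared.b64 [%0], %1;" :: "r"(addr), "r"(count) : "memory");
}
__device__ unsigned long long mbar_arrive(unsigned addr) {
  unsigned long long state;  // opaque token for the phase this thread arrived in
  asm volatile("mbarrier.arrive.shared.b64 %0, [%1];" : "=l"(state) : "r"(addr) : "memory");
  return state;
}
__device__ bool mbar_test_wait(unsigned addr, unsigned long long state) {
  unsigned done;  // becomes 1 once the phase identified by `state` has completed
  asm volatile("{\n\t.reg .pred p;\n\t"
               "mbarrier.test_wait.shared.b64 p, [%1], %2;\n\t"
               "selp.u32 %0, 1, 0, p;\n\t}"
               : "=r"(done) : "r"(addr), "l"(state) : "memory");
  return done != 0;
}

// Split-phase use: arrive, do independent work, then wait on the same phase.
__global__ void mbarrier_demo(int* out) {
  __shared__ unsigned long long bar;  // the mbarrier is an ordinary 64-bit shared-memory object
  __shared__ int stage[kThreads];
  unsigned addr = static_cast<unsigned>(__cvta_generic_to_shared(&bar));
  int t = threadIdx.x;
  if (t == 0) mbar_init(addr, blockDim.x);  // expect one arrival per thread
  __syncthreads();                          // make the initialized barrier visible to all threads

  stage[t] = t + 1;                              // data that other threads will read
  unsigned long long token = mbar_arrive(addr);  // non-blocking arrival (release semantics)
  int independent = 100 * t;                     // overlapped work that needs no shared data
  while (!mbar_test_wait(addr, token)) { }       // block only when the shared data is needed
  out[t] = stage[(t + 1) % blockDim.x] + independent;
}

// Named barrier 1 (barrier 0 backs __syncthreads): warp 0 produces, warp 1 consumes.
__global__ void named_barrier_demo(int* out) {
  __shared__ int stage[32];
  int lane = threadIdx.x % 32;
  if (threadIdx.x < 32) {
    stage[lane] = 2 * lane;
    // Producer marks its arrival and moves on; it cannot wait on this phase later.
    asm volatile("barrier.arrive 1, %0;" :: "r"(kThreads) : "memory");
  } else {
    // Consumer arrives and waits in a single instruction.
    asm volatile("barrier.sync 1, %0;" :: "r"(kThreads) : "memory");
    out[lane] = stage[lane];
  }
}

// Hypothetical host driver: launches both kernels and checks their results.
bool run_demo() {
  int *d_a = nullptr, *d_b = nullptr;
  int h_a[kThreads], h_b[32];
  cudaMalloc(&d_a, sizeof(h_a));
  cudaMalloc(&d_b, sizeof(h_b));
  mbarrier_demo<<<1, kThreads>>>(d_a);
  named_barrier_demo<<<1, kThreads>>>(d_b);
  bool ok = cudaMemcpy(h_a, d_a, sizeof(h_a), cudaMemcpyDeviceToHost) == cudaSuccess &&
            cudaMemcpy(h_b, d_b, sizeof(h_b), cudaMemcpyDeviceToHost) == cudaSuccess;
  cudaFree(d_a);
  cudaFree(d_b);
  if (!ok) return false;
  for (int t = 0; t < kThreads; ++t) ok &= h_a[t] == ((t + 1) % kThreads) + 1 + 100 * t;
  for (int l = 0; l < 32; ++l) ok &= h_b[l] == 2 * l;
  return ok;
}
```

On a suitable GPU, `run_demo()` should return true. The design point is that the mbarrier token lets the same thread separate its arrival from its wait, whereas the named barrier only lets you split "arrive" and "sync" across different groups of threads.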
