What's the PTX code for BAR.SYNC.DEFER_BLOCKING?

I’d like to emit BAR.SYNC.DEFER_BLOCKING from PTX code. Is this SASS instruction more advanced than BAR.SYNC when used with asynchronous loads?

Presumably you have some C++ code that is generating that SASS instruction? If so, why not just compile to PTX instead of SASS?

I can only generate BAR.SYNC. While profiling cuBLAS’s kernels, I saw this SASS instruction instead of BAR.SYNC, so I guess it might give higher performance.

The naming of instructions is not always consistent between the PTX layer and the SASS layer. Have you tried PTX’s bar.arrive to see whether it maps to this SASS instruction? My (limited) understanding is that bar.arrive would always be used in conjunction with a bar.sync elsewhere in the code, so it might be considered “deferred blocking”.
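To make the bar.arrive / bar.sync pairing concrete, here is a minimal sketch using inline PTX. It assumes a single block of 64 threads in which warp 0 produces data and warp 1 consumes it. The names `bar_arrive`, `bar_sync`, `handoff`, and `run_handoff_demo` are hypothetical helpers made up for this illustration, and the `__threadfence_block()` before the arrive is a conservative choice on my part. I am not claiming which SASS instruction this compiles to; that would have to be checked by dumping the SASS (for example with `cuobjdump -sass`).

```cuda
#include <cuda_runtime.h>

constexpr int kWarp = 32;
constexpr int kThreads = 2 * kWarp;  // one producer warp + one consumer warp

// Hypothetical wrapper: signal arrival at named barrier `id` without waiting.
// `count` is the total number of threads expected at the barrier
// (arriving + syncing) and must be a multiple of the warp size.
__device__ __forceinline__ void bar_arrive(int id, int count) {
    asm volatile("bar.arrive %0, %1;" :: "r"(id), "r"(count) : "memory");
}

// Hypothetical wrapper: block at named barrier `id` until `count` threads
// have arrived (via bar.arrive or bar.sync).
__device__ __forceinline__ void bar_sync(int id, int count) {
    asm volatile("bar.sync %0, %1;" :: "r"(id), "r"(count) : "memory");
}

__global__ void handoff(int* out) {
    __shared__ int buf[kWarp];
    const int lane = threadIdx.x % kWarp;

    if (threadIdx.x < kWarp) {
        // Producer warp: publish data, then arrive without blocking.
        buf[lane] = 2 * lane;
        __threadfence_block();    // conservative: order the store before the arrive
        bar_arrive(1, kThreads);  // barrier 1, leaving barrier 0 to __syncthreads()
        // The producer could continue with independent work here.
    } else {
        // Consumer warp: wait until the producer warp has arrived, then read.
        bar_sync(1, kThreads);
        out[lane] = buf[lane];
    }
}

// Launches the kernel once and returns the number of mismatching elements,
// or -1 if a CUDA runtime call failed.
int run_handoff_demo() {
    int h_out[kWarp] = {};
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return -1;

    handoff<<<1, kThreads>>>(d_out);
    // cudaMemcpy synchronizes with the kernel and reports earlier errors.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < kWarp; ++i) {
        if (h_out[i] != 2 * i) ++mismatches;
    }
    return mismatches;
}
```

Calling `run_handoff_demo()` from `main` should return 0 when the handoff works. The point of the split is that only the consumer warp blocks; the producer warp signals and moves on.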

There are also SASS instructions that are simply not accessible from PTX, e.g. FCHK (single-precision floating-point division range check). They are either generated by the compiler internally, or require SASS-level programming. NVIDIA does not make tools for SASS-level programming available to the public, and has never done so.

Thanks for the information about SASS instructions! I will try bar.arrive.
