Profiling Traces at block-level

Hello,

Would it be possible for the profiler to provide block-level traces, either in the current version or in a future one? The goal is to build a realistic heat map of each SMX's usage from real hardware instead of from a simulator, and to use it to improve scheduling policies for better data locality or energy efficiency.

Best,