How to enable/disable the tcgen05.ld.red reduce unit?

For research purposes, I need to compare program performance with and without the tcgen05.ld.red reduce unit.

My questions:

  1. Is there any mechanism that can disable the reduce unit?

  2. Would modifying the compiled SASS to replace tcgen05.ld.red with tcgen05.ld be a correct solution? I found NVBit and CuAssembler mentioned in research papers. Are these tools suitable for this?

I am new to low-level CUDA programming, so any guidance on the easiest approach would be appreciated.

Thank you!!