For research purposes, I need to compare program performance with and without the tcgen05.ld.red reduce unit.
My questions:
- Is there any mechanism that can disable the reduce unit?
- Is this a correct solution: modifying the compiled SASS to replace tcgen05.ld.red with tcgen05.ld? I found NVBit and CuAssembler mentioned in research papers. Are these suitable?
I am new to CUDA low-level programming, so any guidance on the easiest approach would be appreciated.
Thank you!!