I see, I misunderstood your question. In that case, yes: the fused kernel without this optimization is also around 15% slower than the fused kernel with it.