Both the PTX file and the ptxas output could be correct if ptxas has optimized away the local memory allocation. However, I don’t know whether it performs optimizations at that high a level.
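One way to check is to compare the PTX against what ptxas actually allocates. The following is a minimal sketch, not the original poster's code: the kernel `scale` and its small per-thread array `tmp` are hypothetical, chosen only because such an array is a typical candidate for local memory. Whether a `.local` declaration appears in the PTX at all depends on the toolkit version and optimization level, so this may not reproduce the discrepancy on every setup. The host code uses `cudaFuncGetAttributes` to print the local memory size recorded in the compiled kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: tmp is a small per-thread array. If the loop is
// fully unrolled, every index is a compile-time constant and the array
// can be kept in registers instead of local memory.
__global__ void scale(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float tmp[4];
    #pragma unroll
    for (int k = 0; k < 4; ++k)
        tmp[k] = in[i] * (k + 1);

    out[i] = tmp[0] + tmp[1] + tmp[2] + tmp[3];
}

int main()
{
    // Query the resource usage of the compiled kernel.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scale);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // localSizeBytes is the per-thread local memory in the final binary,
    // which may differ from what the PTX declares.
    printf("registers per thread:    %d\n", attr.numRegs);
    printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    return 0;
}
```

Assuming the file is saved as `kernel.cu`, `nvcc -ptx kernel.cu -o kernel.ptx` writes the PTX so it can be searched for `.local` declarations, and `nvcc -Xptxas -v kernel.cu -o kernel` makes ptxas print its register and lmem usage per kernel. If the PTX declares local storage but both ptxas and `localSizeBytes` report 0 bytes, that would suggest ptxas removed the allocation.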