To solve this, besides falling back on local memory (or global memory), one can: a) use shared memory, b) unroll the loops, or c) use conditional code to statically reorder the registers.
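A rough sketch of options b) and c), assuming the underlying problem is a small per-thread array that is indexed with a runtime value. The names `select4`, `rotate_kernel` and `run_rotate` are made up for illustration, and option a) is not shown. Whether the array really ends up in registers depends on the compiler and architecture, so it is worth checking the ptxas output (for example with `-Xptxas -v`) rather than taking it for granted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// (c) Conditional code: pick one of four values using only constant indices.
// Every v[k] below has a compile-time index, so no dynamic indexing remains.
__device__ __forceinline__ float select4(const float (&v)[4], int idx)
{
    float r = v[0];
    if (idx == 1) r = v[1];
    if (idx == 2) r = v[2];
    if (idx == 3) r = v[3];
    return r;
}

// Each thread loads four values and writes them back rotated by (tid & 3).
__global__ void rotate_kernel(const float* in, float* out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    float v[4];
    // (b) Unrolled loop: after unrolling, each v[i] has a constant index.
    #pragma unroll
    for (int i = 0; i < 4; ++i)
        v[i] = in[tid * 4 + i];

    int shift = tid & 3;  // only known at run time
    #pragma unroll
    for (int i = 0; i < 4; ++i)
        out[tid * 4 + i] = select4(v, (i + shift) & 3);
}

// Hypothetical host helper: runs the kernel for n threads (4 floats each).
bool run_rotate(const float* h_in, float* h_out, int n)
{
    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = sizeof(float) * 4 * n;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    int threads = 128;
    rotate_kernel<<<(n + threads - 1) / threads, threads>>>(d_in, d_out, n);
    // The blocking copy back also surfaces any launch or execution error.
    bool ok = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}

int main()
{
    const int n = 8;
    float in[4 * n], out[4 * n];
    for (int i = 0; i < 4 * n; ++i) in[i] = float(i);
    if (!run_rotate(in, out, n)) { printf("CUDA error\n"); return 1; }
    // Thread 1 holds {4, 5, 6, 7} and rotates by 1.
    printf("%g %g %g %g\n", out[4], out[5], out[6], out[7]);  // prints "5 6 7 4"
    return 0;
}
```

The chain of `if` statements costs a few extra instructions per access, so this only pays off for small arrays; for larger ones, shared memory (option a) is probably the better trade.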