I have a compute-bound kernel (high arithmetic intensity, around 60 FLOPs per byte). The bottleneck pipe is XU. I have already switched to single-precision CUDA intrinsics. Is there any way to speed it up further?