Hello, I am converting a Java simulation program to CUDA for a friend. His code has 11 nested for loops, and I’m trying to get as much performance out of it as possible. I’m planning to use grid-stride indexing, and I would appreciate any snippets and suggestions for maximizing throughput. Currently there are 1.3 quadrillion combinations to iterate through and test, and each combination takes about 64 bytes of local storage. I’m not too great at CUDA yet, so sorry if there is an obvious solution.
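Below is a minimal sketch of the usual pattern. It assumes the 11 loops have fixed rectangular bounds, meaning no loop's range depends on an outer counter. All 11 loops are flattened into one 64-bit index, and each thread decodes its index back into 11 counters (mixed-radix digits) inside a grid-stride loop. A 64-bit index is required because about 1.3 × 10^15 combinations far exceeds the 32-bit range of about 4.3 × 10^9. Results are reduced to a single counter rather than stored per combination, because 64 bytes for each of 1.3 × 10^15 combinations would be roughly 83 PB. `Extents`, `decode`, `testCombination`, `searchKernel` and `countHits` are hypothetical names. `testCombination` is a placeholder for whatever check the Java code actually performs.

```cuda
#include <cassert>
#include <cstdint>
#include <cuda_runtime.h>

constexpr int kDepth = 11;  // number of nested loops being flattened

// Hypothetical container for the loop bounds: n[0] is the outermost loop.
struct Extents {
    int n[kDepth];
};

// Decode a flat index into the 11 loop counters (mixed-radix digits).
// The innermost loop varies fastest, matching the original nesting order.
__host__ __device__ inline void decode(uint64_t idx, const Extents& e, int digits[kDepth]) {
    for (int d = kDepth - 1; d >= 0; --d) {
        const uint64_t extent = static_cast<uint64_t>(e.n[d]);
        digits[d] = static_cast<int>(idx % extent);
        idx /= extent;
    }
}

// Placeholder test (assumption): replace with the real per-combination check.
__device__ inline bool testCombination(const int digits[kDepth]) {
    int sum = 0;
    for (int d = 0; d < kDepth; ++d) sum += digits[d];
    return sum % 7 == 0;
}

__global__ void searchKernel(Extents e, uint64_t total, unsigned long long* hits) {
    unsigned long long localHits = 0;  // per-thread tally, kept in a register
    const uint64_t stride = static_cast<uint64_t>(gridDim.x) * blockDim.x;
    // Grid-stride loop: each thread handles idx, idx + stride, idx + 2*stride, ...
    for (uint64_t idx = static_cast<uint64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
         idx < total; idx += stride) {
        int digits[kDepth];  // fixed-size local array; often promoted to registers
        decode(idx, e, digits);
        if (testCombination(digits)) ++localHits;
    }
    atomicAdd(hits, localHits);  // one atomic per thread, not one per combination
}

// Returns how many combinations pass testCombination, or -1 on a CUDA error.
long long countHits(const Extents& e) {
    uint64_t total = 1;
    for (int d = 0; d < kDepth; ++d) {
        assert(e.n[d] > 0 && "loop extents must be positive");
        total *= static_cast<uint64_t>(e.n[d]);
    }

    unsigned long long* dHits = nullptr;
    if (cudaMalloc(&dHits, sizeof(*dHits)) != cudaSuccess) return -1;
    cudaMemset(dHits, 0, sizeof(*dHits));

    // Size the grid to the GPU (a few blocks per SM), not to the problem size.
    int device = 0, smCount = 0;
    cudaGetDevice(&device);
    cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, device);
    searchKernel<<<smCount * 8, 256>>>(e, total, dHits);

    unsigned long long hHits = 0;
    cudaError_t err = cudaGetLastError();  // catches launch failures
    if (err == cudaSuccess)
        err = cudaMemcpy(&hHits, dHits, sizeof(hHits), cudaMemcpyDeviceToHost);
    cudaFree(dHits);
    return err == cudaSuccess ? static_cast<long long>(hHits) : -1;
}
```

Usage would look like `long long hits = countHits(Extents{{/* 11 real bounds */}});` from host code. The grid is deliberately small and fixed because the grid-stride loop lets a modest number of threads cover all 1.3 × 10^15 indices. If decoding all 11 counters per combination becomes a bottleneck (64-bit integer division and modulo are comparatively slow on GPUs), a common refinement is to flatten only the outer few loops across threads and keep the innermost loops as ordinary nested loops inside each thread.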