I am working on speeding up some encryption algorithms on the GPU. I observed that for plaintext sizes below 1 MB, launching 32 threads per block, with each thread encrypting a 32-byte block, gives the lowest kernel launch time. For plaintext sizes greater than 1 MB, launching 64 threads per block, with each thread handling a 128-byte block, is the best combination. Why does this happen?
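To make the two configurations easier to compare, here is a rough host-side sketch of the grid shape each one produces. It assumes each thread encrypts exactly one contiguous chunk of the plaintext and that 1 MB means 1,048,576 bytes; `LaunchGeometry`, `ceil_div` and `launch_geometry` are hypothetical names made up for illustration, not part of CUDA or of my actual kernel.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: the grid shape for one configuration.
struct LaunchGeometry {
    std::size_t total_threads;  // threads needed to cover the plaintext
    std::size_t blocks;         // thread blocks in the grid
};

// Integer division rounded up, so a partial chunk still gets a thread.
constexpr std::size_t ceil_div(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

// Assumption: one thread encrypts one chunk of bytes_per_thread bytes.
inline LaunchGeometry launch_geometry(std::size_t plaintext_bytes,
                                      std::size_t bytes_per_thread,
                                      std::size_t threads_per_block) {
    assert(bytes_per_thread > 0 && threads_per_block > 0);
    const std::size_t threads = ceil_div(plaintext_bytes, bytes_per_thread);
    return {threads, ceil_div(threads, threads_per_block)};
}
```

For a 1,048,576-byte plaintext, `launch_geometry(1048576, 32, 32)` gives 32768 threads in 1024 blocks, while `launch_geometry(1048576, 128, 64)` gives 8192 threads in 128 blocks. This only shows how the grid shape differs between the two setups; it does not by itself explain the timings.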