Hi all.
I have a CUDA app that spends most of its time in a switch(…) statement. The case values run from 0 to 25 (no “holes” in between).
When I replace the switch with an array of function pointers (using CUDA 5.5, since nvcc from CUDA 5.0 used to crash during compilation), the app gets about 5-8% slower.
Is this normal? I believe the compiler doesn’t build a jump table from a switch (as a regular C++ compiler might), so my hand-written jump table should be faster …
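To make the comparison concrete, here is a minimal sketch of the two dispatch variants, not the actual application code. Everything in it is an assumption made for illustration: the hypothetical `op0`..`op3` stand in for the real 26 cases, the selector is kept uniform within each warp, and `run_dispatch_comparison` is just a helper name. It runs both kernels on the same input, prints event-based timings, and checks that the results agree.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

typedef int (*OpFn)(int);

// Hypothetical per-case work; the real app has 26 cases, only 4 are sketched here.
__device__ int op0(int x) { return x + 1; }
__device__ int op1(int x) { return x * 2; }
__device__ int op2(int x) { return x - 3; }
__device__ int op3(int x) { return x ^ 5; }

// Hand-written jump table living in device memory.
__device__ OpFn op_table[4] = { op0, op1, op2, op3 };

// Variant 1: dispatch through a switch; the compiler is free to inline each case.
__global__ void run_switch(const int* sel, const int* in, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int x = in[i];
    switch (sel[i]) {
        case 0:  out[i] = op0(x); break;
        case 1:  out[i] = op1(x); break;
        case 2:  out[i] = op2(x); break;
        case 3:  out[i] = op3(x); break;
        default: out[i] = x;      break;
    }
}

// Variant 2: dispatch through the function pointer table (an indirect call).
__global__ void run_table(const int* sel, const int* in, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    out[i] = op_table[sel[i]](in[i]);
}

// Launches one kernel and returns its elapsed time in milliseconds.
template <typename Kernel>
float time_kernel(Kernel k, const int* sel, const int* in, int* out, int n) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    int threads = 256, blocks = (n + threads - 1) / threads;
    cudaEventRecord(start);
    k<<<blocks, threads>>>(sel, in, out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Runs both variants on identical input; returns true if their outputs match.
bool run_dispatch_comparison() {
    const int n = 1 << 20;
    std::vector<int> h_sel(n), h_in(n), h_a(n), h_b(n);
    for (int i = 0; i < n; ++i) {
        h_sel[i] = (i / 32) % 4;  // assumption: same selector for a whole warp
        h_in[i] = i;
    }
    const size_t bytes = n * sizeof(int);
    int *d_sel, *d_in, *d_a, *d_b;
    cudaMalloc(&d_sel, bytes);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMemcpy(d_sel, h_sel.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    float ms_switch = time_kernel(run_switch, d_sel, d_in, d_a, n);
    float ms_table  = time_kernel(run_table,  d_sel, d_in, d_b, n);
    bool ok = (cudaGetLastError() == cudaSuccess);

    cudaMemcpy(h_a.data(), d_a, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_b.data(), d_b, bytes, cudaMemcpyDeviceToHost);
    std::printf("switch: %.3f ms, table: %.3f ms\n", ms_switch, ms_table);

    cudaFree(d_sel); cudaFree(d_in); cudaFree(d_a); cudaFree(d_b);
    return ok && (h_a == h_b);
}
```

One thing the sketch makes visible: in the switch version each case is a direct call that the compiler may inline, while the table version goes through an indirect call that generally cannot be inlined. Whether that accounts for the 5-8% in my case is only a guess. A single timed launch like this also includes warm-up effects, so the printed numbers are only indicative.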
Also, I can’t run the profiler (as you can see here https://devtalk.nvidia.com/default/topic/545731/visual-profiler/visual-profiler-5-5-no-timeline-no-resuts-no-errors/) …
Any clues?