Can threads in a block execute different device functions in parallel based on a function pointer lo

I need to execute a large sum of products function which has different data for each thread.

The easiest way to code this is for each thread to load/execute a different function pointer, but I fear that CUDA is not flexible enough for different threads to execute different code in parallel.

If not, I will have to figure out a data indexing scheme that accommodates the different signs (+, -) applied to the data.

Can you group execution such that all threads in each warp execute the same code? If so, you should be fine.

If control flow diverges within a warp, performance will suffer, because the separate flows have to execute one after another between the point of divergence and the next convergence point. In the worst case, with 32 different control flows in a warp, execution is completely serialized and performance plummets accordingly.

How the divergent flows come about does not matter; they could be due to branches or to different function pointers.

There are no different branches (if I understand you correctly). Each function in the pointer table executes the same sum of products with different data (indexed) and different signs.

You wrote of multiple functions invoked through a function pointer. Unless that pointer takes the same value for all threads in a warp, code execution will in fact follow different code paths, even if all functions should have identical functionality.
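To make the distinction concrete, here is a minimal sketch, not the poster's actual code. The names `term_fn`, `term_table`, `slot_per_thread`, `slot_per_warp`, and `select_kernel` are all made up for illustration, and a 1D block is assumed. With `uniform == false`, neighbouring threads load different pointers, so the indirect call can diverge inside a warp; with `uniform == true`, every thread of a warp loads the same pointer.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

typedef float (*term_fn)(float, float);

// Two hypothetical variants of the same product term, differing only in sign.
__device__ float plus_term(float a, float b)  { return a * b; }
__device__ float minus_term(float a, float b) { return -(a * b); }

// Function pointer table in device memory, initialized with device addresses.
__device__ term_fn term_table[2] = { plus_term, minus_term };

// Table slot used by a thread: alternating per thread, or constant per warp.
__host__ __device__ inline int slot_per_thread(int tid) { return tid & 1; }
__host__ __device__ inline int slot_per_warp(int tid, int warp) { return (tid / warp) & 1; }

__global__ void select_kernel(const float* a, const float* b, float* out,
                              int n, bool uniform)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // 'uniform' is a kernel argument, so this choice itself never diverges.
    int slot = uniform ? slot_per_warp(threadIdx.x, warpSize)
                       : slot_per_thread(threadIdx.x);
    // The indirect call is where threads of one warp may take different paths.
    out[i] = term_table[slot](a[i], b[i]);
}

int main()
{
    const int n = 256, block = 128;
    float *a, *b, *out;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = (float)i; b[i] = 2.0f; }

    for (int pass = 0; pass < 2; ++pass) {
        bool uniform = (pass == 1);
        select_kernel<<<n / block, block>>>(a, b, out, n, uniform);
        cudaError_t err = cudaDeviceSynchronize();
        assert(err == cudaSuccess);
        for (int i = 0; i < n; ++i) {
            int tid = i % block;  // threadIdx.x of the thread that wrote out[i]
            int slot = uniform ? slot_per_warp(tid, 32) : slot_per_thread(tid);
            float expect = (slot == 0 ? 1.0f : -1.0f) * a[i] * b[i];
            assert(out[i] == expect);
        }
    }
    printf("both variants produced the expected values\n");
    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}
```

Both variants compute correct results; the difference is only in how the hardware schedules the call. Timing the two passes with a profiler would be the way to see how much the per-thread variant costs on a given GPU.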

If you can massage your function evaluation into a single code path that just uses different data (e.g. coefficients in a polynomial) across the threads in a warp, there will not be any divergence. But this approach may give rise to inefficiencies in data access, e.g. serialization when accessing constant memory.
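As a rough illustration of the single-code-path idea, the sketch below evaluates one polynomial per thread with Horner's rule. Everything in it is an assumption for the sake of the example: the names `horner` and `eval_poly`, and the layout `coeff[k * n + i]` for coefficient `k` of thread `i`. The coefficients are placed in global memory in that layout so that consecutive threads read consecutive addresses, which sidesteps the constant-memory serialization mentioned above.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Horner evaluation for element i. Coefficient k of element i is stored at
// coeff[k * n + i] (an assumed layout, chosen so warp accesses are contiguous).
__host__ __device__ inline float horner(const float* coeff, float x,
                                        int i, int n, int degree)
{
    float acc = coeff[degree * n + i];
    for (int k = degree - 1; k >= 0; --k)
        acc = acc * x + coeff[k * n + i];
    return acc;
}

// Every thread runs the same instructions; only the loaded data differs.
__global__ void eval_poly(const float* coeff, const float* x, float* y,
                          int n, int degree)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = horner(coeff, x[i], i, n, degree);
}

int main()
{
    const int n = 64, degree = 2;
    float *coeff, *x, *y;
    cudaMallocManaged(&coeff, (degree + 1) * n * sizeof(float));
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) {
        x[i] = 2.0f;
        coeff[0 * n + i] = (float)i;                  // constant term
        coeff[1 * n + i] = 1.0f;                      // linear term
        coeff[2 * n + i] = (i & 1) ? -1.0f : 1.0f;    // sign varies per thread
    }

    eval_poly<<<1, n>>>(coeff, x, y, n, degree);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);

    // Small integer inputs keep the arithmetic exact on host and device.
    for (int i = 0; i < n; ++i)
        assert(y[i] == horner(coeff, x[i], i, n, degree));
    printf("device results match the host reference\n");

    cudaFree(coeff); cudaFree(x); cudaFree(y);
    return 0;
}
```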

If you could show a concise example of what you are doing (or planning to do), instead of describing it in very generic terms, it might be easier to discuss.

I’ve decided that function pointers will not work (as I suspected) based on your confirmation that the code must be identical.

Instead I will just expand the sum of products indices to include an additional product term index that will point to either -1 or 1 to accommodate the various signs.
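One possible shape for such a scheme is sketched below. It is a guess at the idea rather than the actual implementation: it assumes a shared `data` array with +1 and -1 appended as two extra entries, and a per-thread index table in which each product term carries three indices (two operands and one sign entry). The names `signed_sum` and `sum_kernel` and the layout `idx[(t * 3 + j) * n + i]` are hypothetical.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Index j (0, 1 = operands, 2 = sign) of term t for thread i is stored at
// idx[(t * 3 + j) * n + i]. The sign index points at a data entry that
// holds +1.0f or -1.0f, so the sign is just one more factor in the product.
__host__ __device__ inline float signed_sum(const float* data, const int* idx,
                                            int i, int n, int terms)
{
    float sum = 0.0f;
    for (int t = 0; t < terms; ++t) {
        float a = data[idx[(t * 3 + 0) * n + i]];
        float b = data[idx[(t * 3 + 1) * n + i]];
        float s = data[idx[(t * 3 + 2) * n + i]];
        sum += a * b * s;
    }
    return sum;
}

// All threads execute identical code; only the indices differ.
__global__ void sum_kernel(const float* data, const int* idx, float* out,
                           int n, int terms)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = signed_sum(data, idx, i, n, terms);
}

int main()
{
    const int n = 64, terms = 2, values = 4;
    float *data, *out;
    int *idx;
    cudaMallocManaged(&data, (values + 2) * sizeof(float));
    cudaMallocManaged(&idx, terms * 3 * n * sizeof(int));
    cudaMallocManaged(&out, n * sizeof(float));

    for (int v = 0; v < values; ++v) data[v] = (float)(v + 1);
    data[values]     =  1.0f;   // entry selected for a '+' term
    data[values + 1] = -1.0f;   // entry selected for a '-' term

    // Arbitrary test pattern of operand and sign indices per thread and term.
    for (int i = 0; i < n; ++i)
        for (int t = 0; t < terms; ++t) {
            idx[(t * 3 + 0) * n + i] = (i + t) % values;
            idx[(t * 3 + 1) * n + i] = (i + 2 * t + 1) % values;
            idx[(t * 3 + 2) * n + i] = values + ((i + t) & 1);
        }

    sum_kernel<<<1, n>>>(data, idx, out, n, terms);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);

    for (int i = 0; i < n; ++i)
        assert(out[i] == signed_sum(data, idx, i, n, terms));
    printf("device results match the host reference\n");

    cudaFree(data); cudaFree(idx); cudaFree(out);
    return 0;
}
```

The extra multiply per term is cheap compared with a divergent indirect call, but the indexed loads from `data` are scattered, so memory access patterns are probably the thing to watch.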