Current best practices for performing sets of simple operations


I’m relatively new to CUDA and I have a question about current best practices for performing calculations. In particular, how do you decide between simplifying a problem by introducing more intermediate variables versus performing one more complex calculation with fewer variables?

For example:

Each thread is asked to compute one result.


float ua = ((x2-x1) * (ky1-y2) - (y2-y1) * (kx1-x2)) / ((y2-y1) * (kx2-kx1) - (x2-x1) * (ky2-ky1));

The question is:

To what degree do people believe a calculation should be broken down into simpler chunks?

Just write code that makes sense to you, that you can understand, and that you will want to maintain.

The compiler is usually better than you at optimizing small pieces of code like this, so worrying about it by hand rarely pays off.
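To illustrate, here is a sketch (plain C++, but the same arithmetic applies unchanged inside a `__global__` kernel) of the expression from the question written two ways: the fused one-liner and a version split into named intermediates. The operation tree is identical in both, so the compiler typically emits equivalent machine code; the split form is just easier to read and debug. The function names `ua_fused`/`ua_split` are my own, for illustration only:

```cpp
#include <cassert>

// Fused form, exactly as written in the question.
float ua_fused(float x1, float y1, float x2, float y2,
               float kx1, float ky1, float kx2, float ky2) {
    return ((x2-x1) * (ky1-y2) - (y2-y1) * (kx1-x2)) /
           ((y2-y1) * (kx2-kx1) - (x2-x1) * (ky2-ky1));
}

// Same arithmetic, broken into named pieces for readability.
// The compiler performs this common-subexpression elimination
// (x2-x1, y2-y1) on the fused form anyway.
float ua_split(float x1, float y1, float x2, float y2,
               float kx1, float ky1, float kx2, float ky2) {
    float dx  = x2 - x1;
    float dy  = y2 - y1;
    float num = dx * (ky1 - y2) - dy * (kx1 - x2);
    float den = dy * (kx2 - kx1) - dx * (ky2 - ky1);
    return num / den;
}
```

Since the operations and their order are unchanged, the two forms agree even bit-for-bit in floating point; pick whichever you find clearer.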

If anything, I’d just say: prefer abstractions like thrust::device_vector over raw cudaMalloc calls. In-kernel complexity is relatively minor compared to all the C API noise that often clutters the host code.
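As a hedged sketch of that last point, here is what the host side can look like with thrust::device_vector: allocation happens in the constructor, deallocation in the destructor, and assignment copies between host and device, so none of the cudaMalloc/cudaMemcpy/cudaFree error-checking boilerplate appears. The kernel here is a trivial stand-in of my own (one result per thread, like the ua calculation in the question), not code from the original post:

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

// Trivial stand-in kernel: one result per thread.
__global__ void compute(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] - b[i];
}

int main()
{
    const int n = 1 << 20;
    thrust::host_vector<float> h_a(n, 1.0f), h_b(n, 0.25f);

    // Allocates on construction, copies on assignment,
    // frees automatically when these go out of scope.
    thrust::device_vector<float> d_a = h_a;   // host -> device copy
    thrust::device_vector<float> d_b = h_b;
    thrust::device_vector<float> d_out(n);

    compute<<<(n + 255) / 256, 256>>>(
        thrust::raw_pointer_cast(d_a.data()),
        thrust::raw_pointer_cast(d_b.data()),
        thrust::raw_pointer_cast(d_out.data()), n);

    thrust::host_vector<float> h_out = d_out; // device -> host copy
    return 0;
}
```

thrust::raw_pointer_cast is how you hand a device_vector's storage to an ordinary kernel; everything else about the kernel stays exactly as you would write it with raw pointers.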