CUDA for beginners, lesson #1

To promote the use of CUDA for more than machine learning and image processing, I am starting a series of blog posts showing how to convert well-known algorithms to hybrid CPU/GPU CUDA implementations.

The first in the series is the discrete knapsack problem, implemented so that it returns all the items used to generate the optimal result.

It is not the most elegant implementation, but it is a good starting point for those new to CUDA. On a well-configured Titan X it runs more than 25 times faster than a serial, single-threaded CPU implementation at 4.5 GHz.
This is not an ‘embarrassingly parallel’ algorithm, so the interesting part is determining how to break the problem down into portions that can be mapped to a GPU.
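
One common way to break the problem down, shown here only as a minimal sketch and not necessarily how the implementation in this series does it: in the dynamic-programming table, each row (one item) depends only on the previous row, so every capacity in a row can be computed by its own GPU thread while the CPU loops over the items. The sketch assumes the 0/1 variant with non-negative integer weights and values. The names knapsack_row, knapsack_gpu and CUDA_CHECK are made up for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <utility>
#include <vector>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// One thread per capacity c. prev[c] is the best value using the items
// processed so far; curr[c] is the best value after also considering the
// current item. take[c] records whether the current item was used.
__global__ void knapsack_row(const int* prev, int* curr, unsigned char* take,
                             int weight, int value, int capacity)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c > capacity) return;
    int skip = prev[c];
    int with = (c >= weight) ? prev[c - weight] + value : -1;
    bool use = with > skip;
    curr[c] = use ? with : skip;
    take[c] = use ? 1 : 0;
}

// Hypothetical host driver (assumes non-negative weights and values).
// Returns the best value and fills `chosen` with the indices of the items
// in one optimal solution.
int knapsack_gpu(const std::vector<int>& weights, const std::vector<int>& values,
                 int capacity, std::vector<int>& chosen)
{
    const int n = static_cast<int>(weights.size());
    const size_t cols = static_cast<size_t>(capacity) + 1;
    int* prev = nullptr;
    int* curr = nullptr;
    unsigned char* take = nullptr;  // n x cols table of decisions
    CUDA_CHECK(cudaMalloc(&prev, cols * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&curr, cols * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&take, (n > 0 ? n : 1) * cols));
    CUDA_CHECK(cudaMemset(prev, 0, cols * sizeof(int)));

    const int threads = 256;
    const int blocks = static_cast<int>((cols + threads - 1) / threads);
    for (int i = 0; i < n; ++i) {  // the item loop stays sequential on the CPU
        knapsack_row<<<blocks, threads>>>(prev, curr, take + i * cols,
                                          weights[i], values[i], capacity);
        CUDA_CHECK(cudaGetLastError());
        std::swap(prev, curr);     // the row just written becomes the input
    }

    int best = 0;
    CUDA_CHECK(cudaMemcpy(&best, prev + capacity, sizeof(int),
                          cudaMemcpyDeviceToHost));
    std::vector<unsigned char> h_take(n * cols);
    if (n > 0) {
        CUDA_CHECK(cudaMemcpy(h_take.data(), take, n * cols,
                              cudaMemcpyDeviceToHost));
    }

    // Walk the decision table backwards on the CPU to recover the items.
    chosen.clear();
    int c = capacity;
    for (int i = n - 1; i >= 0; --i) {
        if (h_take[i * cols + c]) {
            chosen.push_back(i);
            c -= weights[i];
        }
    }

    CUDA_CHECK(cudaFree(prev));
    CUDA_CHECK(cudaFree(curr));
    CUDA_CHECK(cudaFree(take));
    return best;
}
```

Calling knapsack_gpu(weights, values, capacity, chosen) from an ordinary main is all that is needed to try it. The trade-off in this sketch is memory: keeping one decision byte per (item, capacity) pair is what makes it possible to return the items rather than just the optimal value, and it launches one small kernel per item, which is part of why the problem does not map to the GPU as cleanly as an embarrassingly parallel one.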

There is also a similar but somewhat more complicated problem whose hybrid implementation maps better to the GPU, resulting in a speedup of roughly 50 to 80 times over a serial CPU implementation.