Classes work with CUDA - you just have to mark the methods with the `__device__` qualifier (or `__host__ __device__` if you want them callable from both sides). However, you'll have to write the entire contents of the class yourself - libraries such as Boost aren't going to work in device code (the host code can still use Boost just fine).

You may also need to rethink your parallelism. From your description, it's impossible to be sure, but your current decomposition is probably far too coarse-grained to work well on the GPU. GPU threads are much lighter weight than CPU threads - on a CPU, it would be insane to launch a thread just to add a pair of numbers together; on a GPU, that's The Way It's Done.
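To make both points concrete, here's a minimal sketch (the `Vec2` class and `addPairs` kernel are illustrative names, not anything from your code): a small class whose methods carry `__host__ __device__`, used from a kernel that launches one thread per element - exactly the fine-grained style described above.

```cuda
#include <cuda_runtime.h>

// A hand-rolled class: methods marked __host__ __device__ are
// compiled for both the CPU and the GPU. No external libraries
// are used inside it, so it is safe to call from device code.
struct Vec2 {
    float x, y;
    __host__ __device__ Vec2(float x_ = 0.f, float y_ = 0.f) : x(x_), y(y_) {}
    __host__ __device__ Vec2 operator+(const Vec2 &o) const {
        return Vec2(x + o.x, y + o.y);
    }
};

// Fine-grained parallelism: each GPU thread adds exactly one pair.
__global__ void addPairs(const Vec2 *a, const Vec2 *b, Vec2 *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a[i] + b[i];  // calls the __device__ operator+
}
```

You'd launch this with something like `addPairs<<<(n + 255) / 256, 256>>>(d_a, d_b, d_out, n);` - thousands of tiny threads, each doing one addition, which is the granularity the GPU scheduler is built for.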