I’m new to CUDA programming and want to run an algorithm until a certain value is reached. The problem is that I don’t want to transfer the data back to the CPU after every cycle to decide whether another cycle is necessary, and my GPU doesn’t support dynamic parallelism. Is it possible to return a single value from a CUDA kernel? Is there a way to call the kernel recursively, and is it possible to call cuBLAS functions from inside a kernel without dynamic parallelism?
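Here is a minimal sketch of the host-driven loop this question describes. The kernel writes a single device-side `int` flag, and the host copies back only those 4 bytes after each launch instead of the whole data set. The sketch makes a few assumptions. The `step` kernel is a hypothetical stand-in for one cycle of the algorithm, and the stopping criterion "no element above `threshold`" is illustrative. `step`, `run_until_converged`, and the halving rule are placeholders, not part of any library.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Hypothetical "one cycle" of the algorithm: halve every value that is still
// above the threshold, and raise *not_done if any value remains above it.
__global__ void step(float* x, int n, float threshold, int* not_done) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = x[i];
    if (v > threshold) v *= 0.5f;
    x[i] = v;
    // Every writer stores the same value 1, so a plain store is sufficient.
    if (v > threshold) *not_done = 1;
}

// Relaunches step() until no element exceeds the threshold (or max_iters is
// hit) and returns the number of launches. Only one int is copied per launch.
int run_until_converged(float* d_x, int n, float threshold, int max_iters) {
    int* d_not_done = nullptr;
    CUDA_CHECK(cudaMalloc(&d_not_done, sizeof(int)));
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    int iters = 0;
    int h_not_done = 1;
    while (h_not_done && iters < max_iters) {
        CUDA_CHECK(cudaMemset(d_not_done, 0, sizeof(int)));  // clear the flag
        step<<<blocks, threads>>>(d_x, n, threshold, d_not_done);
        CUDA_CHECK(cudaGetLastError());
        // Blocking 4-byte copy; it also waits for the kernel to finish.
        CUDA_CHECK(cudaMemcpy(&h_not_done, d_not_done, sizeof(int),
                              cudaMemcpyDeviceToHost));
        ++iters;
    }
    CUDA_CHECK(cudaFree(d_not_done));
    return iters;
}

int main() {
    const int n = 1024;
    float h_x[n];
    for (int i = 0; i < n; ++i) h_x[i] = static_cast<float>(i + 1);  // 1..1024
    float* d_x = nullptr;
    CUDA_CHECK(cudaMalloc(&d_x, n * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice));
    int iters = run_until_converged(d_x, n, 1.0f, 100);
    std::printf("converged after %d iterations\n", iters);
    CUDA_CHECK(cudaFree(d_x));
    return 0;
}
```

The data stays in device memory between launches, and only the flag crosses the bus. With the values 1 to 1024 and a threshold of 1.0, the loop stops after 10 launches. Relaunching from the host also takes the place of recursion here, since a kernel cannot launch another kernel without dynamic parallelism.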
I’d be grateful for any help.