I’m trying to work on the CPU with the data generated by a kernel: after copying the data from the device to the host, I launch the kernel again to generate more data.
This process runs in a loop.
This is the algorithm, but it doesn’t seem to work:
cudaMallocs() // allocate device memory
cudaDeviceSynchronize() // make sure everything is synchronized
cudaEventRecord(start) // for timing purposes
call_async_Kernel(b, t, 0, 0)
while (cudaEventQuery(stop) == cudaErrorNotReady)
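Written out as compilable CUDA, a minimal sketch of the loop I have in mind might look like the one below. The names `generate`, `process_on_cpu` and `run_loop` are placeholders made up for illustration, not my real code. It also assumes that `stop` is recorded right after the launch, which the steps above leave out, and that may well be part of the problem.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real data generator:
// writes out[i] = i + iter.
__global__ void generate(int *out, int n, int iter) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i + iter;
}

// Hypothetical CPU-side step: sums one batch of results.
static long long process_on_cpu(const int *data, int n) {
    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += data[i];
    return sum;
}

// Runs `iterations` rounds of launch -> poll -> copy -> CPU work.
// Returns the accumulated sum, or -1 if any CUDA call fails.
long long run_loop(int iterations, int n) {
    if (iterations < 0 || n <= 0) return -1;

    int *d_buf = nullptr;
    int *h_buf = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) return -1;
    if (cudaMallocHost(&h_buf, bytes) != cudaSuccess) {  // pinned host buffer
        cudaFree(d_buf);
        return -1;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    long long total = 0;
    float ms_total = 0.0f;

    for (int iter = 0; iter < iterations; ++iter) {
        cudaEventRecord(start, 0);
        generate<<<blocks, threads, 0, 0>>>(d_buf, n, iter);  // asynchronous launch
        // Record `stop` after the launch; otherwise the query below
        // has no pending work to report on.
        cudaEventRecord(stop, 0);

        // Poll instead of blocking; unrelated CPU work could go in the body.
        cudaError_t status;
        while ((status = cudaEventQuery(stop)) == cudaErrorNotReady) {
            // ... other CPU work that does not need this batch ...
        }
        if (status != cudaSuccess) { total = -1; break; }

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        ms_total += ms;

        // Blocking copy: the data is on the host when this returns.
        if (cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) {
            total = -1;
            break;
        }
        total += process_on_cpu(h_buf, n);
    }

    printf("kernel time over %d iterations: %.3f ms\n", iterations, ms_total);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    return total;
}
```

Calling `run_loop(3, 1024)` from `main` and compiling with `nvcc` should be enough to try it. In this sketch the CPU only processes a batch after the kernel has finished, so overlapping the CPU work with the next launch would need a second buffer and probably a separate stream.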
Maybe this is not the proper way to do it. Any ideas?
Thanks in advance!