Thread Completion

Is there a way to execute a function or print something to the terminal once a thread has completed?

Sure, just wrap your kernel inside another function.

__global__ void mykernel(...)
{
	// the original per-thread work
	myRealKernel(...);
	// runs in every thread just before it exits
	MyCleanupFuntionToCallForAllThreadsAsTheyExit(...);
}
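To make the wrapper idea concrete, here is a minimal sketch, not a definitive implementation. The argument lists, the doubling "work", the exit counter, and the host helper countThreadExits are all assumptions made up for illustration; the original post elides the real arguments. Each thread runs its work, then the cleanup function, which here simply bumps a counter the host can read after the kernel finishes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real per-thread work.
__device__ void myRealKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

// Hypothetical cleanup: every thread increments a shared counter as it exits.
__device__ void MyCleanupFuntionToCallForAllThreadsAsTheyExit(int *exitCount)
{
    atomicAdd(exitCount, 1);
}

// Wrapper kernel: real work first, then the per-thread cleanup.
__global__ void mykernel(float *data, int n, int *exitCount)
{
    myRealKernel(data, n);
    MyCleanupFuntionToCallForAllThreadsAsTheyExit(exitCount);
}

// Hypothetical host helper: launches the kernel and returns how many
// threads ran the cleanup function, or -1 if any CUDA call failed.
int countThreadExits(int blocks, int threadsPerBlock)
{
    int n = blocks * threadsPerBlock;
    float *dData = nullptr;
    int *dCount = nullptr;
    int hCount = -1;

    if (cudaMalloc(&dData, n * sizeof(float)) != cudaSuccess)
        return -1;
    if (cudaMalloc(&dCount, sizeof(int)) != cudaSuccess) {
        cudaFree(dData);
        return -1;
    }
    cudaMemset(dData, 0, n * sizeof(float));
    cudaMemset(dCount, 0, sizeof(int));

    mykernel<<<blocks, threadsPerBlock>>>(dData, n, dCount);

    // Wait for all threads to finish before reading the counter back.
    if (cudaDeviceSynchronize() != cudaSuccess ||
        cudaMemcpy(&hCount, dCount, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        hCount = -1;

    cudaFree(dData);
    cudaFree(dCount);
    return hCount;
}
```

On a machine with a working CUDA device, calling countThreadExits(4, 64) from main should return 256, one increment per launched thread.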

But if I ran that, I still could not print from MyCleanupFuntionToCallForAllThreadsAsTheyExit, because it would have to be a device function. Is there any way to have a function that is not a device function called after each thread completes?

If you want a print function, you could try using the GPU Trace library.
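As another option, assuming a device of compute capability 2.0 or later and a reasonably recent toolkit, CUDA itself supports printf inside device code, so the print can go at the end of the kernel. A rough sketch; the kernel name printOnExit and the helper launchPrintOnExit are made up for illustration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the real work would go first, then each thread
// reports that it has finished.
__global__ void printOnExit()
{
    // ... real per-thread work would go here ...
    printf("thread %d of block %d finished\n", threadIdx.x, blockIdx.x);
}

// Hypothetical host helper: returns 0 on success, -1 on failure.
int launchPrintOnExit(int blocks, int threadsPerBlock)
{
    printOnExit<<<blocks, threadsPerBlock>>>();
    // Device-side printf output is buffered; synchronizing with the
    // device flushes it to the host's stdout.
    return cudaDeviceSynchronize() == cudaSuccess ? 0 : -1;
}
```

The lines from different threads can come out in any order, and nothing shows up until the host synchronizes with the device.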