avidday
September 19, 2010, 9:49am
21
That is incorrect. It is very possible to overlap CPU and GPU computation within a single thread - all my linear algebra codes do this as a basic design tenet. CUDA has been fully asynchronous since 1.0 was released three years ago. I am not sure what you are doing wrong (and whether this is actually an instrumentation/measurement problem), but be assured that you are doing something wrong.
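[Editor's note: a minimal sketch of the overlap pattern described above. The kernel and sizes are made up for illustration; a kernel launch returns to the host immediately, so any CPU work placed between the launch and the first blocking CUDA call runs concurrently with the GPU. In the 2010-era runtime API the final sync would be cudaThreadSynchronize(); cudaDeviceSynchronize() is the later name for the same thing.]

```cuda
#include <cuda_runtime.h>

// Illustrative kernel, not from the thread.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(void)
{
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));

    // The launch is asynchronous: control returns to the CPU at once.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);

    // ... CPU work placed here overlaps with the kernel ...

    // Only this call blocks until the GPU has finished.
    cudaDeviceSynchronize();

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```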
Yay! Thank you! That’s what I was looking for.
mayouuu
September 20, 2010, 10:42am
24
I see.
In my case, I need the results of the current iteration before I can schedule the next.
But either way, my problem is that the GPU seems not to progress until I call cudaThreadSynchronize(). I expected that would be the case with eventSync as well, but I guess anything is worth a try.
At this point it seems to me that the only way to get the CPU and GPU going in parallel is to create a thread responsible for launching and managing GPU operations (and of course setting synchronous operations to wait by yielding periodically). That’s not unreasonable, but I don’t see what use the “asynchronous” cuda APIs are if the GPU stalls waiting on the CPU at some unknown point.
Maybe you can try to use CUDA streams and async memcopies? I think they are meant for exactly that. Am I wrong?
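[Editor's note: streams do cover this case. Work issued to different streams may overlap, and cudaMemcpyAsync() only overlaps with kernel execution when the host buffer is page-locked. A rough sketch, with buffer names and sizes invented for illustration:]

```cuda
#include <cuda_runtime.h>

int main(void)
{
    const int n = 1 << 20;
    float *h_a, *d_a, *d_b;

    // cudaMemcpyAsync overlaps with compute only from page-locked
    // (pinned) host memory, hence cudaMallocHost rather than malloc.
    cudaMallocHost(&h_a, n * sizeof(float));
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Copy in one stream while, say, a kernel runs in the other.
    cudaMemcpyAsync(d_a, h_a, n * sizeof(float),
                    cudaMemcpyHostToDevice, s0);
    // some_kernel<<<grid, block, 0, s1>>>(d_b);  // would overlap the copy

    cudaStreamSynchronize(s0);
    cudaStreamSynchronize(s1);

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFreeHost(h_a);
    cudaFree(d_a);
    cudaFree(d_b);
    return 0;
}
```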
Sarnath
September 22, 2010, 9:18am
26
You need to set the "device options" using some CUDA API call… You can control the behaviour of "cudaThreadSynchronize" using that call…
I fail to remember the exact API though… cudaSetDeviceFlags() or cudaSetDeviceOptions()? Something like that… check the manual.
OR
You can run another desktop close to your GPU machine… The CPU on the other machine and GPU in your machine would run in parallel…
Hayyoo… Hayyoo…
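[Editor's note: the call Sarnath half-remembers is presumably cudaSetDeviceFlags(); there is no cudaSetDeviceOptions(). It must be issued before the CUDA context is created, and it controls how the host thread waits during synchronization (spin, yield, or block on an OS primitive), not whether it waits at all. Sketch:]

```cuda
#include <cuda_runtime.h>

int main(void)
{
    // Must run before any other CUDA call creates the context.
    // cudaDeviceScheduleYield makes synchronization calls yield the
    // CPU instead of spin-waiting; cudaDeviceScheduleBlockingSync
    // puts the waiting thread to sleep on an OS primitive instead.
    cudaSetDeviceFlags(cudaDeviceScheduleYield);
    cudaSetDevice(0);

    // ... launches and cudaThreadSynchronize() behave as before,
    // but the waiting host thread no longer burns a CPU core.
    return 0;
}
```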
You need to set the "device options" using some CUDA API call… You can control the behaviour of "cudaThreadSynchronize" using that call…
I fail to remember the exact API though… cudaSetDeviceFlags() or cudaSetDeviceOptions()? Something like that… check the manual.
Hi, thanks for the suggestion, but those options control how cudaThreadSynchronize() waits. In my case it shouldn’t be waiting for anything at all.
I think that was a joke… I hope that was a joke. :)
Have you tried running it with the Nsight development add-on? It would be interesting to see how this shows up there.