Overhead of cudaStreamQuery


How much is the overhead of cudaStreamQuery?

If I call it several hundred times, will that hurt CPU and GPU performance significantly?


Sorry, I don’t know the overhead of cudaStreamQuery. Though assuming you’re using it to poll if an async call is finished, I wouldn’t think the overhead would matter much since the CPU is waiting on the GPU.

Have you tried profiling your code to see the impact?

  • Mat

Not yet.

I’m currently devising an algorithm for concurrent calculation.

It takes several days to implement it.

It balances loads automatically based on host and device runtime, so cudaStreamQuery should be called frequently from host side to measure exact kernel runtime.

Otherwise I will be able to measure kernel runtime only after the host side is finished. In this case I can’t know whether the kernel has finished earlier or not. (To my knowledge, cudaElapsedTime only measures the time lapse between cudaEventCreate and cudaElapsedTime call, isn’t it?)

That’s why I’m concerned about the overhead of cudaStreamQuery.

Based on your opinion, I think I can proceed with my algorithm.

Thank you.