Overhead of cudaStreamQuery

chlskawo12 · November 22, 2016, 4:50am

Hello.

How much overhead does cudaStreamQuery have?

If I call it several hundred times, will that hurt CPU and GPU performance significantly?

MatColgrove · November 22, 2016, 5:34pm

Hi CNJ,

Sorry, I don’t know the overhead of cudaStreamQuery. However, assuming you’re using it to poll whether an async call has finished, I wouldn’t expect the overhead to matter much, since the CPU is waiting on the GPU anyway.

Have you tried profiling your code to see the impact?
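
If a profiler run is not convenient, a rough micro-benchmark can give a ballpark figure. The sketch below is only an illustration: `busy_kernel`, `average_query_cost_us`, the problem size and the iteration counts are all made up for this example, and it assumes the kernel runs long enough to still be in flight during the first measurement (that is not guaranteed on every GPU). It times 1000 calls to cudaStreamQuery with a host clock, once while the stream is busy and once while it is idle.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel whose only purpose is to keep the stream busy for a while.
__global__ void busy_kernel(float *data, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) v = v * 1.0000001f + 0.5f;
        data[i] = v;
    }
}

// Average host-side cost of one cudaStreamQuery call, in microseconds.
double average_query_cost_us(cudaStream_t stream, int num_queries) {
    auto t0 = std::chrono::steady_clock::now();
    for (int q = 0; q < num_queries; ++q) {
        // Non-blocking: returns cudaSuccess (idle) or cudaErrorNotReady (busy).
        cudaStreamQuery(stream);
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count() / num_queries;
}

int main() {
    const int n = 1 << 20;        // assumed problem size
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Measure while the kernel is (most likely) still running.
    busy_kernel<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n, 100000);
    double busy_us = average_query_cost_us(stream, 1000);

    // Measure again once the stream has drained.
    cudaStreamSynchronize(stream);
    double idle_us = average_query_cost_us(stream, 1000);

    printf("avg cudaStreamQuery cost: %.3f us (busy), %.3f us (idle)\n",
           busy_us, idle_us);

    cudaStreamDestroy(stream);
    cudaFree(d_data);
    return 0;
}
```

This only measures the host-side cost of the call. Whether frequent polling affects the running kernel is better checked by comparing the kernel time with and without the polling loop.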

Mat

chlskawo12 · November 23, 2016, 6:01am

Not yet.

I’m currently devising an algorithm for concurrent calculation.

It will take several days to implement.

It balances the load automatically based on host and device runtime, so cudaStreamQuery has to be called frequently from the host side to measure the exact kernel runtime.

Otherwise I can only measure the kernel runtime after the host side has finished, so I can’t tell whether the kernel finished earlier. (To my knowledge, cudaEventElapsedTime only measures the time elapsed between cudaEventCreate and the cudaEventElapsedTime call, doesn’t it?)

That’s why I’m concerned about the overhead of cudaStreamQuery.

Based on your opinion, I think I can proceed with my algorithm.

Thank you.
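
On the timing question above: the CUDA runtime's event timing function is cudaEventElapsedTime, and it reports the time between two events recorded into a stream with cudaEventRecord, not the time since cudaEventCreate. The timestamps are taken on the device, so the result does not depend on when the host gets around to asking. The sketch below illustrates this under assumptions of my own: `work_kernel`, `timed_launch_ms` and the sizes are hypothetical, and the host's share of the work is only marked by a comment.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device-side workload.
__global__ void work_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 10000; ++k) v = v * 1.0000001f + 0.5f;
        data[i] = v;
    }
}

// Launches the kernel between two recorded events, polls without blocking,
// and returns the device-side elapsed time in milliseconds (-1 on error).
float timed_launch_ms(int n, long *polls_out) {
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaStream_t stream;
    cudaEvent_t start, stop;
    cudaStreamCreate(&stream);
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, stream);                   // timestamp before the kernel
    work_kernel<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);
    cudaEventRecord(stop, stream);                    // timestamp after the kernel

    long polls = 0;
    cudaError_t status;
    while ((status = cudaEventQuery(stop)) == cudaErrorNotReady) {
        // The host's share of the work would go here (hypothetical).
        ++polls;
    }

    float ms = -1.0f;
    if (status == cudaSuccess) {
        // Time between the two recorded events, independent of poll timing.
        cudaEventElapsedTime(&ms, start, stop);
    }
    if (polls_out) *polls_out = polls;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    cudaFree(d_data);
    return ms;
}

int main() {
    long polls = 0;
    float ms = timed_launch_ms(1 << 20, &polls);
    printf("kernel time: %.3f ms, host polled %ld times\n", ms, polls);
    return ms < 0.0f ? 1 : 0;
}
```

With this pattern the host can finish its own work first and still read the kernel's runtime afterwards, so polling is needed only to learn when the device became idle, not to time the kernel.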