I can already do this sequentially:
CPU captures frames t0~t4 -> GPU computes frames t0~t4 -> CPU displays t0~t4 -> CPU captures frames t5~t9 -> GPU computes frames t5~t9 -> CPU displays t5~t9 -> …
As a result there is a relatively large time gap between t4 and t5, while frames t0~t4 are nearly identical to each other. I think the CPU and GPU should technically be able to work at the same time, but I can't figure out how. Any tips?
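One common way to get the CPU and GPU working at the same time is double buffering: while the GPU processes batch k from one host buffer, the CPU captures batch k+1 into a second buffer. Below is a minimal sketch under stated assumptions: 5 frames per batch, 8-bit grayscale 640x480 frames, and the names `processFrames`, `captureBatch` and `runPipeline` are all hypothetical stand-ins for the real compute.cu kernels, the OpenCV capture code and the display loop. Pinned host memory (`cudaMallocHost`) is used because `cudaMemcpyAsync` can only overlap with host work when the host buffer is page-locked.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kFramesPerBatch = 5;                      // t0~t4, t5~t9, ...
constexpr int kFrameBytes = 640 * 480;                  // assumption: 8-bit grayscale VGA
constexpr int kBatchBytes = kFramesPerBatch * kFrameBytes;

// Hypothetical stand-in for the kernels in compute.cu: inverts every pixel.
__global__ void processFrames(unsigned char* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 255 - data[i];
}

// Hypothetical stand-in for the OpenCV capture code: fills a batch with its index.
void captureBatch(unsigned char* host, int batchIndex) {
    for (int i = 0; i < kBatchBytes; ++i) host[i] = (unsigned char)batchIndex;
}

// Pushes numBatches batches through capture -> GPU -> display with two buffers.
// firstPixels[k] receives pixel 0 of processed batch k (stand-in for display).
cudaError_t runPipeline(int numBatches, int* firstPixels) {
    unsigned char* hostBuf[2];
    unsigned char* devBuf[2];
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    for (int b = 0; b < 2; ++b) {
        cudaMallocHost((void**)&hostBuf[b], kBatchBytes);  // pinned, so async copies can overlap
        cudaMalloc((void**)&devBuf[b], kBatchBytes);
    }

    if (numBatches > 0) captureBatch(hostBuf[0], 0);  // prime the pipeline
    for (int k = 0; k < numBatches; ++k) {
        int cur = k % 2, nxt = (k + 1) % 2;
        // Queue copy-in, kernel and copy-out for batch k; all three calls return immediately.
        cudaMemcpyAsync(devBuf[cur], hostBuf[cur], kBatchBytes, cudaMemcpyHostToDevice, stream);
        processFrames<<<(kBatchBytes + 255) / 256, 256, 0, stream>>>(devBuf[cur], kBatchBytes);
        cudaMemcpyAsync(hostBuf[cur], devBuf[cur], kBatchBytes, cudaMemcpyDeviceToHost, stream);

        // Meanwhile the CPU captures batch k+1 into the other buffer. That buffer is idle
        // because batch k-1 was synchronized at the end of the previous iteration.
        if (k + 1 < numBatches) captureBatch(hostBuf[nxt], k + 1);

        // Block only until batch k is done, then "display" it.
        cudaStreamSynchronize(stream);
        firstPixels[k] = hostBuf[cur][0];
        printf("batch %d displayed, first pixel = %d\n", k, firstPixels[k]);
    }

    cudaError_t err = cudaGetLastError();
    for (int b = 0; b < 2; ++b) {
        cudaFreeHost(hostBuf[b]);
        cudaFree(devBuf[b]);
    }
    cudaStreamDestroy(stream);
    return err;
}

int main() {
    int firstPixels[4];
    cudaError_t err = runPipeline(4, firstPixels);
    printf("status: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

A single stream is enough here because the overlap is between the CPU and the GPU, not between two GPU batches. On Windows 7 the driver may hold the queued calls in a batch, which is what the reply about Win7 in this thread refers to.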
I have a few kernel calls in my compute.cu, and they have to be launched in order. Does that matter?
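Kernels launched into the same stream execute in launch order, so a second kernel that reads the first one's output is safe even though the host does not wait between the launches. A minimal sketch; `stageA`, `stageB` and `sumAfterStages` are hypothetical names standing in for the real kernels:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical first kernel: writes a[i] = i.
__global__ void stageA(int* a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = i;
}

// Hypothetical second kernel: reads what stageA wrote, so it must run after it.
__global__ void stageB(int* a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= 2;
}

// Launches both kernels back to back with no host-side synchronization in between.
long long sumAfterStages(int n) {
    int* d = nullptr;
    cudaMalloc((void**)&d, n * sizeof(int));
    int blocks = (n + 255) / 256;

    stageA<<<blocks, 256>>>(d, n);  // returns to the host immediately
    stageB<<<blocks, 256>>>(d, n);  // same (default) stream: starts only after stageA finishes
    // The CPU is free at this point, e.g. to capture the next frames.

    std::vector<int> h(n);
    // cudaMemcpy on the default stream waits for both kernels before copying.
    cudaMemcpy(h.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);

    long long sum = 0;
    for (int v : h) sum += v;
    return sum;
}

int main() {
    // 2 * (0 + 1 + ... + 999) = 999000 when both stages ran in order
    printf("sum = %lld\n", sumAfterStages(1000));
    return 0;
}
```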
Do you mean the frame capture code also has to be written in a .cu file? I am using OpenCV functions to capture images from the webcam, and they won't compile with nvcc. If so, how do you solve this problem?
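The capture code does not need to go into a .cu file. A usual arrangement is to keep OpenCV in a .cpp file compiled by the host compiler and expose the GPU work through a plain C++ function defined in the .cu file, then link the two object files. A sketch of compute.cu under that assumption; `runCompute`, `invertKernel`, `compute.h` and the build line are hypothetical, and the `frame.data` call assumes a continuous `cv::Mat`:

```cuda
// compute.cu: compiled by nvcc. Contains the kernels plus a plain C++ entry point.
//
// compute.h (hypothetical header, included by both files; no CUDA types in it):
//     bool runCompute(unsigned char* frames, int numBytes);
//
// main.cpp (compiled by the host compiler, free to use OpenCV):
//     #include "compute.h"
//     cv::Mat frame; cap >> frame;                  // OpenCV capture as before
//     runCompute(frame.data, (int)(frame.total() * frame.elemSize()));
//
// Example build on Linux (adjust paths and OpenCV flags to your install):
//     nvcc -c compute.cu -o compute.o
//     g++ main.cpp compute.o -o app -L/usr/local/cuda/lib64 -lcudart `pkg-config --libs opencv4`
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real compute.cu kernels: inverts each byte.
__global__ void invertKernel(unsigned char* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 255 - data[i];
}

// Host wrapper: the only function main.cpp needs to see. Returns false on any CUDA error.
bool runCompute(unsigned char* frames, int numBytes) {
    unsigned char* d = nullptr;
    if (cudaMalloc((void**)&d, numBytes) != cudaSuccess) return false;
    cudaMemcpy(d, frames, numBytes, cudaMemcpyHostToDevice);
    invertKernel<<<(numBytes + 255) / 256, 256>>>(d, numBytes);
    // Blocking copy back: also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(frames, d, numBytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess && cudaGetLastError() == cudaSuccess;
}
```

Keeping CUDA types out of the header means main.cpp never has to include CUDA headers or go through nvcc.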
It does not matter. kernel<<<>>> returns right after the launch while the GPU is still working. However, on Win7 GPU calls are batched, so additional tricks are needed.
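On the Windows 7 (WDDM) driver model, the CUDA driver collects launches into a command buffer and submits them in batches, so a kernel may not actually start until something flushes that buffer. A workaround often suggested is to call `cudaStreamQuery` on the stream right after queuing the work; it does not change results, but in practice it tends to push the pending batch to the GPU. This is driver behaviour rather than a documented guarantee, so treat the sketch below as a heuristic; `longKernel`, `cpuWork` and `overlapWithFlush` are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical long-running kernel standing in for the real frame processing.
__global__ void longKernel(float* out, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = (float)i;
        for (int k = 0; k < iters; ++k) x = x * 0.999f + 1.0f;
        out[i] = x;
    }
}

// Hypothetical CPU work standing in for capturing the next frames.
long long cpuWork(int n) {
    long long s = 0;
    for (int i = 0; i < n; ++i) s += i;
    return s;
}

cudaError_t overlapWithFlush(int n, long long* cpuResult) {
    float* d = nullptr;
    cudaStream_t stream;
    cudaMalloc((void**)&d, n * sizeof(float));
    cudaStreamCreate(&stream);

    longKernel<<<(n + 255) / 256, 256, 0, stream>>>(d, n, 2000);  // returns immediately

    // On WDDM the launch may still be sitting in the driver's batch; querying the
    // stream is commonly used to get it submitted now. The query normally returns
    // cudaErrorNotReady here (the kernel is still running), which is not a failure.
    cudaStreamQuery(stream);

    *cpuResult = cpuWork(1000000);  // CPU work that can now overlap with the kernel

    cudaError_t err = cudaStreamSynchronize(stream);  // wait for the GPU result
    cudaStreamDestroy(stream);
    cudaFree(d);
    return err;
}

int main() {
    long long cpuResult = 0;
    cudaError_t err = overlapWithFlush(1 << 20, &cpuResult);
    printf("GPU: %s, CPU result: %lld\n", cudaGetErrorString(err), cpuResult);
    return err == cudaSuccess ? 0 : 1;
}
```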
This came up in another thread. If the documentation is going to claim that kernel calls are asynchronous when, on Windows, batching means they effectively aren't, it should explain that and give the workaround.