Forgive me if this question has already been answered, but I searched the posts and did not find anything. If you have the general pseudo code of
Will the device serialize these calls? I know kernel launches execute sequentially (at least until Fermi), but in the above example, will the memcpy complete before kernel2 executes? I have many of these operations, and if I can help it, I don't want the host to waste time waiting for blocking calls to return. If the above is serialized, is there a practical limit to the number of calls one can issue before filling up the queue on the device? Is that limit documented as a property of the device, or is it a driver issue that one could characterize?
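Since the original pseudo code did not survive, here is a minimal sketch of the kind of sequence being asked about, assuming it was roughly "kernel1, then a memcpy, then kernel2". The names kernel1, kernel2 and runInOrderDemo are hypothetical. The sketch relies on the CUDA rule that operations issued to the same stream execute in the order they were issued, so the copy finishes before kernel2 starts, while the host only blocks once, at cudaStreamSynchronize. It does not measure or answer the queue-depth question.

```cuda
#include <cuda_runtime.h>

// Hypothetical first kernel: writes a known pattern (buf[i] = i).
__global__ void kernel1(int *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = i;
}

// Hypothetical second kernel: doubles each element of the copied buffer.
__global__ void kernel2(int *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= 2;
}

// Enqueues kernel1, a device-to-device copy, kernel2 and a copy back to the
// host on ONE stream, then blocks only once at the end. Returns true if every
// result equals 2*i, i.e. each step saw the previous step's output.
bool runInOrderDemo(int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *d_src = nullptr, *d_dst = nullptr, *h_out = nullptr;
    cudaStream_t stream = nullptr;

    bool ok = cudaStreamCreate(&stream) == cudaSuccess
           && cudaMalloc(&d_src, bytes) == cudaSuccess
           && cudaMalloc(&d_dst, bytes) == cudaSuccess
           // Pinned host memory so the device-to-host copy can be truly asynchronous.
           && cudaMallocHost(&h_out, bytes) == cudaSuccess;

    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;

        // Each of these calls returns to the host immediately; the stream
        // executes them in issue order on the device.
        kernel1<<<blocks, threads, 0, stream>>>(d_src, n);
        cudaMemcpyAsync(d_dst, d_src, bytes, cudaMemcpyDeviceToDevice, stream);
        kernel2<<<blocks, threads, 0, stream>>>(d_dst, n);
        cudaMemcpyAsync(h_out, d_dst, bytes, cudaMemcpyDeviceToHost, stream);

        // The host could do unrelated work here; this is the only blocking point.
        ok = cudaGetLastError() == cudaSuccess
          && cudaStreamSynchronize(stream) == cudaSuccess;

        for (int i = 0; ok && i < n; ++i) {
            ok = (h_out[i] == 2 * i);
        }
    }

    // Release only what was actually allocated.
    if (h_out) cudaFreeHost(h_out);
    if (d_dst) cudaFree(d_dst);
    if (d_src) cudaFree(d_src);
    if (stream) cudaStreamDestroy(stream);
    return ok;
}
```

Calling runInOrderDemo(1000) from host code should return true on a working device. A plain cudaMemcpy in place of cudaMemcpyAsync would preserve the same ordering, but it generally blocks the host until the copy completes, which is the waiting the question hopes to avoid.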
Thanks in advance for your help.