CUDA graphs express concurrency and asynchrony through dependencies between graph nodes, and you can control those dependencies. This control is most explicit if you build the graph with the graph API (adding nodes and their dependencies yourself), but if you use the stream capture method, the dependencies are still defined at that point — they are inferred from stream ordering and event waits during capture.
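As a minimal sketch of the stream capture method (the kernel names `kA` and `kB` are hypothetical, and this uses the 5-argument `cudaGraphInstantiate` form from CUDA 11.x): because `kB` is launched after `kA` into the same stream during capture, the resulting graph records B as dependent on A.

```cuda
#include <cuda_runtime.h>

__global__ void kA() {}
__global__ void kB() {}

int main() {
    cudaStream_t s;
    cudaStreamCreate(&s);

    cudaGraph_t graph;
    cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
    kA<<<1, 1, 0, s>>>();            // becomes node A
    kB<<<1, 1, 0, s>>>();            // becomes node B, dependent on A via stream order
    cudaStreamEndCapture(s, &graph); // dependencies are now baked into the graph

    cudaGraphExec_t exec;
    cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
    cudaGraphLaunch(exec, s);
    cudaStreamSynchronize(s);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(s);
    return 0;
}
```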
No graph item will execute before its dependencies are complete. Other than that, CUDA graphs will attempt to schedule work efficiently to maximize performance, and you have no direct control over this scheduling.
Let’s say we have a graph node B that depends on A, and a graph node C that also depends on A. Once A completes, CUDA graphs will (generally speaking, using streams under the hood) allow B and C to execute concurrently, as quickly as possible.
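That fork pattern can be sketched with the explicit graph API — a minimal example under the same assumptions as above (hypothetical no-argument kernels, CUDA 11.x `cudaGraphInstantiate` signature), where the dependency arrays passed to `cudaGraphAddKernelNode` encode "B depends on A" and "C depends on A":

```cuda
#include <cuda_runtime.h>

__global__ void kA() {}
__global__ void kB() {}
__global__ void kC() {}

int main() {
    cudaGraph_t graph;
    cudaGraphCreate(&graph, 0);

    cudaKernelNodeParams p = {};
    p.gridDim = dim3(1);
    p.blockDim = dim3(1);
    p.kernelParams = nullptr;   // kernels take no arguments

    cudaGraphNode_t nA, nB, nC;
    p.func = (void *)kA;
    cudaGraphAddKernelNode(&nA, graph, nullptr, 0, &p);  // A: no dependencies
    p.func = (void *)kB;
    cudaGraphAddKernelNode(&nB, graph, &nA, 1, &p);      // B depends on A
    p.func = (void *)kC;
    cudaGraphAddKernelNode(&nC, graph, &nA, 1, &p);      // C depends on A

    // after A completes, the runtime is free to run B and C concurrently
    cudaGraphExec_t exec;
    cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);

    cudaStream_t s;
    cudaStreamCreate(&s);
    cudaGraphLaunch(exec, s);
    cudaStreamSynchronize(s);
    return 0;
}
```

Note that nothing in the graph says *which* stream B or C will run on; only the dependency edges are yours to specify.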
Regarding your question 2, you don’t have control over the detailed scheduling of activity, other than declaring dependencies.