Launching multiple compute shaders with dependencies

I’m trying to wrap my head around some of the compute shader features in Vulkan.

I’m very familiar with CUDA, so that’s how I try to understand it.

So, are queues/command buffers equivalent to streams?

For example, say I have four kernels, A, B, C, and D, where the first three can run in parallel and D requires the outputs of the first three. In CUDA, I would do something like this (roughly):

cudaStream_t streamA, streamB, streamC, streamD;
cudaEvent_t eventA, eventB, eventC;

cudaStreamCreate(&streamA);  // likewise for streamB, streamC, streamD
cudaEventCreate(&eventA);    // likewise for eventB, eventC

kernelA<<<grid, block, 0, streamA>>>(...);
cudaEventRecord(eventA, streamA);
kernelB<<<grid, block, 0, streamB>>>(...);
cudaEventRecord(eventB, streamB);
kernelC<<<grid, block, 0, streamC>>>(...);
cudaEventRecord(eventC, streamC);

// streamD waits on all three events before D launches
cudaStreamWaitEvent(streamD, eventA, 0);
cudaStreamWaitEvent(streamD, eventB, 0);
cudaStreamWaitEvent(streamD, eventC, 0);
kernelD<<<grid, block, 0, streamD>>>(...);

// destroy all events and streams

But for Vulkan, if I have an equivalent of these kernels as compute shaders, how would I launch them?

So I need to launch shaders A, B, and C. And they all produce an output that is then used by shader D.

To get that result, can I put each compute shader (A, B, and C) in a different command buffer and place a fence after each one’s dispatch, and then in D’s command buffer wait for all the fences before dispatching it?

Then do I submit each of these command buffers to a single queue? Or should I use a different queue for each of them to be sure that A, B, and C execute in parallel on the GPU?
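To make the question concrete, here’s roughly what I imagined trying for the submission side, swapping in semaphores for the fences since fences look like they’re for CPU-side waiting (this is only a sketch; `cbA`–`cbD` are pre-recorded command buffers, `queue` is a compute queue, and all creation and error handling are elided):

```
/* Sketch only; semA/semB/semC created earlier with vkCreateSemaphore */
VkSubmitInfo submitA = {
    .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
    .commandBufferCount = 1, .pCommandBuffers = &cbA,
    .signalSemaphoreCount = 1, .pSignalSemaphores = &semA,
};
/* submitB and submitC are analogous, signaling semB and semC */
vkQueueSubmit(queue, 1, &submitA, VK_NULL_HANDLE);
...

/* D waits on all three semaphores before its dispatch runs */
VkSemaphore waits[3] = { semA, semB, semC };
VkPipelineStageFlags stages[3] = {
    VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
    VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
    VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT };
VkSubmitInfo submitD = {
    .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
    .waitSemaphoreCount = 3, .pWaitSemaphores = waits,
    .pWaitDstStageMask = stages,
    .commandBufferCount = 1, .pCommandBuffers = &cbD,
};
vkQueueSubmit(queue, 1, &submitD, VK_NULL_HANDLE);
```

But I don’t know whether this is idiomatic, or whether everything should just go in one command buffer with a barrier instead.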

I’m just looking for the right strategy here to get a similar effect that I have with the CUDA example above.