Processing multiple independent graphs on the GPU

I have multiple graphs with varying numbers of vertices and edges, and I want to process each of them on the GPU. Each graph can be processed completely independently. My goal is to minimize the total time taken to complete this computation on all graphs. What is a good strategy when:

  1. All the graphs can fit into GPU memory at the same time.
  2. Each graph individually can fit into GPU memory but not all at once.

In these situations, is it advisable to launch a separate kernel on a separate stream for each graph? How should I choose the number of blocks and threads for each launch?
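
To make the question concrete, here is a minimal host-side sketch of the naive plan I have in mind. It is only an illustration under stated assumptions, not a claim about what is optimal: it assumes each graph is stored in CSR form with 32-bit indices, that each kernel uses one thread per vertex, and that graphs are packed into batches in their given order. The names `GraphInfo`, `csrBytes`, `blocksFor`, and `planBatches` are hypothetical, and no CUDA calls are made; the code only computes launch sizes and memory-limited batches.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one graph (illustrative field names).
struct GraphInfo {
    std::size_t numVertices;
    std::size_t numEdges;
};

// Assumed device footprint of a CSR graph with 32-bit indices:
// (V + 1) row offsets plus E column indices.
std::size_t csrBytes(const GraphInfo& g) {
    return (g.numVertices + 1 + g.numEdges) * sizeof(std::int32_t);
}

// Assumed mapping of one thread per vertex: blocks = ceil(V / threadsPerBlock).
// A graph with zero vertices yields zero blocks and should not be launched.
unsigned int blocksFor(const GraphInfo& g, unsigned int threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return static_cast<unsigned int>(
        (g.numVertices + threadsPerBlock - 1) / threadsPerBlock);
}

// Greedy batching in input order: keep adding graphs to the current batch
// until the next one would exceed the memory budget, then start a new batch.
// Returns, for each batch, the indices of the graphs it contains.
std::vector<std::vector<std::size_t>> planBatches(
        const std::vector<GraphInfo>& graphs, std::size_t budgetBytes) {
    std::vector<std::vector<std::size_t>> batches;
    std::size_t used = 0;
    for (std::size_t i = 0; i < graphs.size(); ++i) {
        const std::size_t need = csrBytes(graphs[i]);
        assert(need <= budgetBytes);  // case 2: every graph fits on its own
        if (batches.empty() || used + need > budgetBytes) {
            batches.emplace_back();
            used = 0;
        }
        batches.back().push_back(i);
        used += need;
    }
    return batches;
}
```

In case 1 this produces a single batch containing every graph; in case 2 it produces several batches that would be copied and processed one after another. Within a batch, the idea would be to launch one kernel per graph, each on its own stream, with `blocksFor(g, threadsPerBlock)` blocks. Whether that is better than, say, one fused kernel over the whole batch is exactly what I am unsure about.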