I have one data graph and 1000 query graphs. If I put all the query graphs in a list and run them on the GPU sequentially, almost every query takes longer than it does when executed separately. For example, query No. 500 may take 40 ms when it's executed on its own, but 500 ms when it's executed together with the other queries.
Here is the pseudocode for the case where all queries are run together:
for queryGraph in queryList {
    device_func(dataGraph, queryGraph)   // launch the kernel for one query
    cudaDeviceSynchronize()              // wait for it to finish before the next query
}
So, are there any features of the GPU that could lead to this result?
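
To make the per-query numbers easier to compare, here is a minimal sketch of how each query in the loop could be timed with CUDA events. It is only an illustration under assumptions: the kernel body, the flat `int` arrays standing in for the graphs, the size `n`, and the launch configuration are all hypothetical placeholders, not my real code, and the same query buffer is reused for every iteration to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real matching kernel; it only does dummy work.
__global__ void device_func(const int* dataGraph, const int* queryGraph,
                            int n, int* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = dataGraph[i] + queryGraph[i];
}

int main() {
    const int n = 1 << 20;        // assumed placeholder size, not from the post
    const int numQueries = 1000;  // number of query graphs, as in the post

    int *dataGraph, *queryGraph, *out;
    cudaMalloc(&dataGraph, n * sizeof(int));
    cudaMalloc(&queryGraph, n * sizeof(int));
    cudaMalloc(&out, n * sizeof(int));
    cudaMemset(dataGraph, 0, n * sizeof(int));
    cudaMemset(queryGraph, 0, n * sizeof(int));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    for (int q = 0; q < numQueries; ++q) {
        cudaEventRecord(start);                    // timestamp before the kernel
        device_func<<<(n + 255) / 256, 256>>>(dataGraph, queryGraph, n, out);
        cudaEventRecord(stop);                     // timestamp after the kernel
        cudaEventSynchronize(stop);                // wait until the kernel is done

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);    // GPU time between the events
        printf("query %d: %.3f ms\n", q, ms);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dataGraph);
    cudaFree(queryGraph);
    cudaFree(out);
    return 0;
}
```

Timing with events measures only the GPU-side time between the two records, so host-side overhead inside the loop (allocation, copies, building the query) would not be counted in these numbers.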