Hi,
I am using streams in my CUDA code as shown below.
PT1<<<gride, blocke>>>(dvxdx, dvydy, dvxdy, dvydx, d_vx, d_vy, d_alpha, d_beta, d_index, nbe);
cudaDeviceSynchronize();
PT1_Etanbe<<<gride, blocke, 0, stream1>>>(Eta_nbe, d_etan, d_areas, nbe);
PT1_x<<<gride, blocke, 0, stream2>>>(dvxdx, dvydy, dvxdy, dvydx, d_vx, d_vy, d_alpha, d_beta, d_index, kvx, d_etan, d_Helem, d_areas, d_isice, nbe);
PT1_y<<<gride, blocke, 0, stream3>>>(dvxdx, dvydy, dvxdy, dvydx, d_vx, d_vy, d_alpha, d_beta, d_index, kvy, d_etan, d_Helem, d_areas, d_isice, nbe);
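The stream setup is not shown in the snippet above. For context, here is a minimal sketch of the kind of setup such a launch sequence assumes. Everything in it is illustrative rather than taken from the real code: `scaleKernel` and `launchOnThreeStreams` are hypothetical names, the kernel is a placeholder for `PT1_Etanbe`, `PT1_x` and `PT1_y`, and creating the streams with `cudaStreamNonBlocking` is an assumption.

```cuda
#include <cuda_runtime.h>

// Hypothetical placeholder kernel standing in for PT1_Etanbe, PT1_x and PT1_y.
__global__ void scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: launches the placeholder kernel once on each of three
// streams. Returns 0 on success, 1 if any CUDA call reported an error.
int launchOnThreeStreams(int n)
{
    const int kStreams = 3;
    cudaStream_t streams[kStreams] = {};
    float *buffers[kStreams] = {};
    bool ok = true;

    // Assumption: one non-blocking stream and one independent buffer per kernel.
    for (int s = 0; s < kStreams; ++s) {
        ok = ok && cudaStreamCreateWithFlags(&streams[s], cudaStreamNonBlocking) == cudaSuccess;
        ok = ok && cudaMalloc(&buffers[s], n * sizeof(float)) == cudaSuccess;
        ok = ok && cudaMemsetAsync(buffers[s], 0, n * sizeof(float), streams[s]) == cudaSuccess;
    }

    if (ok) {
        dim3 block(256);
        dim3 grid((n + block.x - 1) / block.x);
        // Each launch names its own stream as the fourth launch parameter.
        for (int s = 0; s < kStreams; ++s) {
            scaleKernel<<<grid, block, 0, streams[s]>>>(buffers[s], 2.0f, n);
        }
        ok = cudaGetLastError() == cudaSuccess;
    }

    // Wait for all streams to finish before releasing resources.
    ok = (cudaDeviceSynchronize() == cudaSuccess) && ok;

    for (int s = 0; s < kStreams; ++s) {
        if (buffers[s]) cudaFree(buffers[s]);
        if (streams[s]) cudaStreamDestroy(streams[s]);
    }
    return ok ? 0 : 1;
}
```

Calling `launchOnThreeStreams(1 << 20)` from `main` issues the three launches back to back. Whether they actually overlap on the timeline depends on the device having free resources while the earlier kernels are still running, so this sketch shows only the launch pattern and does not guarantee concurrency.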
I would like the kernels in streams 1, 2 and 3 to run concurrently. The qdrep file shows that the kernels in those streams do not start at the same time and that there is very little overlap between them. What am I missing?
Thanks for any information you can provide,
Anjali