Parallel Nsight does not currently support tracing that shows concurrent
kernels on Fermi GPUs.
Any suggestions on how to show, measure, or calculate how kernels are scheduled
concurrently in a simple CUDA program in the meantime, until Nsight is updated?
I can measure the event times of the kernels in each stream fine using
cudaEventRecord/cudaEventSynchronize/cudaEventElapsedTime, but I'm not sure
how to show that they are actually running concurrently.
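
For reference, a rough sketch of one possible approach with the same event calls: record one shared reference event first, then express each kernel's begin/end events as offsets from it, so the per-stream intervals sit on a common timeline and can be checked for overlap. The kernel spinKernel, the CHECK macro, and the constants kStreams and kCycles are made up for illustration. This assumes compute capability 2.0 or later (for clock64) and a toolkit that provides cudaDeviceSynchronize. Event timestamps taken in different streams are only approximate, and recording events may itself influence scheduling, so overlapping intervals are an indication of concurrency rather than proof.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical busy-wait kernel: spins for roughly `cycles` clock ticks.
// Launched with one block of one thread so several copies can be resident.
__global__ void spinKernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

int main()
{
    const int kStreams = 4;                  // assumption: arbitrary count
    const long long kCycles = 100000000LL;   // assumption: tune per GPU

    cudaStream_t stream[kStreams];
    cudaEvent_t ref, begin[kStreams], end[kStreams];

    CHECK(cudaEventCreate(&ref));
    for (int i = 0; i < kStreams; ++i) {
        CHECK(cudaStreamCreate(&stream[i]));
        CHECK(cudaEventCreate(&begin[i]));
        CHECK(cudaEventCreate(&end[i]));
    }

    // Shared reference point, recorded before any kernel is launched.
    CHECK(cudaEventRecord(ref, 0));

    for (int i = 0; i < kStreams; ++i) {
        CHECK(cudaEventRecord(begin[i], stream[i]));
        spinKernel<<<1, 1, 0, stream[i]>>>(kCycles);
        CHECK(cudaGetLastError());
        CHECK(cudaEventRecord(end[i], stream[i]));
    }
    CHECK(cudaDeviceSynchronize());

    // Convert every event to milliseconds since the reference event.
    float t0[kStreams], t1[kStreams];
    for (int i = 0; i < kStreams; ++i) {
        CHECK(cudaEventElapsedTime(&t0[i], ref, begin[i]));
        CHECK(cudaEventElapsedTime(&t1[i], ref, end[i]));
        printf("stream %d: begin %.3f ms, end %.3f ms\n", i, t0[i], t1[i]);
    }

    // Two intervals overlap if each one begins before the other ends.
    for (int i = 0; i < kStreams; ++i)
        for (int j = i + 1; j < kStreams; ++j)
            if (t0[i] < t1[j] && t0[j] < t1[i])
                printf("streams %d and %d overlap in time\n", i, j);

    for (int i = 0; i < kStreams; ++i) {
        CHECK(cudaEventDestroy(begin[i]));
        CHECK(cudaEventDestroy(end[i]));
        CHECK(cudaStreamDestroy(stream[i]));
    }
    CHECK(cudaEventDestroy(ref));
    return 0;
}
```

A coarser cross-check is to compare the largest end offset with the sum of the individual kernel durations: if the kernels ran one after another the two should be about equal, while concurrent execution should make the total noticeably shorter.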
Any help appreciated.