CUDA concurrentKernels sample doesn't seem to run on TX1

Oh… what is wrong with my TX1 board…

When I run 6_Advanced/concurrentKernels, it shows a result like this:

GPU Device 0: “NVIDIA Tegra X1” with compute capability 5.3

Detected Compute SM 5.3 hardware with 2 multi-processors
Expected time for serial execution of 8 kernels = 0.080s
Expected time for concurrent execution of 8 kernels = 0.010s
Measured time for sample = 0.000s
Test passed

From the profiler, I understand this result to mean that nothing is executed on the GPU,
because I can't see any stream running in the picture.

So I'm curious whether TX1 supports concurrent CUDA kernel execution, or doesn't support it at all.

I'd really appreciate an answer.

Hi,

Thanks for your question.

TX1 does support concurrent kernel execution.
If it did not, you would get ‘GPU does not support concurrent kernel execution’ when running the sample.
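
If you want to check this independently of the sample, here is a minimal sketch that queries the device properties through the CUDA runtime. It assumes device 0 is the TX1 GPU; the printed wording is only illustrative and is not the sample's exact output.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int dev = 0;  // assumption: the TX1 GPU is device 0
    cudaDeviceProp prop;

    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("Device %d: %s, SM %d.%d, %d multiprocessors\n",
           dev, prop.name, prop.major, prop.minor, prop.multiProcessorCount);

    // concurrentKernels is non-zero when the device can run
    // kernels from different streams at the same time.
    if (prop.concurrentKernels == 0) {
        printf("GPU does not support concurrent kernel execution\n");
    } else {
        printf("GPU supports concurrent kernel execution\n");
    }
    return 0;
}
```

Build it with nvcc and run it on the board; a non-zero `concurrentKernels` field means kernels launched into different streams are allowed to overlap.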

In your profiling data, the kernels are indeed executed in different streams.
Please check the rightmost bar, although it's really small, since the kernel code only loops 720 times.

Thanks.

So this is the CUDA stream part of the profile…
What do you think?

Hi,

Please attach the file directly so it is easier to view.

In my profiling data, the concurrentKernels sample does launch kernels over 8 CUDA streams.
You can click a bar to see the execution details:

Ex.
Start: 183.595 ms
End: 183.619 ms
Duration: 23.959 us
Stream: Stream 15
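
If the bars are too short to spot in the timeline, a small test of your own can make the overlap easier to see. Below is a rough sketch, not the sample's actual code: the `spin` kernel, the `CHECK` macro and the cycle count are made up for illustration. It launches one long-running kernel on each of 8 streams and prints the total wall time. If the kernels overlap, the total should be close to the time of a single kernel rather than 8 times that.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t e_ = (call);                                     \
        if (e_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s failed: %s\n", #call,                \
                    cudaGetErrorString(e_));                         \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

// Hypothetical busy-wait kernel: spins until about `cycles` GPU clock
// ticks have elapsed, so each launch is long enough to see in a profiler.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) {
    }
}

int main() {
    const int kStreams = 8;                // same stream count as in the thread
    const long long kCycles = 10000000LL;  // assumed value; tune for your GPU clock

    cudaStream_t streams[kStreams];
    for (int i = 0; i < kStreams; ++i) {
        CHECK(cudaStreamCreate(&streams[i]));
    }

    // Warm-up launch so one-time initialization is not part of the timing.
    spin<<<1, 1>>>(0);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < kStreams; ++i) {
        // One block with one thread per stream leaves room for overlap.
        spin<<<1, 1, 0, streams[i]>>>(kCycles);
        CHECK(cudaGetLastError());
    }
    CHECK(cudaDeviceSynchronize());  // wait for all streams to finish
    auto t1 = std::chrono::steady_clock::now();

    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    printf("%d kernels on %d streams took %.3f ms (host wall time)\n",
           kStreams, kStreams, ms);

    for (int i = 0; i < kStreams; ++i) {
        CHECK(cudaStreamDestroy(streams[i]));
    }
    return 0;
}
```

Running this under the profiler should give one wide bar per stream; whether the bars line up in time is what tells you the kernels actually ran concurrently.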