I am trying to use CUDA streams on a Jetson TK1 for concurrent kernel execution, but according to the profiler, the streams are executing serially. I found a post online describing software issues on the Jetson TK1 that prevent the use of CUDA streams. I am using CUDA 6.5.
Has anyone tried CUDA streams on the TK1? Could you please point me to or share a working example?
Thank you in advance.