Hi,
I am trying to use CUDA streams on a Jetson TK1 for concurrent kernel execution, but according to the profiler the kernels in different streams are executing serially. I am using CUDA 6.5. I found a post online mentioning software issues on the Jetson TK1 that prevent CUDA streams from working as expected:
http://devblogs.nvidia.com/parallelforall/jetson-tk1-mobile-embedded-supercomputer-cuda-everywhere/
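For reference, here is a minimal sketch of the kind of multi-stream launch in question. It is not the actual code: the kernel busyKernel, the helper runStreams, the stream count, and the sizes are all made up for illustration. It launches one small, long-running kernel per non-default stream and prints the device's concurrentKernels property, so any overlap (or the lack of it) should be visible in the profiler timeline.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: loops for a while so that overlap between
// streams is easy to see in the profiler timeline.
__global__ void busyKernel(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k)
            v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

// Hypothetical helper: launches one kernel per stream.
// Returns 0 on success, 1 if any CUDA call reported an error.
int runStreams(int numStreams)
{
    const int kMaxStreams = 8;
    const int n = 256;  // one small block per kernel, leaving room for others
    if (numStreams < 1 || numStreams > kMaxStreams) return 1;

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;
    printf("concurrentKernels = %d\n", prop.concurrentKernels);

    cudaStream_t streams[kMaxStreams];
    float *buf[kMaxStreams];
    int failed = 0;

    for (int s = 0; s < numStreams; ++s) {
        failed |= (cudaStreamCreate(&streams[s]) != cudaSuccess);
        failed |= (cudaMalloc((void **)&buf[s], n * sizeof(float)) != cudaSuccess);
        failed |= (cudaMemsetAsync(buf[s], 0, n * sizeof(float), streams[s]) != cudaSuccess);
    }

    // Each launch goes to its own non-default stream; work issued to the
    // default stream would serialize against the others.
    for (int s = 0; s < numStreams; ++s)
        busyKernel<<<1, n, 0, streams[s]>>>(buf[s], n, 100000);

    failed |= (cudaGetLastError() != cudaSuccess);
    failed |= (cudaDeviceSynchronize() != cudaSuccess);

    for (int s = 0; s < numStreams; ++s) {
        cudaFree(buf[s]);
        cudaStreamDestroy(streams[s]);
    }
    return failed ? 1 : 0;
}

int main()
{
    return runStreams(4);
}
```

Assuming the file is saved as streams.cu, something like `nvcc -arch=sm_32 streams.cu -o streams` should build it for the TK1, and the resulting binary can then be run under the profiler.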
Has anyone tried CUDA streams on the TK1? Could you please point me to, or share, a working example?
Thank you in advance.
Regards,
Barath Ramesh