I am new here.
I am writing some code that uses concurrent kernel execution.
My card is a GTX 570, running on Windows XP with CUDA 3.2.
I run two kernels in two streams.
However, the speedup is less than 10%.
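A minimal sketch of the kind of two-stream launch described above, for reference. This is not the actual code: `busyKernel`, the grid size of 4 blocks, and the iteration count are placeholders made up for illustration. It also prints the `concurrentKernels` device property, which reports whether the device supports concurrent kernel execution at all.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration: a small amount of
// arithmetic per thread, no textures, no large local arrays.
__global__ void busyKernel(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k)
            v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

int main()
{
    // Check whether the device reports support for concurrent kernels.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("concurrentKernels = %d\n", prop.concurrentKernels);

    // Assumed sizes: 4 blocks of 256 threads per kernel.
    const int threads = 256;
    const int blocks  = 4;
    const int n       = blocks * threads;

    float *dA = 0, *dB = 0;
    cudaMalloc((void **)&dA, n * sizeof(float));
    cudaMalloc((void **)&dB, n * sizeof(float));
    cudaMemset(dA, 0, n * sizeof(float));
    cudaMemset(dB, 0, n * sizeof(float));

    // One non-default stream per kernel; launches in the default
    // stream would serialize with everything else.
    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Launch back to back, with no synchronizing call in between.
    busyKernel<<<blocks, threads, 0, s0>>>(dA, n, 100000);
    busyKernel<<<blocks, threads, 0, s1>>>(dB, n, 100000);

    // Wait for both streams to finish, then report any error.
    cudaStreamSynchronize(s0);
    cudaStreamSynchronize(s1);
    printf("status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(dA);
    cudaFree(dB);
    return 0;
}
```

Each launch is kept to a few blocks here on the assumption that a kernel large enough to occupy every multiprocessor by itself leaves little room for a second kernel to overlap with it.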
I saw this passage in the Programming Guide:
"A kernel from one CUDA context cannot execute concurrently with a kernel from another CUDA context.
Kernels that use many textures or a large amount of local memory are less likely to execute concurrently with other kernels."
What is the maximum number of textures and the maximum amount of local memory a kernel can use and still run concurrently?
Can anybody give me some advice?
Thank you in advance.