[SOLVED]-Profiler- "insufficient kernel bound" when trying to analyse "kernel perform

Hi,

I have a working kernel, and I'm trying to analyse what affects its performance.
When profiling, I receive the following:

  • [b]Insufficient Kernel Bound[/b] The data needed to perform memory bandwidth analysis for the kernel could not be located

I had an internal bug in my memory allocation.

I work with CUDA Toolkit 6.5 on a Kepler GPU.

[s]Found a partial solution:
added cudaProfilerStart + cudaProfilerStop to reduce the number of profiled iterations.

After some code modifications and performance improvements, the error returned.
It seems to fail in cudaMemcpyAsync with an invalid argument error,
but the code is stable and runs well in both release and debug mode.
[/s]
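
For anyone hitting the same thing, here is a minimal sketch of the two ideas mentioned above: limiting the profiled region with cudaProfilerStart/cudaProfilerStop, and checking the return value of every runtime call so that an invalid argument in cudaMemcpyAsync shows up outside the profiler too. This is not my actual code. The kernel `scale`, the `CUDA_CHECK` macro, the buffer sizes and the iteration counts are all hypothetical placeholders.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>  // cudaProfilerStart / cudaProfilerStop

// Hypothetical helper: abort with a readable message if a runtime call fails.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,        \
                    __LINE__, #call, cudaGetErrorString(err_));        \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical stand-in for the real kernel: multiplies each element by factor.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;                 // assumed problem size
    const size_t bytes = n * sizeof(float);
    const int iterations = 20;             // assumed total iterations
    const int profiled = 10;               // the single iteration to profile

    float *h_data = NULL, *d_data = NULL;
    // Pinned host memory, so the async copies can actually be asynchronous.
    CUDA_CHECK(cudaMallocHost((void **)&h_data, bytes));
    CUDA_CHECK(cudaMalloc((void **)&d_data, bytes));
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));

    for (int it = 0; it < iterations; ++it) {
        // Collect profile data only for the chosen iteration.
        if (it == profiled) CUDA_CHECK(cudaProfilerStart());

        CUDA_CHECK(cudaMemcpyAsync(d_data, h_data, bytes,
                                   cudaMemcpyHostToDevice, stream));
        scale<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n, 2.0f);
        CUDA_CHECK(cudaGetLastError());    // catches launch configuration errors
        CUDA_CHECK(cudaMemcpyAsync(h_data, d_data, bytes,
                                   cudaMemcpyDeviceToHost, stream));
        CUDA_CHECK(cudaStreamSynchronize(stream));

        if (it == profiled) CUDA_CHECK(cudaProfilerStop());
    }

    CUDA_CHECK(cudaStreamDestroy(stream));
    CUDA_CHECK(cudaFree(d_data));
    CUDA_CHECK(cudaFreeHost(h_data));
    return 0;
}
```

The start/stop calls only limit collection if the profiler does not already start collecting at launch, e.g. by running nvprof with `--profile-from-start off`. If one of the checked calls reports an error here, the allocation sizes and pointers passed to cudaMemcpyAsync are a good first place to look.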