CUDA Visual Profiler: Not showing overlapping memory copies

This question has come up before on these forums but was unanswered:

The CUDA Visual Profiler does not show memory transfers overlapping with kernel executions in the GPU Time Width Plot. I have seen this both in my own code and when running the simpleStreams example from the SDK.

Is this by design or a bug?

If the profiler cannot show it, how can one determine how well memory transfers and computation are actually overlapping?

Furthermore, version 2.3 of the Visual Profiler shows all memory transfers in stream 0, even when they were issued in other streams, which is incorrect. Again, I have observed this both in my own code and in the simpleStreams SDK example.
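For reference, the only way I know to check overlap without the profiler is to time the work with CUDA events, once serialized in stream 0 and once split across several streams, and compare the two numbers. Below is a minimal sketch of that idea, along the lines of what simpleStreams does. The kernel `scale`, the problem size, the iteration count, and the number of streams are all placeholders I made up for illustration; it also assumes the default (legacy) stream 0 behaviour, where an event recorded in stream 0 waits for work in the other streams.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Abort main() with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical stand-in kernel: does some arithmetic per element so
// that there is compute time for the copies to hide behind.
__global__ void scale(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k)
            v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

int main()
{
    const int n        = 1 << 22;       // total elements (assumed size)
    const int nStreams = 4;             // assumed stream count
    const int chunk    = n / nStreams;  // elements per stream
    const int threads  = 256;
    const int iters    = 200;
    const size_t bytes      = n * sizeof(float);
    const size_t chunkBytes = chunk * sizeof(float);

    float *h = 0, *d = 0;
    CHECK(cudaMallocHost((void **)&h, bytes));  // pinned host memory, required for async copies
    CHECK(cudaMalloc((void **)&d, bytes));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) CHECK(cudaStreamCreate(&streams[s]));

    // Baseline: copy in, kernel, copy out, all serialized in stream 0.
    CHECK(cudaEventRecord(start, 0));
    CHECK(cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, 0));
    scale<<<(n + threads - 1) / threads, threads>>>(d, n, iters);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, 0));
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));
    float serialMs = 0.0f;
    CHECK(cudaEventElapsedTime(&serialMs, start, stop));

    // Streamed: same work, one chunk per stream, so copies in one
    // stream are free to overlap with kernels in another.
    CHECK(cudaEventRecord(start, 0));
    for (int s = 0; s < nStreams; ++s) {
        int off = s * chunk;
        CHECK(cudaMemcpyAsync(d + off, h + off, chunkBytes,
                              cudaMemcpyHostToDevice, streams[s]));
        scale<<<(chunk + threads - 1) / threads, threads, 0, streams[s]>>>(
            d + off, chunk, iters);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h + off, d + off, chunkBytes,
                              cudaMemcpyDeviceToHost, streams[s]));
    }
    // With legacy stream 0 semantics this event completes only after
    // all preceding work in the other streams has finished.
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));
    float streamedMs = 0.0f;
    CHECK(cudaEventElapsedTime(&streamedMs, start, stop));

    printf("serialized: %.3f ms\n", serialMs);
    printf("streamed:   %.3f ms\n", streamedMs);

    for (int s = 0; s < nStreams; ++s) CHECK(cudaStreamDestroy(streams[s]));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFreeHost(h));
    CHECK(cudaFree(d));
    return 0;
}
```

If the streamed time is clearly lower than the serialized time, some overlap is happening. This only gives an aggregate number, though; it does not show which copies overlapped with which kernels, which is exactly what I was hoping to get from the profiler.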



As a side note, in my opinion it would be great if more of the experts here participated in sites designed for Q&A, such as Stack Overflow. It is too easy for questions to slip under the radar and go unanswered in the forum format.