In the programming guide [Streams], it says:
Each stream copies its portion of input array hostPtr to array inputDevPtr in
device memory, processes inputDevPtr on the device by calling myKernel(), and
copies the result outputDevPtr back to the same portion of hostPtr. Processing
hostPtr using two streams allows for the memory copies of one stream to overlap
with the kernel execution of the other stream.
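
For reference, the two-stream pattern described in that passage can be sketched roughly as follows. This is only a minimal sketch: the kernel body (doubling each element), the chunk size, and the launch configuration are assumptions made for illustration; only the names hostPtr, inputDevPtr, outputDevPtr and myKernel come from the quoted text.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel (assumption): doubles each element of its chunk.
__global__ void myKernel(float *out, const float *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

int main() {
    const int nStreams = 2;
    const int n = 1 << 20;                    // elements per stream (assumed)
    const size_t bytes = n * sizeof(float);   // bytes per stream

    float *hostPtr = nullptr, *inputDevPtr = nullptr, *outputDevPtr = nullptr;
    // Page-locked host memory is required for async copies to overlap with kernels.
    cudaMallocHost((void **)&hostPtr, nStreams * bytes);
    cudaMalloc((void **)&inputDevPtr, nStreams * bytes);
    cudaMalloc((void **)&outputDevPtr, nStreams * bytes);
    for (int i = 0; i < nStreams * n; ++i) hostPtr[i] = 1.0f;

    cudaStream_t stream[nStreams];
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&stream[s]);

    // Each stream copies its portion in, processes it, and copies it back.
    for (int s = 0; s < nStreams; ++s) {
        const int off = s * n;
        cudaMemcpyAsync(inputDevPtr + off, hostPtr + off, bytes,
                        cudaMemcpyHostToDevice, stream[s]);
        myKernel<<<(n + 255) / 256, 256, 0, stream[s]>>>(
            outputDevPtr + off, inputDevPtr + off, n);
        cudaMemcpyAsync(hostPtr + off, outputDevPtr + off, bytes,
                        cudaMemcpyDeviceToHost, stream[s]);
    }

    cudaDeviceSynchronize();                  // wait for both streams to finish
    printf("hostPtr[0] = %f\n", hostPtr[0]);

    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(stream[s]);
    cudaFree(inputDevPtr);
    cudaFree(outputDevPtr);
    cudaFreeHost(hostPtr);
    return 0;
}
```

Within each stream the copy-in, kernel, and copy-out run in issue order; any overlap comes from work queued in the other stream.
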
That is for two tasks in two streams.
Is there any way to make the copy and the kernel launch overlap within a single stream?
What I mean is: using only one stream, can I copy just the portion of the data that the kernel currently requires, so that the two jobs overlap?