Does cudaMemcpyAsync decrease data transfer performance?

fishbupt · February 1, 2010, 3:01am

Hi, all

In my project, I want to overlap data transfers with kernel launches to boost the application's performance.

But when I use cudaMemcpyAsync with a stream ID other than 0, the data transfer between host and device gets slower.

Here is my source code:

[codebox]
for (int offset = 0; offset < iqSize; offset += fftSize * nStream)
{
    // Host-to-device: queue one transfer per stream
    for (int j = 0; j < nStream; j++)
        CUDA_SAFE_CALL(cudaMemcpyAsync(d_iq[j], iq, sizeof(Complex) * fftSize, cudaMemcpyHostToDevice, stream[j]));

    // Device-to-host: queue one transfer per stream
    for (int j = 0; j < nStream; j++)
        CUDA_SAFE_CALL(cudaMemcpyAsync(spectrum, d_spectrum[j], sizeof(Complex) * fftSize, cudaMemcpyDeviceToHost, stream[j]));
}
[/codebox]

It takes almost 10 ms, but when I replace stream[j] with 0, it takes only 8 ms.
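
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two variants might be timed side by side. It rests on several assumptions that the post does not state: the host buffers are allocated with cudaMallocHost (page-locked memory, which cudaMemcpyAsync needs in order to run asynchronously), each stream gets its own slice of the host buffers, and the copy back reads the same device buffer because no kernel is shown. The names CHECK, timeCopies, hIn, hOut and dBuf are hypothetical, and the Complex layout is a guess.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the CUDA_SAFE_CALL macro used in the post.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                         cudaGetErrorString(err_));                    \
            std::exit(1);                                              \
        }                                                              \
    } while (0)

// Assumed layout; the post does not show the definition of Complex.
struct Complex { float x, y; };

// Returns the wall-clock time in milliseconds for nStream host-to-device
// copies followed by nStream device-to-host copies of fftSize elements each.
// When useStreams is false, every copy is issued to stream 0 instead.
double timeCopies(int fftSize, int nStream, bool useStreams)
{
    const size_t bytes = sizeof(Complex) * fftSize;

    // Page-locked host buffers (assumption: one slice per stream).
    Complex *hIn = nullptr, *hOut = nullptr;
    CHECK(cudaMallocHost((void **)&hIn, bytes * nStream));
    CHECK(cudaMallocHost((void **)&hOut, bytes * nStream));
    std::memset(hIn, 0, bytes * nStream);

    std::vector<cudaStream_t> stream(nStream);
    std::vector<Complex *> dBuf(nStream);
    for (int j = 0; j < nStream; j++) {
        CHECK(cudaStreamCreate(&stream[j]));
        CHECK(cudaMalloc((void **)&dBuf[j], bytes));
    }

    // Make sure the setup work has finished before the timer starts.
    CHECK(cudaDeviceSynchronize());
    auto t0 = std::chrono::steady_clock::now();

    for (int j = 0; j < nStream; j++) {
        cudaStream_t s = useStreams ? stream[j] : (cudaStream_t)0;
        CHECK(cudaMemcpyAsync(dBuf[j], hIn + (size_t)j * fftSize, bytes,
                              cudaMemcpyHostToDevice, s));
    }
    for (int j = 0; j < nStream; j++) {
        cudaStream_t s = useStreams ? stream[j] : (cudaStream_t)0;
        CHECK(cudaMemcpyAsync(hOut + (size_t)j * fftSize, dBuf[j], bytes,
                              cudaMemcpyDeviceToHost, s));
    }

    // Wait for every queued copy so the measurement covers the transfers,
    // not just the time spent enqueueing them.
    CHECK(cudaDeviceSynchronize());
    auto t1 = std::chrono::steady_clock::now();

    for (int j = 0; j < nStream; j++) {
        CHECK(cudaFree(dBuf[j]));
        CHECK(cudaStreamDestroy(stream[j]));
    }
    CHECK(cudaFreeHost(hIn));
    CHECK(cudaFreeHost(hOut));

    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    const int fftSize = 1 << 16;  // arbitrary size chosen for illustration
    const int nStream = 4;        // arbitrary stream count
    timeCopies(fftSize, nStream, true);  // warm-up run, result discarded
    std::printf("streams : %.3f ms\n", timeCopies(fftSize, nStream, true));
    std::printf("stream 0: %.3f ms\n", timeCopies(fftSize, nStream, false));
    return 0;
}
```

Copies on their own can overlap at most one host-to-device with one device-to-host transfer, and only on hardware with separate copy engines, so a pure-copy loop like this one may show no gain from streams at all; the benefit normally comes from overlapping copies with kernel execution. A single run is also noisy, so averaging over many iterations is advisable before comparing numbers.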
