Device to device copying with Thrust

trf86 · August 24, 2015, 2:19pm

I am looking to copy some data from a large vector (on the order of 10^6 values) to a smaller one (on the order of 10^2). I am considering two ways to do it. I could copy the values one at a time from the host using Thrust (my values are stored in device vectors anyway), or I could build a vector listing all the elements to move and then launch a single kernel in which each thread copies one value over.

I am wondering about the performance of each approach. Will I incur appreciable overhead by using Thrust to copy the values? Either way, this operation will be limited by memory bandwidth, but what is the fastest way to copy some values (not all of which are adjacent) from a large vector to a smaller one?
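For reference, here is a minimal sketch of the single-kernel approach, assuming the elements to move are described by a device vector of source indices. The names `gather_kernel`, `idx`, and `stride`, and the sizes used, are made up for illustration; this is not a tuned implementation.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sequence.h>

// Each thread copies one selected element: dst[i] = src[idx[i]].
__global__ void gather_kernel(const int* src, const int* idx, int* dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        dst[i] = src[idx[i]];
    }
}

int main()
{
    const int n_src  = 1000000; // large source vector
    const int n_dst  = 100;     // small destination vector
    const int stride = 9973;    // hypothetical spacing of the selected elements

    thrust::device_vector<int> src(n_src);
    thrust::sequence(src.begin(), src.end());            // src[i] = i

    // Indices of the (non-adjacent) elements to copy: 0, stride, 2*stride, ...
    thrust::device_vector<int> idx(n_dst);
    thrust::sequence(idx.begin(), idx.end(), 0, stride);

    thrust::device_vector<int> dst(n_dst);

    const int threads = 128;
    const int blocks  = (n_dst + threads - 1) / threads;
    gather_kernel<<<blocks, threads>>>(thrust::raw_pointer_cast(src.data()),
                                       thrust::raw_pointer_cast(idx.data()),
                                       thrust::raw_pointer_cast(dst.data()),
                                       n_dst);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy the small result back to the host and verify it.
    thrust::host_vector<int> h_dst = dst;
    bool ok = true;
    for (int i = 0; i < n_dst; ++i) {
        if (h_dst[i] != i * stride) ok = false;
    }
    std::printf("%s\n", ok ? "gather OK" : "gather mismatch");
    return ok ? 0 : 1;
}
```

With only about 10^2 elements to move, the launch itself is likely to dominate the cost, so the block size here is not critical.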

Thanks.

trf86 · August 27, 2015, 1:47pm

Has anyone benchmarked copying vectors one element at a time with Thrust versus a single kernel call?

EDIT: I wrote a test to measure it myself. For one million elements, copying one at a time takes 3.5 s on my machine versus 0.7 ms using thrust::copy. I guess I will need to create a vector and invoke a single kernel for the copy operation.
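For scattered elements, Thrust itself may already cover the "single kernel" case: `thrust::gather` takes a map of source indices and performs the whole copy in one call. The sketch below is only an illustration under assumed sizes; `map` and `stride` are hypothetical names, and I have not timed it against a hand-written kernel.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sequence.h>
#include <thrust/gather.h>

int main()
{
    const int n_src  = 1000000; // large source vector
    const int n_dst  = 100;     // small destination vector
    const int stride = 9973;    // hypothetical spacing of the selected elements

    thrust::device_vector<int> src(n_src);
    thrust::sequence(src.begin(), src.end());            // src[i] = i

    // map[i] holds the source index of the i-th element to copy.
    thrust::device_vector<int> map(n_dst);
    thrust::sequence(map.begin(), map.end(), 0, stride); // 0, stride, 2*stride, ...

    // One call performs dst[i] = src[map[i]] for every i on the device.
    thrust::device_vector<int> dst(n_dst);
    thrust::gather(map.begin(), map.end(), src.begin(), dst.begin());

    // Copy the small result back to the host and verify it.
    thrust::host_vector<int> h_dst = dst;
    bool ok = true;
    for (int i = 0; i < n_dst; ++i) {
        if (h_dst[i] != i * stride) ok = false;
    }
    std::printf("%s\n", ok ? "gather OK" : "gather mismatch");
    return ok ? 0 : 1;
}
```

The slow variant, by contrast, would be a host loop doing `dst[i] = src[map[i]]` on the device vectors directly. As far as I understand, each such assignment is issued from the host as its own tiny operation, which would be consistent with the large gap measured above.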
