cub::DeviceRadixSort::SortKeys concurrent to kernel?

inbaltomer · June 7, 2021, 5:19am

I’m sorting a device vector using cub::DeviceRadixSort::SortKeys.
I prepare the input vector (the vector to be sorted) by calling another kernel before the sort. I found that I have to add a cudaMemcpy (or cudaDeviceSynchronize()) between the kernel and the sort to get correct results.
Why? Does SortKeys run on a different stream?
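A minimal sketch of the pattern described above, assuming the CUB headers that ship with the CUDA Toolkit and unsigned int keys. The kernel fill_descending and the helper fill_then_sort are hypothetical stand-ins for the poster's code. SortKeys takes an optional cudaStream_t stream argument (default 0), so passing the same stream used for the kernel launch puts both into one stream, where they execute in issue order with no host-side synchronization in between. The scratch-size query and allocation are done before the kernel launch so that no other host call sits between the kernel and the sort.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for the poster's "prepare" kernel: writes the keys
// in descending order (n-1, n-2, ..., 0) so the sort has real work to do.
__global__ void fill_descending(unsigned int *keys, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) keys[i] = static_cast<unsigned int>(n - 1 - i);
}

// Launches the kernel and SortKeys on the same stream with no host-side
// synchronization between them. Returns true if the result is 0, 1, ..., n-1.
bool fill_then_sort(int n, cudaStream_t stream)
{
    unsigned int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(unsigned int));
    cudaMalloc(&d_out, n * sizeof(unsigned int));

    // Query and allocate CUB's scratch space up front (a null d_temp_storage
    // only computes temp_bytes), so nothing sits between the kernel and the sort.
    int end_bit = sizeof(unsigned int) * 8;
    void *d_temp = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_out, n,
                                   0, end_bit, stream);
    cudaMalloc(&d_temp, temp_bytes);

    // Both launches go into `stream`, so the sort starts only after the kernel finishes.
    const int threads = 256;
    fill_descending<<<(n + threads - 1) / threads, threads, 0, stream>>>(d_in, n);
    cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_out, n,
                                   0, end_bit, stream);

    // Copy back in the same stream, then block the host until the stream drains.
    std::vector<unsigned int> h_out(n);
    cudaMemcpyAsync(h_out.data(), d_out, n * sizeof(unsigned int),
                    cudaMemcpyDeviceToHost, stream);
    bool ok = (cudaStreamSynchronize(stream) == cudaSuccess);
    for (int i = 0; ok && i < n; ++i)
        ok = (h_out[i] == static_cast<unsigned int>(i));

    cudaFree(d_temp);
    cudaFree(d_out);
    cudaFree(d_in);
    return ok;
}
```

Calling fill_then_sort(1 << 20, 0) exercises the legacy default stream; passing a stream obtained from cudaStreamCreate exercises the same ordering on a user-created stream.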

Robert_Crovella · June 7, 2021, 1:52pm

Not in my experience.

You can confirm the streams used by the various activities with a profiler.
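Purely as an illustration, suppose the profiler does show the kernel and the sort on different streams, for example because the kernel was launched into a stream created with cudaStreamNonBlocking (which does not implicitly synchronize with the legacy default stream) while SortKeys was left on its default stream argument of 0. One way to order just those two pieces of work, instead of a device-wide cudaDeviceSynchronize(), is an event: cudaEventRecord on the producing stream followed by cudaStreamWaitEvent on the consuming stream. In this sketch, produce_keys and sort_across_streams are hypothetical names, and both streams are created non-blocking to make the cross-stream dependency explicit.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical producer kernel: keys[i] = n - 1 - i (descending order).
__global__ void produce_keys(unsigned int *keys, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) keys[i] = static_cast<unsigned int>(n - 1 - i);
}

// Kernel runs on `producer`, the sort runs on `consumer`. The event makes
// `consumer` wait for the kernel on the GPU; the host does not block in between.
// Returns true if the sorted result is 0, 1, ..., n-1.
bool sort_across_streams(int n)
{
    cudaStream_t producer, consumer;
    cudaStreamCreateWithFlags(&producer, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&consumer, cudaStreamNonBlocking);
    cudaEvent_t keys_ready;
    cudaEventCreateWithFlags(&keys_ready, cudaEventDisableTiming);

    unsigned int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(unsigned int));
    cudaMalloc(&d_out, n * sizeof(unsigned int));

    // Size query (null d_temp_storage) and scratch allocation, done up front.
    int end_bit = sizeof(unsigned int) * 8;
    void *d_temp = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_out, n,
                                   0, end_bit, consumer);
    cudaMalloc(&d_temp, temp_bytes);

    // Producer: fill the keys, then mark completion with the event.
    produce_keys<<<(n + 255) / 256, 256, 0, producer>>>(d_in, n);
    cudaEventRecord(keys_ready, producer);

    // Consumer: wait (GPU-side) for the event, then sort.
    cudaStreamWaitEvent(consumer, keys_ready, 0);
    cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_out, n,
                                   0, end_bit, consumer);

    std::vector<unsigned int> h_out(n);
    cudaMemcpyAsync(h_out.data(), d_out, n * sizeof(unsigned int),
                    cudaMemcpyDeviceToHost, consumer);
    bool ok = (cudaStreamSynchronize(consumer) == cudaSuccess);
    for (int i = 0; ok && i < n; ++i)
        ok = (h_out[i] == static_cast<unsigned int>(i));

    cudaFree(d_temp);
    cudaFree(d_out);
    cudaFree(d_in);
    cudaEventDestroy(keys_ready);
    cudaStreamDestroy(consumer);
    cudaStreamDestroy(producer);
    return ok;
}
```

The host blocks only once, in the final cudaStreamSynchronize; the dependency between the kernel and the sort is enforced on the GPU by the event.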
