cudaMemcpy and thrust::device_vector erase

I have a buffer of input samples. Some of the samples have already been processed, and I want to shift my input buffer to forget (erase) those processed samples.

Below is the code I am trying to run. thrust::device_vector erase gives the correct result every time. cudaMemcpy is sometimes wrong.

I have even tried littering the code with cudaDeviceSynchronize().

      thrust::device_vector<thrust::complex<float> > d_input_buffer;
      thrust::complex<float> *input_buffer_ptr;

      int num_samples_processed = 10;

      // sometimes wrong
      thrust::complex<float>  *dest = (thrust::complex<float> *)thrust::raw_pointer_cast(;
      thrust::complex<float>  *src  = (thrust::complex<float> *)thrust::raw_pointer_cast(;
      int num_memcpy = d_input_buffer.size()-num_samples_processed;

      // always right

Hi @jravert

I would make fuller use of the Thrust library by using its copy function. I would use something like:

thrust::copy(src.begin(), src.end(), dest.begin());
/* Allocate the full size up front so that no resizing is needed */
thrust::device_vector<thrust::complex<float> > d_input_buffer(1000);

thrust::complex<float> *input_buffer_ptr;

int num_samples_processed = 10;

auto start_offset = d_input_buffer.begin() + num_samples_processed;

/* This replaces all the raw pointer casts and the cudaMemcpy */
thrust::copy(start_offset , d_input_buffer.end(), d_input_buffer.begin());

What I didn't follow is the logic. You are shifting the elements to the left (taking the left to be the lowest indices) and deleting afterwards. If you erase the elements, you may pay a penalty for reallocation. I would work only with shifting. Alternatively, you can keep an offset index that marks where the valid data starts, which saves you a copy.
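
To illustrate the offset index idea, here is a minimal sketch. The `OffsetBuffer` wrapper and its member names are hypothetical (they are not from the code above); it only assumes the samples live in a thrust::device_vector that is left in place while an index advances past the processed samples.

```cuda
#include <cassert>
#include <cstddef>
#include <thrust/complex.h>
#include <thrust/device_vector.h>

using Sample = thrust::complex<float>;

// Hypothetical wrapper: the storage never moves; only the index of the
// first unprocessed sample advances.
struct OffsetBuffer {
    thrust::device_vector<Sample> storage;
    std::size_t head = 0;  // index of the first unprocessed sample

    explicit OffsetBuffer(std::size_t n) : storage(n) {}

    // Number of samples still waiting to be processed.
    std::size_t size() const { return storage.size() - head; }

    // Forget the first n unprocessed samples: no device copy is issued.
    void consume(std::size_t n) {
        assert(n <= size());
        head += n;
    }

    // Raw device pointer to the first unprocessed sample, e.g. for a kernel.
    Sample* data() { return thrust::raw_pointer_cast(storage.data()) + head; }
};
```

With this, "erasing" 10 processed samples is just `buf.consume(10)`. Refilling or compacting the storage once the index reaches the end is left out of the sketch.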


Thanks @luis.leon. thrust::copy still has errors.

I am wondering if it has to do with the overlapping regions. The CUDA Toolkit Documentation says “The memory areas may not overlap.” for cudaMemcpyAsync but not for cudaMemcpy.

Hmmm. Could you be more specific about the errors? Do you mean that the execution crashes, or that the data is corrupted? The only way I can see it being erratic is a race condition. You could use other techniques, such as data redundancy and buffer swapping, to make the overlapping copy safer.
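
One way to read "redundancy and swapping" is a ping-pong pair of buffers. The sketch below is only an assumption about what that could look like; `PingPongBuffer` and `discard_front` are hypothetical names. Because the two vectors are separate allocations, the source and destination of the copy cannot overlap.

```cuda
#include <cassert>
#include <cstddef>
#include <thrust/complex.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>

using Sample = thrust::complex<float>;

// Hypothetical ping-pong pair: `active` holds the current samples and
// `spare` is a second allocation of the same size.
struct PingPongBuffer {
    thrust::device_vector<Sample> active;
    thrust::device_vector<Sample> spare;

    explicit PingPongBuffer(std::size_t n) : active(n), spare(n) {}

    // Drop the first num_processed samples. Returns how many samples remain
    // valid at the front of `active`; elements past that count are stale.
    std::size_t discard_front(std::size_t num_processed) {
        assert(num_processed <= active.size());
        const std::size_t remaining = active.size() - num_processed;
        // Copy the unprocessed tail into the other allocation.
        thrust::copy(active.begin() + num_processed, active.end(),
                     spare.begin());
        // Exchange the two vectors; no element data is copied here.
        active.swap(spare);
        return remaining;
    }
};
```

This costs twice the device memory but needs only one copy per shift.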


The errors are data corruption. I believe the cause is a race condition.

I ended up copying the overlapping region to a temporary buffer and then back to the input_buffer. Yes, there are more memcpys than I want, but there are no race conditions.
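
For anyone landing here later, a minimal sketch of that workaround follows. It is not the exact code I ran: the helper name `shift_left_staged` and the reusable `scratch` vector are made up for illustration, and it uses thrust::copy rather than raw cudaMemcpy calls.

```cuda
#include <cassert>
#include <cstddef>
#include <thrust/complex.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>

using Sample = thrust::complex<float>;

// Hypothetical helper: drop the first num_processed samples by staging the
// unprocessed tail in a scratch buffer, so that neither copy has
// overlapping source and destination ranges.
// Returns the number of valid samples now at the front of buf; buf keeps
// its size, and elements past the returned count are stale.
std::size_t shift_left_staged(thrust::device_vector<Sample>& buf,
                              thrust::device_vector<Sample>& scratch,
                              std::size_t num_processed) {
    assert(num_processed <= buf.size());
    const std::size_t remaining = buf.size() - num_processed;

    // Grow the scratch buffer only when it is too small, so repeated calls
    // can reuse the same allocation.
    if (scratch.size() < remaining) scratch.resize(remaining);

    // Copy 1: unprocessed tail -> scratch (separate allocation).
    thrust::copy(buf.begin() + num_processed, buf.end(), scratch.begin());
    // Copy 2: scratch -> front of the input buffer.
    thrust::copy(scratch.begin(), scratch.begin() + remaining, buf.begin());
    return remaining;
}
```

Keeping `scratch` alive between calls avoids a device allocation on every shift.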