thrust::device_vector<T>::resize() without overwriting elements?

MutantJohn · January 12, 2015, 12:44am

So, I like the convenience of the device vectors. I like to use the size() function, but I’m running into an issue.

I have a vector whose capacity is much larger than its size. I write to the vector past its current size, so the size is no longer accurate. The only problem is that calling resize() afterwards keeps zeroing those elements.

For example, if I have

thrust::device_vector<int> x;
x.reserve(100);
x.resize(10);

// use first 10 elements
// overwrite the vector with 20 elements

x.resize(20); // zeroes out the values that I don't want it to

Am I just using the container wrong? I really don’t want to have to manually write my own class to get the behavior that I want.

I’m also not sure that I can resize() the vector before I do all the writing. I’ll look into that really quickly.

Edit : The way my code works is,

// write to vector first
// sort vector
iter     = thrust::find(begin_iter, end_iter, -1);
new_size = thrust::distance(begin_iter, iter);

// resize vectors

where -1 is the initialized value (i.e. the default one).

Just looking at this, the dependency seems to imply that I can’t resize before I write because I need the written data to get a new size.
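The size computation described in the edit above can be sketched on the host. This is a minimal illustration using std::vector in place of thrust::device_vector, on the assumption that thrust::find and thrust::distance behave like std::find and std::distance here. The helper name count_before_sentinel is hypothetical, and the sketch assumes the data has already been sorted so that every sentinel sits at the back.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: number of valid entries in front of the first sentinel.
// Assumes the valid entries were already sorted ahead of the sentinels.
std::size_t count_before_sentinel(const std::vector<int>& v, int sentinel = -1) {
    const auto first = std::find(v.begin(), v.end(), sentinel);
    // Precondition check: everything from the first sentinel onward is a sentinel.
    assert(std::all_of(first, v.end(), [sentinel](int x) { return x == sentinel; }));
    return static_cast<std::size_t>(std::distance(v.begin(), first));
}
```

Note that the search range is [v.begin(), v.end()), so the vector has to be sized to cover every element that was written before this is called.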

Robert_Crovella · January 12, 2015, 12:57am

I don’t have any trouble with it. It always helps if you provide complete code.

Here’s my example:

$ cat t661.cu
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/fill.h>
#include <iostream>
#include <iterator>

int main(){

  thrust::device_vector<int> x;
  x.reserve(100);
  x.resize(10);

  thrust::fill(x.begin(), x.end(), 10);
  thrust::copy(x.begin(), x.end(), std::ostream_iterator<int>(std::cout, ","));
  std::cout << std::endl;
  x.resize(20);
  thrust::copy(x.begin(), x.end(), std::ostream_iterator<int>(std::cout, ","));
  std::cout << std::endl;
}

$ nvcc -arch=sm_20 -o t661 t661.cu
$ ./t661
10,10,10,10,10,10,10,10,10,10,
10,10,10,10,10,10,10,10,10,10,0,0,0,0,0,0,0,0,0,0,
$

MutantJohn · January 12, 2015, 1:02am

I think I explained this incorrectly. Sorry, I’m kind of tired.

In my actual code, I’m doing this:

Note : arr_cap > num_cart_points

pa.reserve(arr_cap);
ta.reserve(arr_cap);
fs.reserve(arr_cap);
la.reserve(arr_cap);

pa.assign(arr_cap, -1);
ta.assign(arr_cap, -1);
fs.assign(arr_cap, -1);
la.assign(arr_cap, -1);

pa.resize(num_cart_points);
ta.resize(num_cart_points);
fs.resize(num_cart_points);
la.resize(num_cart_points);

For example, num_cart_points is initially 64.

I then write to these arrays. This is safe because the capacity is there. I’m not writing out of bounds.

I use the first 64 elements of data to calculate a bunch of info which I write to the back of the array. For example, I’ll write a bunch of stuff after the 64th element of the array.

The initial 64 elements are then reset back to -1 and I sort the array so that all the -1’s are behind anything positive. So it goes { 0, 1, 2, 3, -1, …, -1 }.

I then calculate the new_size to be 87 as that’s the first occurrence of -1 so the new “size” should be 87.
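The "sort so that all the -1’s are behind anything positive" step can be sketched with a custom comparator. This is a host-side illustration with std::vector and std::sort, not the tuple_comp functor from the actual code (which is not shown in this thread). The names SentinelLast and sort_sentinels_last are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical comparator: every valid value orders before the sentinel.
struct SentinelLast {
    int sentinel;
    bool operator()(int a, int b) const {
        if (a == sentinel) return false;  // a sentinel is never "less" than anything
        if (b == sentinel) return true;   // a valid value always precedes a sentinel
        return a < b;                     // two valid values: ordinary ordering
    }
};

// Returns a sorted copy with valid values ascending and sentinels at the back.
std::vector<int> sort_sentinels_last(std::vector<int> v, int sentinel = -1) {
    std::sort(v.begin(), v.end(), SentinelLast{sentinel});
    assert(std::is_sorted(v.begin(), v.end(), SentinelLast{sentinel}));
    return v;
}
```

The sort only covers elements inside [begin(), end()), which is why the vector's size must include everything that was written.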

Before I resize() the vectors, the data is as it should be. The only issue is that the size of each vector is still 64 and not 87.

So when I resize() the vectors, the last 23 elements get reset to 0.

This is the data before the resize (old size = 64, this is visually confirmed) :

4, 2, 1, 1
7, 2, 2, 3
8, 2, 3, 7
9, 2, 2, 9
11, 2, 2, 5
12, 2, 2, 5
13, 2, 2, 5
16, 2, 3, 13
17, 2, 3, 13
18, 2, 3, 7
19, 2, 3, 7
20, 2, 3, 7
21, 2, 3, 7
22, 2, 4, 15
23, 2, 4, 15
24, 2, 4, 15
25, 2, 4, 15
26, 2, 3, 7
27, 2, 3, 7
28, 2, 3, 7
29, 2, 2, 3
30, 2, 2, 3
31, 2, 3, 7
32, 2, 3, 11
35, 2, 3, 11
46, 2, 4, 15
47, 2, 3, 11
48, 2, 2, 9
49, 2, 3, 13
4, 1, 1, 1
5, 1, 2, 3
6, 1, 3, 7
7, 1, 2, 5
9, 1, 2, 9
29, 1, 2, 5
30, 1, 2, 5
32, 1, 3, 13
33, 1, 3, 7
34, 1, 3, 7
35, 1, 3, 13
36, 1, 4, 15
37, 1, 3, 7
38, 1, 3, 7
39, 1, 4, 15
40, 1, 4, 15
41, 1, 3, 7
42, 1, 3, 7
43, 1, 4, 15
44, 1, 4, 15
47, 1, 3, 13
48, 1, 2, 9
51, 1, 3, 11
62, 1, 3, 11
63, 1, 3, 11
64, 1, 3, 7
65, 1, 3, 7
66, 1, 2, 3
67, 1, 2, 3
4, 0, 1, 1
5, 0, 2, 5
9, 0, 2, 9
10, 0, 3, 7
11, 0, 2, 3
12, 0, 2, 3
13, 0, 2, 3
14, 0, 3, 7
15, 0, 3, 7
16, 0, 3, 11
17, 0, 3, 11
48, 0, 2, 9
49, 0, 3, 11
50, 0, 4, 15
51, 0, 3, 13
52, 0, 4, 15
53, 0, 4, 15
54, 0, 4, 15
55, 0, 4, 15
56, 0, 3, 7
57, 0, 3, 7
58, 0, 3, 7
59, 0, 3, 7
60, 0, 3, 7
61, 0, 3, 7
62, 0, 3, 13
63, 0, 3, 13
66, 0, 2, 5
67, 0, 2, 5

And this is the data after the resize (note how the last 23 elements are now 0) :

0 : 4, 2, 1, 1
1 : 7, 2, 2, 3
2 : 8, 2, 3, 7
3 : 9, 2, 2, 9
4 : 11, 2, 2, 5
5 : 12, 2, 2, 5
6 : 13, 2, 2, 5
7 : 16, 2, 3, 13
8 : 17, 2, 3, 13
9 : 18, 2, 3, 7
10 : 19, 2, 3, 7
11 : 20, 2, 3, 7
12 : 21, 2, 3, 7
13 : 22, 2, 4, 15
14 : 23, 2, 4, 15
15 : 24, 2, 4, 15
16 : 25, 2, 4, 15
17 : 26, 2, 3, 7
18 : 27, 2, 3, 7
19 : 28, 2, 3, 7
20 : 29, 2, 2, 3
21 : 30, 2, 2, 3
22 : 31, 2, 3, 7
23 : 32, 2, 3, 11
24 : 35, 2, 3, 11
25 : 46, 2, 4, 15
26 : 47, 2, 3, 11
27 : 48, 2, 2, 9
28 : 49, 2, 3, 13
29 : 4, 1, 1, 1
30 : 5, 1, 2, 3
31 : 6, 1, 3, 7
32 : 7, 1, 2, 5
33 : 9, 1, 2, 9
34 : 29, 1, 2, 5
35 : 30, 1, 2, 5
36 : 32, 1, 3, 13
37 : 33, 1, 3, 7
38 : 34, 1, 3, 7
39 : 35, 1, 3, 13
40 : 36, 1, 4, 15
41 : 37, 1, 3, 7
42 : 38, 1, 3, 7
43 : 39, 1, 4, 15
44 : 40, 1, 4, 15
45 : 41, 1, 3, 7
46 : 42, 1, 3, 7
47 : 43, 1, 4, 15
48 : 44, 1, 4, 15
49 : 47, 1, 3, 13
50 : 48, 1, 2, 9
51 : 51, 1, 3, 11
52 : 62, 1, 3, 11
53 : 63, 1, 3, 11
54 : 64, 1, 3, 7
55 : 65, 1, 3, 7
56 : 66, 1, 2, 3
57 : 67, 1, 2, 3
58 : 4, 0, 1, 1
59 : 5, 0, 2, 5
60 : 9, 0, 2, 9
61 : 10, 0, 3, 7
62 : 11, 0, 2, 3
63 : 12, 0, 2, 3
64 : 0, 0, 0, 0
65 : 0, 0, 0, 0
66 : 0, 0, 0, 0
67 : 0, 0, 0, 0
68 : 0, 0, 0, 0
69 : 0, 0, 0, 0
70 : 0, 0, 0, 0
71 : 0, 0, 0, 0
72 : 0, 0, 0, 0
73 : 0, 0, 0, 0
74 : 0, 0, 0, 0
75 : 0, 0, 0, 0
76 : 0, 0, 0, 0
77 : 0, 0, 0, 0
78 : 0, 0, 0, 0
79 : 0, 0, 0, 0
80 : 0, 0, 0, 0
81 : 0, 0, 0, 0
82 : 0, 0, 0, 0
83 : 0, 0, 0, 0
84 : 0, 0, 0, 0
85 : 0, 0, 0, 0
86 : 0, 0, 0, 0

Sorry if I was vague earlier.

MutantJohn · January 12, 2015, 1:10am

Using your example, this would be modified to be :

#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/fill.h>
#include <iostream>
#include <iterator>

int main(){

  thrust::device_vector<int> x;
  x.reserve(100);
  x.resize(10);

  thrust::fill(x.begin(), x.begin() + 20, 10); // this is safe to do because the capacity is large enough
  thrust::copy(x.begin(), x.begin() + 20, std::ostream_iterator<int>(std::cout, ","));
  std::cout << std::endl;
  x.resize(20);
  thrust::copy(x.begin(), x.begin() + 20, std::ostream_iterator<int>(std::cout, ","));
  std::cout << std::endl;
}

Hopefully this provides a more clear example. If I could get that x.resize(20) call to NOT set elements back to 0, that’s what I’m after.

Output : 
10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10, // correct
10,10,10,10,10,10,10,10,10,10,0,0,0,0,0,0,0,0,0,0,           // last 10 set back to 0 which I don't want

Robert_Crovella · January 12, 2015, 1:14am

It’s still unclear, and you still haven’t provided complete code.

It appears that you are sizing a vector to 64 elements, then writing 87 elements.

You can’t write to a vector beyond its size. That is undefined behavior (UB). It is UB for std::vector as well.

.reserve() only sets aside storage capacity. It does not make the vector that size.

http://www.cplusplus.com/reference/vector/vector/reserve/

“This function has no effect on the vector size and cannot alter its elements.”
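The difference between capacity and size can be checked directly. The sketch below uses std::vector on the host, assuming thrust::device_vector follows the same reserve()/size()/capacity() semantics. The struct SizeAndCapacity and the function reserve_then_inspect are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for the illustration.
struct SizeAndCapacity {
    std::size_t size;
    std::size_t capacity;
};

// Reserve storage for n elements and report what the container holds.
SizeAndCapacity reserve_then_inspect(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    assert(v.empty());  // reserve() created no elements
    return {v.size(), v.capacity()};
}
```

After reserve(100) the size is still 0 and only the capacity has grown, so begin() and end() still describe an empty range.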

MutantJohn · January 12, 2015, 1:17am

Hopefully the modified version of the code you posted shows what I’m trying to accomplish.

Sorry for the code-dump, but here’s my complete function :

void get_fract_locations(thrust::device_vector<int> &pa,
                         thrust::device_vector<int> &ta,
                         thrust::device_vector<int> &fs,
                         thrust::device_vector<int> &la,
                         thrust::device_vector<int> &nominated,
                         int                        *num_tetra,
                         tetrahedron                *mesh,
                         point                      *points,
                         float                      *predConsts)
{
    // sometimes unified memory is a tad annoying...
    // (I'm referring to the need for explicit sync)
    cudaDeviceSynchronize();
    const int nt = *num_tetra;

    // we want to calculate addresses of each fracture in
    // mesh buffer

    // sort all data by tetrahedra first...
    thrust::sort_by_key(ta.begin(),
                        ta.end(),
                        thrust::make_zip_iterator(
                            thrust::make_tuple(pa.begin(),
                                               fs.begin(),
                                               la.begin(),
                                               nominated.begin())));

    // fundamentally, we want all the fracture locations written
    // out into a 1d array.
    // we get writing addresses from the prefix sum of fs * nominated
    // we also get the total number of fracture addresses/locations
    // from the very last element of the sum

    // allocate output array for writing addresses
    thrust::device_vector<int> alpha_sum(ta.size(), -1);

    // we also need to get the offsets to write to in the actual
    // mesh array itself
    // this is the same as (fs[i] - 1) * nominated[i], if nominated[i]
    // is 1

    // allocate output array for mesh offsets
    thrust::device_vector<int> beta_sum(ta.size(), -1);

    // perform modified exclusive scan (alpha sum)
    thrust::exclusive_scan(
        // beginning iterator
        thrust::make_transform_iterator(
            thrust::make_zip_iterator(
                thrust::make_tuple(fs.begin(), nominated.begin())),
            fract_bucket_op<int>()),
        // ending iterator
        thrust::make_transform_iterator(
            thrust::make_zip_iterator(
                thrust::make_tuple(fs.end(), nominated.end())),
            fract_bucket_op<int>()),
        // iterator to write to
        alpha_sum.begin());

    // but the mesh offsets for writing are (fs[i] - 1) * nominated[i]
    // need to perform a modified prefix sum
    // well, assuming fs[i] > 0 else no need to subtract 1
    thrust::exclusive_scan(
        // beginning iterator
        thrust::make_transform_iterator(
            thrust::make_zip_iterator(
                thrust::make_tuple(fs.begin(), nominated.begin())),
            mesh_offset_op<int>()),
        // ending iterator
        thrust::make_transform_iterator(
            thrust::make_zip_iterator(
                thrust::make_tuple(fs.end(), nominated.end())),
            mesh_offset_op<int>()),
        // iterator to write to
        beta_sum.begin());

    cudaDeviceSynchronize();

    // get total number of fracture addresses
    const int num_addresses = alpha_sum.back() + 
                              nominated.back() * fs.back();

    // allocate space for bucket contents
    // fl = fracture locations
    thrust::device_vector<int> fl(num_addresses, -1);

    // allocate array to keep track of original tetrahedron
    // to be associated with each fracture set
    thrust::device_vector<int> parent(fl.size(), -1);

    // launch kernel that writes addresses
    write_fracture_locations<<<bpg, tpb>>>
                            (nominated.size(),
                             thrust::raw_pointer_cast(nominated.data()),
                             thrust::raw_pointer_cast(ta.data()),
                             thrust::raw_pointer_cast(alpha_sum.data()),
                             thrust::raw_pointer_cast(fs.data()),
                             nt,
                             thrust::raw_pointer_cast(beta_sum.data()),
                             thrust::raw_pointer_cast(fl.data()),
                             thrust::raw_pointer_cast(parent.data()));

    cudaDeviceSynchronize();

    //for (int i = 0; i < fl.size(); ++i)
        //std::cout << fl[i] << " : " << parent[i] << std::endl;

    // we now want to hash fracture locations by the id of the 
    // original tetrahedron
    const int fract_num_buckets = parent.back() + 1;
    const int fract_num_keys    = fl.size();

    const int *fract_which_bucket  = thrust::raw_pointer_cast(parent.data());

    // want to also hash points by tetrahedron id
    const int tetra_num_buckets = ta.back() + 1;
    const int tetra_num_keys    = pa.size();
    
    const int *tetra_which_bucket = thrust::raw_pointer_cast(ta.data());

    thrust::device_vector<int> fract_bucket_starts(fract_num_buckets, -1);
    thrust::device_vector<int> tetra_bucket_starts(tetra_num_buckets, -1);

    int *f_bucket_starts = thrust::raw_pointer_cast(fract_bucket_starts.data());
    int *t_bucket_starts = thrust::raw_pointer_cast(tetra_bucket_starts.data());

    find_boundaries<<<bpg, tpb>>>
                   (fract_num_keys,
                    fract_num_buckets,
                    fract_which_bucket,
                    f_bucket_starts);

    find_boundaries<<<bpg, tpb>>>
                   (tetra_num_keys,
                    tetra_num_buckets,
                    tetra_which_bucket,
                    t_bucket_starts);

    cudaDeviceSynchronize();
/*    
    for (int i = 0; i < fract_bucket_starts.size(); ++i)
        std::cout << fract_bucket_starts[i] << std::endl;

    for (int i = 0; i < tetra_bucket_starts.size(); ++i)
        std::cout << tetra_bucket_starts[i] << std::endl;
*/
    const int arr_cap = ta.capacity();

    thrust::device_vector<int>::iterator iter;
    int new_size = -1;
    
    fracture_tetrahedra<<<bpg, tpb>>>
                       (nominated.size(),
                        thrust::raw_pointer_cast(nominated.data()),
                        thrust::raw_pointer_cast(ta.data()),
                        thrust::raw_pointer_cast(pa.data()),
                        thrust::raw_pointer_cast(la.data()),
                        f_bucket_starts,
                        thrust::raw_pointer_cast(fl.data()),
                        mesh);

    redistribute_points<<<bpg, tpb>>>
                       (pa.size(),
                        fract_num_buckets,
                        t_bucket_starts,
                        f_bucket_starts, 
                        thrust::raw_pointer_cast(fl.data()),
                        thrust::raw_pointer_cast(nominated.data()),
                        mesh,
                        points,
                        predConsts,
                        thrust::raw_pointer_cast(pa.data()),
                        thrust::raw_pointer_cast(ta.data()),
                        thrust::raw_pointer_cast(fs.data()),
                        thrust::raw_pointer_cast(la.data()));

    redistribution_cleanup<<<bpg, tpb>>>
                          (fract_num_buckets,
                           f_bucket_starts,
                           t_bucket_starts,
                           thrust::raw_pointer_cast(pa.data()),
                           thrust::raw_pointer_cast(ta.data()),
                           thrust::raw_pointer_cast(fs.data()),
                           thrust::raw_pointer_cast(la.data()));

    thrust::sort(thrust::make_zip_iterator(
                    thrust::make_tuple(ta.begin(),
                                       pa.begin(),
                                       fs.begin(),
                                       la.begin())),
                 thrust::make_zip_iterator(
                    thrust::make_tuple(ta.begin() + arr_cap,
                                       pa.begin() + arr_cap,
                                       fs.begin() + arr_cap,
                                       la.begin() + arr_cap)),
                 tuple_comp<int>());

    iter     = thrust::find(ta.begin(), ta.begin() + arr_cap, -1);    
    new_size = thrust::distance(ta.begin(), iter); 

    cudaDeviceSynchronize();    
    std::cout << "old size is : " << ta.size() << std::endl;
    std::cout << "new size is : " << new_size << std::endl;

    ta.resize(new_size);
    pa.resize(new_size);
    fs.resize(new_size);
    la.resize(new_size);
    nominated.resize(new_size);

    for (int i = 0; i < pa.size(); ++i)
    {
        std::cout << i << " : " << pa[i] << ", " << ta[i] << ", " << fs[i] << ", " << la[i] << std::endl;
    }
}

Robert_Crovella · January 12, 2015, 2:30am

This is not legal, in spite of your comment:

x.resize(10);

  thrust::fill(x.begin(), x.begin() + 20, 10);

You cannot write to a vector beyond its size.

This is also illegal:

x.resize(10);
  ...
  thrust::copy(x.begin(), x.begin() + 20, std::ostream_iterator<int>(std::cout, ","));

You’re not allowed to access a vector beyond its size.

Furthermore, newly created vector container elements are value-initialized (zero for int), whether at initial allocation or by resize() to a larger size.
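A short host-side sketch of that growth behavior, using std::vector as a stand-in for thrust::device_vector. It also shows the two-argument resize(n, value) overload of std::vector, which fills the new elements with a chosen value instead of zero. The function names grow_default and grow_with are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Grow with the one-argument resize(): new elements are value-initialized.
std::vector<int> grow_default(std::vector<int> v, std::size_t n) {
    assert(n >= v.size());  // this sketch only demonstrates growth
    v.resize(n);
    return v;
}

// Grow with the two-argument resize(): new elements are copies of `fill`.
std::vector<int> grow_with(std::vector<int> v, std::size_t n, int fill) {
    assert(n >= v.size());
    v.resize(n, fill);
    return v;
}
```

In both cases the elements that were already inside the old size are kept. Only the newly created tail is initialized.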

Anyway, why should we argue about it?

Why not do this:

x.reserve(100);
  x.resize(100);

?

You can later resize it to 87, if you want to. In that case, elements between 64 and 87 will not be zero’ed.
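The suggestion above can be sketched as follows: make every slot a real element first, write into it, then shrink. This is a host-side illustration with std::vector, assuming thrust::device_vector preserves the leading elements on a shrinking resize() in the same way. The function fill_then_shrink and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Size the vector to its full working capacity up front, write, then shrink.
std::vector<int> fill_then_shrink(std::size_t capacity, std::size_t used) {
    assert(used <= capacity);
    std::vector<int> v(capacity, -1);  // every slot is a real element, set to -1
    for (std::size_t i = 0; i < used; ++i) {
        v[i] = static_cast<int>(i);    // in-bounds writes, all below size()
    }
    v.resize(used);                    // shrinking keeps the first `used` elements
    return v;
}
```

With capacity 100 and used 87, all 87 written values survive the final resize(), because a resize() to a smaller size only drops elements from the back.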

As another illustration of how your process is broken, consider this:

I reserve 100 spaces for my vector.
I resize to 10.
Then I write 20 elements. (this is illegal)
Then I do a .push_back()

By definition, element 10 that I wrote previously out-of-bounds is now overwritten by the push_back() operation.

Vectors are not intended to be written beyond their size. None of the methods (.end(), .size(), etc.) expect you to do this, and many will not have sensible behavior if you do. It’s that simple, and you should revise your thinking and your algorithm.
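The push_back() point can be checked without doing anything illegal: the new element always lands at index size(), whatever the capacity is. The sketch uses std::vector on the host, and the helper name push_back_index is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Append a value and return the index it was stored at.
std::size_t push_back_index(std::vector<int>& v, int value) {
    const std::size_t index = v.size();  // push_back() writes at the old size()
    v.push_back(value);
    assert(v[index] == value);
    return index;
}
```

For a vector that was reserved to 100 and resized to 10, the returned index is 10. Anything stored in that slot behind the container's back would be replaced.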

MutantJohn · January 12, 2015, 5:15pm

You’re right. Thank you for your wisdom. I was using vectors for their convenience instead of creating my own structure of arrays.

I realize now that my code would be a lot cleaner if I encapsulated everything properly. Initially, I was just being lazy about my own class design but it’s become clear that it’s time to create a new class and use that for my needs.
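One possible shape for such a class is a small structure-of-arrays wrapper that keeps the four arrays the same length. This is only a sketch under assumptions: it uses std::vector on the host rather than device storage, the struct name Associations is hypothetical, and the member names simply mirror pa, ta, fs and la from the function posted earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays wrapper; members mirror pa, ta, fs, la.
struct Associations {
    std::vector<int> pa, ta, fs, la;

    // Create n elements in every array, all set to the sentinel value.
    explicit Associations(std::size_t n, int fill = -1)
        : pa(n, fill), ta(n, fill), fs(n, fill), la(n, fill) {}

    std::size_t size() const {
        assert(pa.size() == ta.size() && fs.size() == ta.size() && la.size() == ta.size());
        return ta.size();
    }

    // Resize all four arrays together; new elements get the sentinel value.
    void resize(std::size_t n, int fill = -1) {
        pa.resize(n, fill);
        ta.resize(n, fill);
        fs.resize(n, fill);
        la.resize(n, fill);
    }
};
```

Constructing it at full working size, writing, and then calling resize(new_size) keeps the written values, since every write stays inside size().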

Thank you for the advice!

Edit : You brought up a lot of really good points about why thrust::device_vector was not the container I should be using.
