thrust::device_vector<T>::resize() without overwriting elements?

So, I like the convenience of the device vectors. I like to use the size() function but I’m running into an issue.

I have a normally sized vector, but its capacity is much larger than its size. I write to the vector past its current size, so size() is no longer accurate. The problem is that calling resize() afterwards zeroes the elements I just wrote.

For example, if I have

thrust::device_vector<int> x;
x.reserve(100);
x.resize(10);

// use first 10 elements
// overwrite the vector with 20 elements

x.resize(20); // zeroes out the values that I don't want it to

Am I just using the container wrong? I really don’t want to have to manually write my own class to get the behavior that I want.

I’m also not sure that I can resize() the vector before I do all the writing. I’ll look into that really quickly.

Edit : The way my code works is,

// write to vector first
// sort vector
iter     = thrust::find(begin_iter, end_iter, -1);
new_size = thrust::distance(begin_iter, iter);

// resize vectors

where -1 is the initialized value (i.e. the default one).

Just looking at this, the dependency seems to imply that I can’t resize before I write because I need the written data to get a new size.

I don’t have any trouble with it. It always helps if you provide complete code.

Here’s my example:

$ cat t661.cu
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/fill.h>
#include <iostream>
#include <iterator>

int main(){

  thrust::device_vector<int> x;
  x.reserve(100);
  x.resize(10);

  thrust::fill(x.begin(), x.end(), 10);
  thrust::copy(x.begin(), x.end(), std::ostream_iterator<int>(std::cout, ","));
  std::cout << std::endl;
  x.resize(20);
  thrust::copy(x.begin(), x.end(), std::ostream_iterator<int>(std::cout, ","));
  std::cout << std::endl;
}

$ nvcc -arch=sm_20 -o t661 t661.cu
$ ./t661
10,10,10,10,10,10,10,10,10,10,
10,10,10,10,10,10,10,10,10,10,0,0,0,0,0,0,0,0,0,0,
$

I think I explained this incorrectly. Sorry, I’m kind of tired.

In my actual code, I’m doing this:

Note : arr_cap > num_cart_points

    pa.reserve(arr_cap);
    ta.reserve(arr_cap);
    fs.reserve(arr_cap);
    la.reserve(arr_cap);

    pa.assign(arr_cap, -1);
    ta.assign(arr_cap, -1);
    fs.assign(arr_cap, -1);
    la.assign(arr_cap, -1);

    pa.resize(num_cart_points);
    ta.resize(num_cart_points); 
    fs.resize(num_cart_points);
    la.resize(num_cart_points);

For example, num_cart_points is initially 64.

I then write to these arrays. This is safe because the capacity is there. I’m not writing out of bounds.

I use the first 64 elements of data to calculate a bunch of info which I write to the back of the array. For example, I’ll write a bunch of stuff after the 64th element of the array.

The initial 64 elements are then reset back to -1 and I sort the array so that all the -1’s are behind anything positive. So it goes { 0, 1, 2, 3, -1, …, -1 }.

I then calculate the new_size to be 87 as that’s the first occurrence of -1 so the new “size” should be 87.

Before I resize() the vectors, the data is as it should be. The only issue is that the size of each vector is still 64 and not 87.

So when I resize() the vectors, the last 23 elements get reset to 0.

This is the data before the resize (old size = 64, this is visually confirmed) :

4, 2, 1, 1
7, 2, 2, 3
8, 2, 3, 7
9, 2, 2, 9
11, 2, 2, 5
12, 2, 2, 5
13, 2, 2, 5
16, 2, 3, 13
17, 2, 3, 13
18, 2, 3, 7
19, 2, 3, 7
20, 2, 3, 7
21, 2, 3, 7
22, 2, 4, 15
23, 2, 4, 15
24, 2, 4, 15
25, 2, 4, 15
26, 2, 3, 7
27, 2, 3, 7
28, 2, 3, 7
29, 2, 2, 3
30, 2, 2, 3
31, 2, 3, 7
32, 2, 3, 11
35, 2, 3, 11
46, 2, 4, 15
47, 2, 3, 11
48, 2, 2, 9
49, 2, 3, 13
4, 1, 1, 1
5, 1, 2, 3
6, 1, 3, 7
7, 1, 2, 5
9, 1, 2, 9
29, 1, 2, 5
30, 1, 2, 5
32, 1, 3, 13
33, 1, 3, 7
34, 1, 3, 7
35, 1, 3, 13
36, 1, 4, 15
37, 1, 3, 7
38, 1, 3, 7
39, 1, 4, 15
40, 1, 4, 15
41, 1, 3, 7
42, 1, 3, 7
43, 1, 4, 15
44, 1, 4, 15
47, 1, 3, 13
48, 1, 2, 9
51, 1, 3, 11
62, 1, 3, 11
63, 1, 3, 11
64, 1, 3, 7
65, 1, 3, 7
66, 1, 2, 3
67, 1, 2, 3
4, 0, 1, 1
5, 0, 2, 5
9, 0, 2, 9
10, 0, 3, 7
11, 0, 2, 3
12, 0, 2, 3
13, 0, 2, 3
14, 0, 3, 7
15, 0, 3, 7
16, 0, 3, 11
17, 0, 3, 11
48, 0, 2, 9
49, 0, 3, 11
50, 0, 4, 15
51, 0, 3, 13
52, 0, 4, 15
53, 0, 4, 15
54, 0, 4, 15
55, 0, 4, 15
56, 0, 3, 7
57, 0, 3, 7
58, 0, 3, 7
59, 0, 3, 7
60, 0, 3, 7
61, 0, 3, 7
62, 0, 3, 13
63, 0, 3, 13
66, 0, 2, 5
67, 0, 2, 5

And this is the data after the resize (note how the last 23 elements are now 0) :

0 : 4, 2, 1, 1
1 : 7, 2, 2, 3
2 : 8, 2, 3, 7
3 : 9, 2, 2, 9
4 : 11, 2, 2, 5
5 : 12, 2, 2, 5
6 : 13, 2, 2, 5
7 : 16, 2, 3, 13
8 : 17, 2, 3, 13
9 : 18, 2, 3, 7
10 : 19, 2, 3, 7
11 : 20, 2, 3, 7
12 : 21, 2, 3, 7
13 : 22, 2, 4, 15
14 : 23, 2, 4, 15
15 : 24, 2, 4, 15
16 : 25, 2, 4, 15
17 : 26, 2, 3, 7
18 : 27, 2, 3, 7
19 : 28, 2, 3, 7
20 : 29, 2, 2, 3
21 : 30, 2, 2, 3
22 : 31, 2, 3, 7
23 : 32, 2, 3, 11
24 : 35, 2, 3, 11
25 : 46, 2, 4, 15
26 : 47, 2, 3, 11
27 : 48, 2, 2, 9
28 : 49, 2, 3, 13
29 : 4, 1, 1, 1
30 : 5, 1, 2, 3
31 : 6, 1, 3, 7
32 : 7, 1, 2, 5
33 : 9, 1, 2, 9
34 : 29, 1, 2, 5
35 : 30, 1, 2, 5
36 : 32, 1, 3, 13
37 : 33, 1, 3, 7
38 : 34, 1, 3, 7
39 : 35, 1, 3, 13
40 : 36, 1, 4, 15
41 : 37, 1, 3, 7
42 : 38, 1, 3, 7
43 : 39, 1, 4, 15
44 : 40, 1, 4, 15
45 : 41, 1, 3, 7
46 : 42, 1, 3, 7
47 : 43, 1, 4, 15
48 : 44, 1, 4, 15
49 : 47, 1, 3, 13
50 : 48, 1, 2, 9
51 : 51, 1, 3, 11
52 : 62, 1, 3, 11
53 : 63, 1, 3, 11
54 : 64, 1, 3, 7
55 : 65, 1, 3, 7
56 : 66, 1, 2, 3
57 : 67, 1, 2, 3
58 : 4, 0, 1, 1
59 : 5, 0, 2, 5
60 : 9, 0, 2, 9
61 : 10, 0, 3, 7
62 : 11, 0, 2, 3
63 : 12, 0, 2, 3
64 : 0, 0, 0, 0
65 : 0, 0, 0, 0
66 : 0, 0, 0, 0
67 : 0, 0, 0, 0
68 : 0, 0, 0, 0
69 : 0, 0, 0, 0
70 : 0, 0, 0, 0
71 : 0, 0, 0, 0
72 : 0, 0, 0, 0
73 : 0, 0, 0, 0
74 : 0, 0, 0, 0
75 : 0, 0, 0, 0
76 : 0, 0, 0, 0
77 : 0, 0, 0, 0
78 : 0, 0, 0, 0
79 : 0, 0, 0, 0
80 : 0, 0, 0, 0
81 : 0, 0, 0, 0
82 : 0, 0, 0, 0
83 : 0, 0, 0, 0
84 : 0, 0, 0, 0
85 : 0, 0, 0, 0
86 : 0, 0, 0, 0

Sorry if I was vague earlier.

Using your example, this would be modified to be :

#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/fill.h>
#include <iostream>
#include <iterator>

int main(){

  thrust::device_vector<int> x;
  x.reserve(100);
  x.resize(10);

  thrust::fill(x.begin(), x.begin() + 20, 10); // this is safe to do because the capacity is large enough
  thrust::copy(x.begin(), x.begin() + 20, std::ostream_iterator<int>(std::cout, ","));
  std::cout << std::endl;
  x.resize(20);
  thrust::copy(x.begin(), x.begin() + 20, std::ostream_iterator<int>(std::cout, ","));
  std::cout << std::endl;
}

Hopefully this provides a clearer example. If I could get that x.resize(20) call to NOT set elements back to 0, that’s what I’m after.

Output : 
10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10, // correct
10,10,10,10,10,10,10,10,10,10,0,0,0,0,0,0,0,0,0,0,           // last 10 set back to 0 which I don't want

It’s still unclear, and you still haven’t provided complete code.

It appears that you are sizing a vector to 64 elements, then writing 87 elements.

You can’t write to a vector beyond its size. That is UB. It is UB for std::vector as well.

.reserve() only allocates storage, so it changes the capacity. It does not change the size of the vector or create any elements.

http://www.cplusplus.com/reference/vector/vector/reserve/

“This function has no effect on the vector size and cannot alter its elements.”
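To make the size/capacity distinction concrete, here is a minimal host-side sketch using std::vector, which follows the same rules on this point. The helper name reserve_then_resize and the SizeReport struct are made up for illustration; they are not part of Thrust or of the code in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical struct for illustration: records what size() and capacity()
// report at each step.
struct SizeReport {
    std::size_t size_after_reserve;
    std::size_t capacity_after_reserve;
    std::size_t size_after_resize;
};

// Hypothetical helper: reserve 100 slots, then resize to 10 elements.
SizeReport reserve_then_resize() {
    std::vector<int> x;
    SizeReport r{};

    x.reserve(100);                            // allocates storage only
    r.size_after_reserve = x.size();           // still 0: no elements exist yet
    r.capacity_after_reserve = x.capacity();   // at least 100
    assert(r.capacity_after_reserve >= 100);

    x.resize(10);                              // creates 10 value-initialized ints
    r.size_after_resize = x.size();            // 10: valid range is [begin(), begin() + 10)
    return r;
}
```

Only the first 10 slots hold elements after the resize(10) call. The remaining reserved storage exists, but the container does not treat it as part of the vector.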

Hopefully the modified version of the code you posted shows what I’m trying to accomplish.

Sorry for the code-dump, but here’s my complete function :

void get_fract_locations(thrust::device_vector<int> &pa,
                         thrust::device_vector<int> &ta,
                         thrust::device_vector<int> &fs,
                         thrust::device_vector<int> &la,
                         thrust::device_vector<int> &nominated,
                         int                        *num_tetra,
                         tetrahedron                *mesh,
                         point                      *points,
                         float                      *predConsts)
{
    // sometimes unified memory is a tad annoying...
    // (I'm referring to the need for explicit sync)
    cudaDeviceSynchronize();
    const int nt = *num_tetra;

    // we want to calculate addresses of each fracture in
    // mesh buffer

    // sort all data by tetrahedra first...
    thrust::sort_by_key(ta.begin(),
                        ta.end(),
                        thrust::make_zip_iterator(
                            thrust::make_tuple(pa.begin(),
                                               fs.begin(),
                                               la.begin(),
                                               nominated.begin())));

    // fundamentally, we want all the fracture locations written
    // out into a 1d array.
    // we get writing addresses from the prefix sum of fs * nominated
    // we also get the total number of fracture addresses/locations
    // from the very last element of the sum

    // allocate output array for writing addresses
    thrust::device_vector<int> alpha_sum(ta.size(), -1);

    // we also need to get the offsets to write to in the actual
    // mesh array itself
    // this is the same as (fs[i] - 1) * nominated[i], if nominated[i]
    // is 1

    // allocate output array for mesh offsets
    thrust::device_vector<int> beta_sum(ta.size(), -1);

    // perform modified exclusive scan (alpha sum)
    thrust::exclusive_scan(
        // beginning iterator
        thrust::make_transform_iterator(
            thrust::make_zip_iterator(
                thrust::make_tuple(fs.begin(), nominated.begin())),
            fract_bucket_op<int>()),
        // ending iterator
        thrust::make_transform_iterator(
            thrust::make_zip_iterator(
                thrust::make_tuple(fs.end(), nominated.end())),
            fract_bucket_op<int>()),
        // iterator to write to
        alpha_sum.begin());

    // but the mesh offsets for writing are (fs[i] - 1) * nominated[i]
    // need to perform a modified prefix sum
    // well, assuming fs[i] > 0 else no need to subtract 1
    thrust::exclusive_scan(
        // beginning iterator
        thrust::make_transform_iterator(
            thrust::make_zip_iterator(
                thrust::make_tuple(fs.begin(), nominated.begin())),
            mesh_offset_op<int>()),
        // ending iterator
        thrust::make_transform_iterator(
            thrust::make_zip_iterator(
                thrust::make_tuple(fs.end(), nominated.end())),
            mesh_offset_op<int>()),
        // iterator to write to
        beta_sum.begin());

    cudaDeviceSynchronize();

    // get total number of fracture addresses
    const int num_addresses = alpha_sum.back() + 
                              nominated.back() * fs.back();

    // allocate space for bucket contents
    // fl = fracture locations
    thrust::device_vector<int> fl(num_addresses, -1);

    // allocate array to keep track of original tetrahedron
    // to be associated with each fracture set
    thrust::device_vector<int> parent(fl.size(), -1);

    // launch kernel that writes addresses
    write_fracture_locations<<<bpg, tpb>>>
                            (nominated.size(),
                             thrust::raw_pointer_cast(nominated.data()),
                             thrust::raw_pointer_cast(ta.data()),
                             thrust::raw_pointer_cast(alpha_sum.data()),
                             thrust::raw_pointer_cast(fs.data()),
                             nt,
                             thrust::raw_pointer_cast(beta_sum.data()),
                             thrust::raw_pointer_cast(fl.data()),
                             thrust::raw_pointer_cast(parent.data()));

    cudaDeviceSynchronize();

    //for (int i = 0; i < fl.size(); ++i)
        //std::cout << fl[i] << " : " << parent[i] << std::endl;

    // we now want to hash fracture locations by the id of the 
    // original tetrahedron
    const int fract_num_buckets = parent.back() + 1;
    const int fract_num_keys    = fl.size();

    const int *fract_which_bucket  = thrust::raw_pointer_cast(parent.data());

    // want to also hash points by tetrahedron id
    const int tetra_num_buckets = ta.back() + 1;
    const int tetra_num_keys    = pa.size();
    
    const int *tetra_which_bucket = thrust::raw_pointer_cast(ta.data());

    thrust::device_vector<int> fract_bucket_starts(fract_num_buckets, -1);
    thrust::device_vector<int> tetra_bucket_starts(tetra_num_buckets, -1);

    int *f_bucket_starts = thrust::raw_pointer_cast(fract_bucket_starts.data());
    int *t_bucket_starts = thrust::raw_pointer_cast(tetra_bucket_starts.data());

    find_boundaries<<<bpg, tpb>>>
                   (fract_num_keys,
                    fract_num_buckets,
                    fract_which_bucket,
                    f_bucket_starts);

    find_boundaries<<<bpg, tpb>>>
                   (tetra_num_keys,
                    tetra_num_buckets,
                    tetra_which_bucket,
                    t_bucket_starts);

    cudaDeviceSynchronize();
/*    
    for (int i = 0; i < fract_bucket_starts.size(); ++i)
        std::cout << fract_bucket_starts[i] << std::endl;

    for (int i = 0; i < tetra_bucket_starts.size(); ++i)
        std::cout << tetra_bucket_starts[i] << std::endl;
*/
    const int arr_cap = ta.capacity();

    thrust::device_vector<int>::iterator iter;
    int new_size = -1;
    
    fracture_tetrahedra<<<bpg, tpb>>>
                       (nominated.size(),
                        thrust::raw_pointer_cast(nominated.data()),
                        thrust::raw_pointer_cast(ta.data()),
                        thrust::raw_pointer_cast(pa.data()),
                        thrust::raw_pointer_cast(la.data()),
                        f_bucket_starts,
                        thrust::raw_pointer_cast(fl.data()),
                        mesh);

    redistribute_points<<<bpg, tpb>>>
                       (pa.size(),
                        fract_num_buckets,
                        t_bucket_starts,
                        f_bucket_starts, 
                        thrust::raw_pointer_cast(fl.data()),
                        thrust::raw_pointer_cast(nominated.data()),
                        mesh,
                        points,
                        predConsts,
                        thrust::raw_pointer_cast(pa.data()),
                        thrust::raw_pointer_cast(ta.data()),
                        thrust::raw_pointer_cast(fs.data()),
                        thrust::raw_pointer_cast(la.data()));

    redistribution_cleanup<<<bpg, tpb>>>
                          (fract_num_buckets,
                           f_bucket_starts,
                           t_bucket_starts,
                           thrust::raw_pointer_cast(pa.data()),
                           thrust::raw_pointer_cast(ta.data()),
                           thrust::raw_pointer_cast(fs.data()),
                           thrust::raw_pointer_cast(la.data()));

    thrust::sort(thrust::make_zip_iterator(
                    thrust::make_tuple(ta.begin(),
                                       pa.begin(),
                                       fs.begin(),
                                       la.begin())),
                 thrust::make_zip_iterator(
                    thrust::make_tuple(ta.begin() + arr_cap,
                                       pa.begin() + arr_cap,
                                       fs.begin() + arr_cap,
                                       la.begin() + arr_cap)),
                 tuple_comp<int>());

    iter     = thrust::find(ta.begin(), ta.begin() + arr_cap, -1);    
    new_size = thrust::distance(ta.begin(), iter); 

    cudaDeviceSynchronize();    
    std::cout << "old size is : " << ta.size() << std::endl;
    std::cout << "new size is : " << new_size << std::endl;

    ta.resize(new_size);
    pa.resize(new_size);
    fs.resize(new_size);
    la.resize(new_size);
    nominated.resize(new_size);

    for (int i = 0; i < pa.size(); ++i)
    {
        std::cout << i << " : " << pa[i] << ", " << ta[i] << ", " << fs[i] << ", " << la[i] << std::endl;
    }
}

This is not legal, in spite of your comment:

x.resize(10);

  thrust::fill(x.begin(), x.begin() + 20, 10);

You cannot write to a vector beyond its size.

This is also illegal:

x.resize(10);
  ...
  thrust::copy(x.begin(), x.begin() + 20, std::ostream_iterator<int>(std::cout, ","));

You’re not allowed to access a vector beyond its size.

Furthermore, newly created vector elements are value-initialized, which for int means they are set to zero, whether at initial allocation or by resize() to a larger size.
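If the goal is only to have new elements start as the -1 sentinel instead of 0, resize() can take a fill value. The sketch below shows this with std::vector on the host. As far as I know thrust::device_vector accepts the same second argument, but treat that as an assumption and check your Thrust version. The function names are made up for illustration. Note that this does not preserve anything written past the old size; it only changes what the new elements are set to.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: grow from 10 to 20 elements with the default resize().
std::vector<int> grow_default() {
    std::vector<int> x(10, 10);   // ten elements, all equal to 10
    x.resize(20);                 // new elements [10, 20) are value-initialized: 0 for int
    assert(x.size() == 20);
    return x;
}

// Hypothetical helper: grow from 10 to 20 elements with an explicit fill value.
std::vector<int> grow_with_fill(int fill_value) {
    std::vector<int> x(10, 10);   // ten elements, all equal to 10
    x.resize(20, fill_value);     // new elements [10, 20) are copies of fill_value
    assert(x.size() == 20);
    return x;
}
```

In both cases the first 10 elements keep their values, and only the newly created range is written by resize().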

Anyway, why should we argue about it?

Why not do this:

x.reserve(100);
  x.resize(100);

?

You can later resize it to 87, if you want to. In that case, elements 64 through 86 will not be zeroed, because shrinking a vector leaves the elements that remain untouched.
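Here is a rough host-side sketch of that "size it to full capacity first, shrink afterwards" flow, using std::vector and the standard algorithms in place of the Thrust calls and kernels. The function compact_after_writes and its parameters are hypothetical, and it assumes -1 is the unused-slot sentinel and never appears in the real data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: allocate every slot up front, write some values,
// move the unused (-1) slots to the back, then shrink to the used prefix.
std::vector<int> compact_after_writes(std::size_t capacity,
                                      const std::vector<int>& values) {
    assert(values.size() <= capacity);

    // Every slot exists from the start, so all later writes are in bounds.
    std::vector<int> x(capacity, -1);

    // Stand-in for the kernels: write the values into the back of the array.
    std::copy(values.begin(), values.end(),
              x.end() - static_cast<std::ptrdiff_t>(values.size()));

    // Sort so that used slots come first (ascending) and -1 slots come last.
    std::sort(x.begin(), x.end(), [](int a, int b) {
        if ((a == -1) != (b == -1)) return b == -1;
        return a < b;
    });

    // The first -1 marks the new logical size.
    auto first_unused = std::find(x.begin(), x.end(), -1);
    const auto new_size =
        static_cast<std::size_t>(std::distance(x.begin(), first_unused));

    // Shrinking keeps the first new_size elements exactly as they are.
    x.resize(new_size);
    return x;
}
```

For example, compact_after_writes(8, {5, 3, 7}) starts with eight -1 slots, writes three values at the back, and ends up as a three-element vector. Nothing is ever written outside [begin(), end()), so resize() never creates new elements on top of your data.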

As another way of how your process is broken, consider this:

I reserve 100 spaces for my vector.
I resize to 10.
Then I write 20 elements. (this is illegal)
Then I do a .push_back()

By definition, element 10 that I wrote previously out-of-bounds is now overwritten by the push_back() operation.
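The legal part of that sequence can be checked on the host with std::vector. This small sketch (push_back_index is a made-up name) skips the out-of-bounds write and only shows where push_back() puts its element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the index at which push_back() stored its argument.
std::size_t push_back_index() {
    std::vector<int> x;
    x.reserve(100);
    x.resize(10);                          // valid elements: indices 0..9

    const std::size_t before = x.size();   // 10
    x.push_back(42);                       // constructs a new element at index 10
    assert(x.size() == before + 1);
    assert(x[before] == 42);
    return before;
}
```

The container considers slot 10 empty until push_back() or resize() creates an element there, so anything placed in that storage by other means is simply overwritten.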

Vectors are not intended to be written beyond their size. None of the methods (.end(), .size(), etc.) expect you to do this, and many will not have sensible behavior if you do. It’s that simple, and you should revise your thinking and your algorithm.

You’re right. Thank you for your wisdom. I was using vectors for their convenience instead of creating my own structure of arrays.

I realize now that my code would be a lot cleaner if I encapsulated everything properly. Initially, I was just being lazy about my own class design but it’s become clear that it’s time to create a new class and use that for my needs.

Thank you for the advice!

Edit : You brought up a lot of really good points about why thrust::device_vector was not the container I should be using.