Up until now, I have been using arrays of both host_vector and device_vector whose sizes are defined at compile time, e.g.:
thrust::host_vector<int> h_vec[100];
thrust::device_vector<int> d_vec[100];
/* ... populate each h_vec[i] ... */
for (int i = 0; i < 100; i++)
  d_vec[i] = h_vec[i];
/* ... use d_vec in kernel ... */
and this works fine. But what I want to do now is allocate the arrays with a size N determined at runtime. The host_vector array allocates and indexes as expected, but I am getting null pointers in d_vec regardless of the approach. I am likely misunderstanding how thrust::device_vector works. One method I have not yet tried is an array of device_vector pointers (thrust::device_vector<int> **d_vec), but then I would have to change the kernel parameters as well, and I would prefer to avoid that if possible. Is there a preferred method for doing this, or should everything just be flattened into a single 1D array (see the sketch below)?
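To clarify what I mean by flattening, here is a rough, untested sketch of the 1D alternative I am considering; the names (flatten_sketch, h_flat, h_offsets) are just placeholders:

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>

// Sketch only: concatenate N variable-length vectors into one contiguous
// device allocation plus an offsets table; a kernel would then index
// flat[offsets[i] + j] instead of data[i][j].
void flatten_sketch(thrust::host_vector<int> *h_vec, int N)
{
  thrust::host_vector<int> h_offsets(N + 1, 0);
  for (int i = 0; i < N; i++)
    h_offsets[i + 1] = h_offsets[i] + (int)h_vec[i].size();
  thrust::host_vector<int> h_flat(h_offsets[N]);
  for (int i = 0; i < N; i++)
    thrust::copy(h_vec[i].begin(), h_vec[i].end(),
                 h_flat.begin() + h_offsets[i]);
  thrust::device_vector<int> d_flat = h_flat;       // one device allocation
  thrust::device_vector<int> d_offsets = h_offsets; // per-vector start indices
  // thrust::raw_pointer_cast(d_flat.data()) and
  // thrust::raw_pointer_cast(d_offsets.data()) would be what actually
  // gets passed to the kernel.
}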
CUDA kernels don’t know anything about thrust::device_vector, so I don’t think it’s possible to pass a device_vector to a kernel and do anything meaningful with it. Perhaps you should show a more complete code that demonstrates how you “use d_vec in kernel”.
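For a single vector, the usual pattern is to extract a raw pointer with thrust::raw_pointer_cast() and pass that to the kernel. A minimal sketch (the kernel and variable names here are just illustrative):

#include <thrust/device_vector.h>
#include <cstdio>

// The kernel only sees a raw int*, obtained on the host with
// thrust::raw_pointer_cast(); it has no knowledge of device_vector.
__global__ void scale(int *data, int n, int factor){
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

int main(){
  thrust::device_vector<int> d_vec(256, 1);            // 256 ones on the device
  int *d_raw = thrust::raw_pointer_cast(d_vec.data()); // raw pointer for the kernel
  scale<<<1, 256>>>(d_raw, (int)d_vec.size(), 3);
  cudaDeviceSynchronize();
  printf("%d\n", (int)d_vec[0]);                       // expect 3
  return 0;
}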
My updated code creates an array of pointers to each of the device_vectors using thrust::raw_pointer_cast(), and then uses a second pointer to cudaMemcpy that newly created array over to the GPU. Thanks go to this post for the code. Ignore the methodology for populating h_vec_len; for now it’s just used to illustrate how the kernel can properly iterate over each d_vec vector.
I attempted to build a code around the bits and pieces you have shown. It seems to work correctly for me.
$ cat t685.cu
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <stdlib.h>
#include <stdio.h>
#include <iostream>
#define NV 4
#define ND 4
__global__ void printkernel(int **data, int *lens, int sz){
  // print the last element of each of the sz vectors
  for (int i = 0; i < sz; i++)
    printf("%d\n", data[i][lens[i]-1]);
}
int main(int argc, char *argv[]){
  int N = NV;
  if (argc > 1) N = atoi(argv[1]);
  thrust::host_vector<int> h_vec[N];
  thrust::device_vector<int> d_vec[N];
  int h_vec_len[N];
  // build N host vectors of varying length, recording each length
  for (int i = 0; i < N; i++){
    h_vec_len[i] = 0;
    for (int j = 0; j < ND+i; j++){
      h_vec[i].push_back(i*100+j);
      h_vec_len[i]++;}}
  thrust::device_vector<int> d_vec_len(h_vec_len, h_vec_len+N);
  // copy each host vector to the device and collect raw device pointers
  int *d_vecp[N];
  for (int i = 0; i < N; i++){
    d_vec[i].resize(h_vec[i].size());
    thrust::copy(h_vec[i].begin(), h_vec[i].end(), d_vec[i].begin());
    d_vecp[i] = thrust::raw_pointer_cast(d_vec[i].data());}
  // copy the pointer array itself to the device so the kernel can index it
  int **d_vecpp;
  cudaMalloc(&d_vecpp, N*sizeof(int*));
  cudaMemcpy(d_vecpp, d_vecp, N*sizeof(int*), cudaMemcpyHostToDevice);
  printkernel<<<1,1>>>(d_vecpp, thrust::raw_pointer_cast(d_vec_len.data()), N);
  cudaDeviceSynchronize();
  return 0;
}
$ nvcc t685.cu -o t685
$ cuda-memcheck ./t685
========= CUDA-MEMCHECK
3
104
205
306
========= ERROR SUMMARY: 0 errors
$ cuda-memcheck ./t685 5
========= CUDA-MEMCHECK
3
104
205
306
407
========= ERROR SUMMARY: 0 errors
$
If you’re still having trouble, I suggest you post a short, complete code that someone else could copy, compile, and run, and see the problem, without having to add or change anything. You might also want to be sure to use proper CUDA error checking throughout your code (I have omitted it here for brevity of presentation; running under cuda-memcheck is usually a suitable proxy). If you’re not sure what proper CUDA error checking is, google “proper cuda error checking” and take the first hit.
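For reference, a minimal sketch of the kind of error checking I mean; the macro name cudaCheck is just illustrative:

#include <cstdio>
#include <cstdlib>

// Wrap every CUDA runtime call; after a kernel launch, also check
// cudaGetLastError() and cudaDeviceSynchronize().
#define cudaCheck(call)                                           \
  do {                                                            \
    cudaError_t err = (call);                                     \
    if (err != cudaSuccess) {                                     \
      fprintf(stderr, "CUDA error %s at %s:%d\n",                 \
              cudaGetErrorString(err), __FILE__, __LINE__);       \
      exit(EXIT_FAILURE);                                         \
    }                                                             \
  } while (0)

// Usage in the code above would look something like:
//   cudaCheck(cudaMalloc(&d_vecpp, N*sizeof(int*)));
//   cudaCheck(cudaMemcpy(d_vecpp, d_vecp, N*sizeof(int*), cudaMemcpyHostToDevice));
//   printkernel<<<1,1>>>(d_vecpp, thrust::raw_pointer_cast(d_vec_len.data()), N);
//   cudaCheck(cudaGetLastError());
//   cudaCheck(cudaDeviceSynchronize());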
It looks like my issue had to do with uninitialized h_vec_len values, which I was blindly getting away with in the constant version. Now everything works as expected, thank you!