Hi, I’m new to CUDA development and I have a basic question.
In the original C implementation I have a pointer to pointer to int (int **), used to store N lists of values, each with a different length. How can I write a CUDA version of these lines?
[codebox]
int **descs;
descs = (int**) malloc(N*sizeof(int*));
for (int i=0;i<N;++i) {
descs[i] = (int*) malloc(sizeof(int)*nd[i]);
}
[/codebox]
where nd[i] is set somewhere.
I tried this:
[codebox]
int **descs;
cudaMalloc((void **) &descs,N*sizeof(int*));
for (int i=0;i<N;++i) {
cudaMalloc((void**) &descs[i],sizeof(int)*nd[i]);
}
[/codebox]
but I get a segfault :-(
Is there a way to do this, or can I only allocate 1D arrays?
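What you’re using is called a jagged array. These do work in CUDA, but the pointer list has to be built on the host and copied to the device before use. The segfault comes from the cudaMalloc((void**) &descs[i], ...) calls: descs already points to device memory, and cudaMalloc runs on the host, so it tries to store each new pointer at a device address that the host cannot dereference. Instead, collect the per-list device pointers in an ordinary host array, then cudaMemcpy that array into descs. The reverse restriction also applies: a kernel cannot read a pointer list that still resides in host memory (it would crash!).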
Overall, jagged arrays are a pain to set up in CUDA, and the overhead of many tiny memory allocations (and potentially as many copy operations) can be substantial. Consider CUDA arrays (cudaArray) or a single 1D representation of the 2D data in memory, with the indexes computed manually in the kernel.
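The single 1D representation mentioned above could look roughly like the following host-side sketch. It assumes the list lengths in nd[] are known before allocation; build_offsets and jagged_at are hypothetical helper names made up for illustration, not part of CUDA or of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns N + 1 offsets into one flat buffer.
   List i occupies flat[offsets[i]] .. flat[offsets[i + 1] - 1], so
   offsets[N] is the total number of ints to allocate.
   Returns NULL on allocation failure; the caller frees the result. */
size_t *build_offsets(const int *nd, size_t N)
{
    size_t *offsets = malloc((N + 1) * sizeof *offsets);
    if (offsets == NULL)
        return NULL;
    offsets[0] = 0;
    for (size_t i = 0; i < N; ++i) {
        assert(nd[i] >= 0);  /* list lengths must not be negative */
        offsets[i + 1] = offsets[i] + (size_t)nd[i];
    }
    return offsets;
}

/* Hypothetical helper: address of element j of list i. A kernel would
   do the same index arithmetic on the device copies of both arrays. */
int *jagged_at(int *flat, const size_t *offsets, size_t i, size_t j)
{
    return &flat[offsets[i] + j];
}
```

With this layout the device side should need only two allocations and two copies (one for the flat data of offsets[N] ints, one for the N + 1 offsets), instead of one cudaMalloc per list. The trade-off is that a list cannot grow in place without shifting everything stored after it.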
I gave this topic 5 stars. In my case the double pointer is the only way to write clean code: I have to update a master vector with new elements that are only known at runtime. The lame way is to destroy the master vector and rebuild it with the newly added elements, which is extremely lame. The better way is to keep the double pointer and use it to build the master vector.