I have run into a problem using a double pointer in CUDA. The input is a stack of 2D arrays (that is, a 3D array) of size Rows*Cols*n_A, and I want to store it on the device as a set of 2D arrays, each flattened in linear memory. My code fragment looks like this:
// on device memory
cuComplex** d_ptrA;
cuComplex** d_ptrB;
.....
// evaluate h_A (data in host memory; it is a stack of 2D arrays, i.e. a 3D array of size Rows*Cols*n_A)
....
// allocate double pointer
cudaMalloc((void**) &d_ptrA, n_A*sizeof(cuComplex*));
checkCudaError("at cudaMalloc1");
// allocate a set of 2D cuda array to store the first matrix stack
for (int i = 0; i < n_A; i++)
{
cudaMalloc((void**)d_ptrA[i], Rows*Cols * sizeof(cuComplex));
checkCudaError("at cudaMalloc2");
cudaMemcpy(d_ptrA[i], h_A+i*Rows*Cols,Rows*Cols*sizeof(cuComplex),cudaMemcpyHostToDevice);
checkCudaError("at cudaMemcpy1");
}
But I always get a segmentation violation, which causes MATLAB to crash. Can anyone help me out?
Thank you for your reply, but I think you misunderstood me. I allocate d_ptrA to store many pointers; that is, its elements are themselves pointers. That is why it is declared as cuComplex** d_ptrA.
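Yes, I misunderstood. However, dereferencing a pointer to device memory on the host side will not work. The cudaMalloc() call delivers a pointer to device memory into host-side storage, namely the location whose address you pass as its first argument. d_ptrA[i] is device-side storage, so the host cannot evaluate d_ptrA[i] at all, and that is where the crash comes from.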
So you mean I should store those pointers on the host side?
I do not know whether this makes sense:
cuComplex** ptrA;
ptrA = (cuComplex**)malloc(sizeof(cuComplex*)*n_A);
// allocate a set of 2D cuda array to store the first matrix stack
for (int i = 0; i < n_A; i++)
{
cudaMalloc((void**)d_ptrA, Rows*Cols * sizeof(cuComplex));
checkCudaError("at cudaMalloc3");
cudaMemcpy(d_ptrA,h_A+i*Rows*Cols,Rows*Cols*sizeof(cuComplex),cudaMemcpyHostToDevice);
checkCudaError("at cudaMemcpy1");
ptrA[i] = d_ptrA;
}
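For what it is worth, here is a minimal sketch of how the host-side table idea could be made to work. It is not tested against the original program. The attempt above passes d_ptrA itself to cudaMalloc(), whereas cudaMalloc() needs the address of a host variable to write into; the sketch passes &h_ptrs[i] instead. If a kernel has to index the table, the table is then copied to the device in one cudaMemcpy(). The names uploadStack, sampleSlices, h_ptrs and d_ptrs are made up for illustration, and error paths simply return without freeing anything, to keep it short.

```cuda
#include <cassert>
#include <cstddef>
#include <cuComplex.h>
#include <cuda_runtime.h>

// Hypothetical helper: uploads n_A slices of Rows*Cols elements each.
// h_ptrs is a caller-allocated HOST array with n_A entries; on return it
// holds the device pointers (keep it around so you can cudaFree them).
// Returns a DEVICE copy of that table, or nullptr on failure.
cuComplex** uploadStack(const cuComplex* h_A, int Rows, int Cols, int n_A,
                        cuComplex** h_ptrs)
{
    assert(h_A != nullptr && h_ptrs != nullptr && n_A > 0);
    const size_t sliceElems = (size_t)Rows * Cols;
    const size_t sliceBytes = sliceElems * sizeof(cuComplex);

    for (int i = 0; i < n_A; i++) {
        // &h_ptrs[i] is a host address, so cudaMalloc can store the new
        // device pointer there without the host touching device memory.
        if (cudaMalloc((void**)&h_ptrs[i], sliceBytes) != cudaSuccess)
            return nullptr;
        if (cudaMemcpy(h_ptrs[i], h_A + i * sliceElems, sliceBytes,
                       cudaMemcpyHostToDevice) != cudaSuccess)
            return nullptr;
    }

    // Copy the whole pointer table to the device so kernels can index it.
    cuComplex** d_ptrs = nullptr;
    if (cudaMalloc((void**)&d_ptrs, n_A * sizeof(cuComplex*)) != cudaSuccess)
        return nullptr;
    if (cudaMemcpy(d_ptrs, h_ptrs, n_A * sizeof(cuComplex*),
                   cudaMemcpyHostToDevice) != cudaSuccess)
        return nullptr;
    return d_ptrs;
}

// Illustrative kernel: thread i reads element idx of slice i through the
// device-side table and writes it to out[i].
__global__ void sampleSlices(cuComplex** d_ptrs, int n_A, int idx,
                             cuComplex* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_A)
        out[i] = d_ptrs[i][idx];
}
```

On the host you would call something like d_ptrA = uploadStack(h_A, Rows, Cols, n_A, ptrA) with ptrA allocated by malloc as in the attempt above, and pass d_ptrA to kernels. The host never dereferences d_ptrA; it only hands it to cudaMemcpy() and to kernel launches.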
Oh, there is a reason that I am trying to use a set of arrays.
I have two sets of images as input. At first, I read these two sets of images into two 3D textures, and then read a pair of images from the two 3D textures to do some rotation, FFT…
But as you know, a 3D texture has a size limit of 2048*2048*2048, which means that if the number of images in a set exceeds 2048, an error will occur. So I want to try some other approach.
I have tried using a contiguous piece of memory, reading out one image at a time and binding it to a 2D texture in a loop… but that is much slower than the 3D texture method.
So I want to try allocating a set of arrays or a set of 2D textures to store a set of images. But I need to store their addresses, which is why I use a double pointer… but it always fails.
Maybe I can try using a kernel to fill the arrays just as you said.