I’m working on a project where I need to store arrays of different sizes inside one array, so I give each sub-array its own size when I create it. Below are the loops where I allocate the memory; they run without any problems.
double*** d_A = new double**[Asize];
for (int i = 0; i < Asize; i++) {
    int twoSize = A[i][0][3];
    d_A[i] = new double*[twoSize];
    for (int j = 0; j < twoSize; j++) {
        d_A[i][j] = new double[4];
    }
}
for (int i = 0; i < Asize; i++) {
    int twoSize = A[i][0][3];
    for (int j = 0; j < twoSize; j++) {
        cudaMalloc(&d_A[i][j], 4 * sizeof(double));
    }
}
When I call the global function that will use this array, should I pass ‘d_A’ or ‘&d_A’?
Wow, fun. You pretty much never pass & of anything as a CUDA kernel argument. The address of a host variable is always a pointer to host memory. That is never usable in CUDA device code. The only possible exception would be if using managed memory.
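To make the distinction concrete, here is a minimal sketch (an illustration, not code from this thread; the names `fill`, `runFill`, and `d_data` are made up). The only place `&d_data` appears is the `cudaMalloc` call, which runs on the host and needs somewhere to store the device address. The kernel receives `d_data` itself, by value.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: writes its global index into each element.
__global__ void fill(double* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = static_cast<double>(i);
}

// Hypothetical host helper: returns true if every CUDA call succeeded
// and host_out (n doubles) was filled from the device.
bool runFill(double* host_out, int n) {
    double* d_data = nullptr;
    // &d_data is fine here: cudaMalloc runs on the host and stores the
    // device address in the host variable d_data.
    if (cudaMalloc(&d_data, n * sizeof(double)) != cudaSuccess) return false;

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    fill<<<blocks, threads>>>(d_data, n);  // pass d_data, not &d_data

    bool ok = cudaGetLastError() == cudaSuccess;
    ok = ok && cudaMemcpy(host_out, d_data, n * sizeof(double),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_data);
    return ok;
}
```

Passing `&d_data` to the kernel instead would hand device code the host address of the variable `d_data`, which it cannot dereference.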
Here is a worked example of how to do it (the second code example in that answer). That isn’t actually demonstrating variable row length, but I assume you can figure that part out.
For a collection of other approaches to multi-dimensional array handling, see here.
Thank you for your response. Is there anything wrong with this code? I don’t get any errors when I run it.
In other parts of the code, the data arrives in fragments of different sizes. Is there any other way I can do this?
Although I tried both ‘d_A’ and ‘&d_A’ when I called the function, the global function was not called.
Yes, there are problems with your code. The first is that your d_A pointer (if that is what you are passing to the kernel) cannot be allocated with new. No pointer that you dereference in device code can be allocated with new. I suggest studying the link I gave you. It’s a complex procedure to make this work.
Regarding other methods, I gave you a link with some suggestions. The most canonical (and probably best) suggestion is to flatten your data. For uneven length rows, this means you’ll need an array of row start offsets. Something like this
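As a rough sketch of the flattening idea (an illustration under assumptions, not the code behind the link; `rowSums` and `raggedRowSums` are hypothetical names): all rows are stored back to back in one array, and an offsets array with one extra entry marks where each row starts and ends. The sketch assumes there is at least one element overall. For the layout in this thread, each "row" could be one group of twoSize × 4 doubles.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per row, summing that row's elements.
// Row r occupies data[offsets[r]] .. data[offsets[r + 1] - 1].
__global__ void rowSums(const double* data, const int* offsets,
                        int numRows, double* sums) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= numRows) return;
    double s = 0.0;
    for (int k = offsets[r]; k < offsets[r + 1]; ++k) s += data[k];
    sums[r] = s;
}

// Flattens ragged host rows, copies them to the device, returns per-row sums.
// Returns an empty vector if there is no data or any CUDA call fails.
std::vector<double> raggedRowSums(const std::vector<std::vector<double>>& rows) {
    int numRows = static_cast<int>(rows.size());
    std::vector<int> offsets(numRows + 1, 0);
    std::vector<double> flat;
    for (int r = 0; r < numRows; ++r) {
        flat.insert(flat.end(), rows[r].begin(), rows[r].end());
        offsets[r + 1] = static_cast<int>(flat.size());  // end of row r
    }
    if (flat.empty()) return {};

    double* d_data = nullptr;
    double* d_sums = nullptr;
    int* d_offsets = nullptr;
    // Three allocations and two copies in total, independent of the row count.
    bool ok =
        cudaMalloc(&d_data, flat.size() * sizeof(double)) == cudaSuccess &&
        cudaMalloc(&d_offsets, offsets.size() * sizeof(int)) == cudaSuccess &&
        cudaMalloc(&d_sums, numRows * sizeof(double)) == cudaSuccess &&
        cudaMemcpy(d_data, flat.data(), flat.size() * sizeof(double),
                   cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(d_offsets, offsets.data(), offsets.size() * sizeof(int),
                   cudaMemcpyHostToDevice) == cudaSuccess;

    std::vector<double> sums(numRows);
    if (ok) {
        int threads = 128;
        int blocks = (numRows + threads - 1) / threads;
        rowSums<<<blocks, threads>>>(d_data, d_offsets, numRows, d_sums);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(sums.data(), d_sums, numRows * sizeof(double),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_data);     // cudaFree on a null pointer does nothing
    cudaFree(d_offsets);
    cudaFree(d_sums);
    if (!ok) sums.clear();
    return sums;
}
```

Compared with one `cudaMalloc` per row, this keeps every pointer the kernel sees a plain device pointer, so there is no pointer-to-pointer structure to build on the device.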
Why don’t I get an error when I run it? And why does d_A take up space in video card memory?
If your code is working to your satisfaction then full speed ahead! No need to ask me about it. I assumed you were having trouble based on statements like this:
If you want to see the errors that are resulting in your global function not being called, use proper CUDA error checking (just google that) and run your code with cuda-memcheck. If neither of those report errors, then that is good.
And anytime you do a cudaMalloc operation, it takes up space in video card memory. Even if your approach is wrong.
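For reference, a minimal sketch of what such error checking around a kernel launch might look like (the helper `cudaOk`, the macro `CUDA_OK`, and the kernel `noop` are hypothetical names chosen for this illustration). A launch is checked twice: once right after the launch for configuration errors, and once after synchronizing for errors that occur while the kernel runs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reports the error with file and line, returns false on failure.
inline bool cudaOk(cudaError_t err, const char* file, int line) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s at %s:%d\n",
                     cudaGetErrorString(err), file, line);
        return false;
    }
    return true;
}
#define CUDA_OK(call) cudaOk((call), __FILE__, __LINE__)

// Placeholder kernel standing in for the real one.
__global__ void noop() {}

// Launches with the given block size and reports whether both the launch
// and the kernel execution succeeded.
bool launchChecked(int threadsPerBlock) {
    noop<<<1, threadsPerBlock>>>();
    // Launch-time errors (for example an invalid configuration) show up here.
    if (!CUDA_OK(cudaGetLastError())) return false;
    // Errors raised while the kernel runs show up once it has finished.
    return CUDA_OK(cudaDeviceSynchronize());
}
```

Without checks like these, a kernel that never launched looks the same from the host as one that ran. On recent CUDA toolkits the `compute-sanitizer` tool takes the place of `cuda-memcheck`.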
You got it right. I think I expressed myself badly. :)
So I may have been wrong to think the code is working correctly. I’ll review the answer in the link and try to integrate it in my own way. Thank you for your support. :)
Should I use ‘malloc’ instead of the ‘new’ operator when I want to create an array?
How can I assign a value to the code in this link in main? the link
I don’t know what “assign a value to the code” means. The kernel in that example demonstrates how to assign a value to the 3D array in device code.
#include <cstdio>

inline void GPUassert(cudaError_t code, char * file, int line, bool Abort = true)
{
    if (code != 0) {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (Abort) exit(code);
    }
}
#define GPUerrchk(ans) { GPUassert((ans), __FILE__, __LINE__); }

__global__ void doSmth(int*** a) {
    //int threadID = blockDim.x * blockIdx.x + threadIdx.x;
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            for (int k = 0; k < 2; k++)
                printf("[%d][%d][%d]=%d\n", i, j, k, a[i][j][k]);
}

int main() {
    int*** h_c = (int***)malloc(2 * sizeof(int**));
    for (int i = 0; i < 2; i++) {
        h_c[i] = (int**)malloc(2 * sizeof(int*));
        for (int j = 0; j < 2; j++)
            GPUerrchk(cudaMalloc((void**)&h_c[i][j], 2 * sizeof(int)));
    }
    int ***h_c1 = (int ***)malloc(2 * sizeof(int **));
    for (int i = 0; i < 2; i++) {
        GPUerrchk(cudaMalloc((void***)&(h_c1[i]), 2 * sizeof(int*)));
        GPUerrchk(cudaMemcpy(h_c1[i], h_c[i], 2 * sizeof(int*), cudaMemcpyHostToDevice));
    }
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            for (int k = 0; k < 2; k++)
                h_c[i][j][k] = i + j + k;
    int*** d_c;
    GPUerrchk(cudaMalloc((void****)&d_c, 2 * sizeof(int**)));
    GPUerrchk(cudaMemcpy(d_c, h_c1, 2 * sizeof(int**), cudaMemcpyHostToDevice));
    doSmth<<<1, 1>>>(d_c);
    GPUerrchk(cudaPeekAtLastError());
    int res[2][2][2];
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            GPUerrchk(cudaMemcpy(&res[i][j][0], h_c[i][j], 2 * sizeof(int), cudaMemcpyDeviceToHost));
}
I meant to say “assign a value to the array”, like in the code above.
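One thing worth noting about the code above: each h_c[i][j] was filled in by cudaMalloc, so it is a device pointer, and the host loop that writes h_c[i][j][k] dereferences it on the host. A possible way to set the values from main instead, sketched here under assumptions (the helpers `setRowFromHost` and `roundTrip` are hypothetical, not part of the linked answer), is to prepare each row in ordinary host memory and copy it over with cudaMemcpy.

```cuda
#include <vector>
#include <cuda_runtime.h>

// d_row is a device pointer (such as h_c[i][j] in the code above). The host
// may store and copy that pointer but must not dereference it, so the values
// travel through cudaMemcpy instead of a direct assignment.
bool setRowFromHost(int* d_row, const std::vector<int>& values) {
    return cudaMemcpy(d_row, values.data(), values.size() * sizeof(int),
                      cudaMemcpyHostToDevice) == cudaSuccess;
}

// Round trip for a single row: allocate on the device, fill from the host,
// read back. Returns an empty vector if any CUDA call fails.
std::vector<int> roundTrip(const std::vector<int>& values) {
    std::vector<int> out(values.size());
    int* d_row = nullptr;
    bool ok =
        cudaMalloc(&d_row, values.size() * sizeof(int)) == cudaSuccess &&
        setRowFromHost(d_row, values) &&
        cudaMemcpy(out.data(), d_row, out.size() * sizeof(int),
                   cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_row);  // cudaFree on a null pointer does nothing
    if (!ok) out.clear();
    return out;
}
```

Applied to the code above, the triple loop would build a two-element host buffer holding i + j + k for each (i, j) pair and pass it to something like `setRowFromHost(h_c[i][j], row)`.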