I’m a new CUDA user, so sorry if my question looks stupid. I read several topics in this forum but I didn’t see the solution :-(
I’m converting C code to CUDA code. The “kernel” algorithm uses dynamic programming (each new value needs previous computations). Each thread performs this algorithm and constructs a matrix (N*M). Both lengths depend on the input data. N is the same for all threads, but M is different.
So I want to allocate T arrays of (N*M) floats on the device. In the following code, N is the “input_size”, M is given by the “reference_size” array and T corresponds to the “number_reference”. Allocating the arrays is no problem (if I haven’t made mistakes):
[codebox]
float* compute_score(float2** references_vec, float2* input_vec, int number_reference, int* reference_size, int input_size)
[…]
for (int i = 0; i < number_reference; i++) {
    float* devPtr;
    size_t pitch;  // cudaMallocPitch expects a size_t*, not an int*
    // one matrix per reference: input_size rows of reference_size[i] floats
    cudaMallocPitch((void**)&devPtr, &pitch, reference_size[i] * sizeof(float), input_size);
}[/codebox]
or something like that.
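A minimal sketch of how one of these pitched matrices could be allocated and then indexed, assuming the pitch (which `cudaMallocPitch` returns in bytes) is kept next to each pointer. The names `fill_rows` and `pitched_demo` are hypothetical and only serve the illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per row, writes row * 1000 + col into each cell.
__global__ void fill_rows(float* base, size_t pitch, int rows, int cols) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= rows) return;
    // The pitch is a byte count, so the row start is computed through a char*.
    float* row = (float*)((char*)base + r * pitch);
    for (int c = 0; c < cols; c++) row[c] = r * 1000.0f + c;
}

// Hypothetical host helper: returns 0 on success, nonzero on failure.
int pitched_demo(int rows, int cols) {
    float* devPtr = NULL;
    size_t pitch = 0;  // size_t, as required by cudaMallocPitch
    if (cudaMallocPitch((void**)&devPtr, &pitch, cols * sizeof(float), rows) != cudaSuccess)
        return 1;

    fill_rows<<<(rows + 63) / 64, 64>>>(devPtr, pitch, rows, cols);

    // Read back the last cell of the last row to check the addressing.
    float last = -1.0f;
    char* src = (char*)devPtr + (size_t)(rows - 1) * pitch + (cols - 1) * sizeof(float);
    cudaError_t err = cudaMemcpy(&last, src, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(devPtr);
    if (err != cudaSuccess) return 2;
    return (last == (rows - 1) * 1000.0f + (cols - 1)) ? 0 : 3;
}
```

Calling `pitched_demo(4, 5)` from `main` should return 0 on a working device. Since every matrix can get a different pitch, each thread would need its own pitch value in addition to its pointer.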
My problem is how to store the “devPtr” pointers in an array so that each thread can find its own array. (I guess that a float** array declared in host code cannot store device pointers.)
I read that cudaGetSymbolAddress works only for __device__ variables.
One solution is to use a 3D array, but that means a huge waste of memory.
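One possible approach, sketched under assumptions: a host `float**` array can hold device addresses as long as the host never dereferences them, and that array can then be copied into a device-side table that the kernel indexes by thread. For brevity this sketch uses plain `cudaMalloc` instead of `cudaMallocPitch`; the names `touch_first`, `pointer_table_demo`, `h_tables` and `d_tables` are hypothetical.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: thread t writes its own index into the first cell of its matrix.
__global__ void touch_first(float** tables, int count) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < count) tables[t][0] = (float)t;
}

// Hypothetical host helper: returns 0 on success, nonzero on failure.
int pointer_table_demo(const int* reference_size, int number_reference, int input_size) {
    // Host array whose elements are device addresses (never dereferenced on the host).
    float** h_tables = (float**)malloc(number_reference * sizeof(float*));
    if (!h_tables) return 1;
    for (int i = 0; i < number_reference; i++) h_tables[i] = NULL;

    int status = 0;
    for (int i = 0; i < number_reference && status == 0; i++) {
        size_t bytes = (size_t)input_size * reference_size[i] * sizeof(float);
        if (cudaMalloc((void**)&h_tables[i], bytes) != cudaSuccess) status = 2;
    }

    // Device copy of the pointer table, passed to the kernel as an argument.
    float** d_tables = NULL;
    if (status == 0 &&
        cudaMalloc((void**)&d_tables, number_reference * sizeof(float*)) != cudaSuccess)
        status = 3;
    if (status == 0 &&
        cudaMemcpy(d_tables, h_tables, number_reference * sizeof(float*),
                   cudaMemcpyHostToDevice) != cudaSuccess)
        status = 4;

    if (status == 0) {
        touch_first<<<(number_reference + 63) / 64, 64>>>(d_tables, number_reference);
        // Check the last matrix: its first cell should hold number_reference - 1.
        float v = -1.0f;
        if (cudaMemcpy(&v, h_tables[number_reference - 1], sizeof(float),
                       cudaMemcpyDeviceToHost) != cudaSuccess ||
            v != (float)(number_reference - 1))
            status = 5;
    }

    // cudaFree accepts NULL, so partially filled tables are safe to release.
    for (int i = 0; i < number_reference; i++) cudaFree(h_tables[i]);
    cudaFree(d_tables);
    free(h_tables);
    return status;
}
```

With `int sizes[3] = {2, 5, 3};`, a call such as `pointer_table_demo(sizes, 3, 4)` should return 0. Each matrix is allocated at exactly its own size, so nothing is padded to the largest M as it would be with a 3D array.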
Thank you, it looks like it is working (I still have to try it on my complete code).
I have one more question:
I tried the following code:
[codebox]__device__ float** devPtr;
__global__ void myFunc() {
float* tmp = devPtr[num_thread]; //num thread is well calculated
...
}
void init() {
//int size well calculated
cudaMalloc((void**)&devPtr, size);
float** tmp = (float**) malloc(size);
for(int i = 0; i < N; i++) {
...
}
cudaMemcpy(devPtr,tmp...)
...
myFunc<<<dimGrid,dimBlock>>>();
...
}[/codebox]
My problem is that in the kernel function, devPtr[num_thread] causes a seg fault (I work in emulation mode). Apparently devPtr is set to (nil) in the kernel, and I don’t understand why: cudaMalloc returns no error and the pointer exists.
If I pass devPtr as a parameter of the kernel function, everything works. So apparently devPtr isn’t initialized at global scope. Maybe I made a mistake, but I thought that __device__ float** devPtr was global and persistent.
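A likely explanation, offered as a sketch rather than a verified diagnosis of the code above: in host code, `&devPtr` does not refer to the device copy of a `__device__` variable, so `cudaMalloc((void**)&devPtr, size)` stores the new address on the host side and the variable the kernel sees stays null. One way around it is to allocate into an ordinary host pointer and then copy that pointer value into the symbol with `cudaMemcpyToSymbol`. The names `use_global`, `symbol_demo`, `d_row` and `d_table` are hypothetical.

```cuda
#include <cuda_runtime.h>

// Lives in device memory; host code cannot assign to it directly.
__device__ float** devPtr;

// Hypothetical kernel: reads the global table instead of taking it as a parameter.
__global__ void use_global(int count) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < count) devPtr[t][0] = 42.0f;
}

// Hypothetical host helper: returns 0 on success, nonzero on failure.
int symbol_demo() {
    float* d_row = NULL;     // a one-float "matrix", just for the demonstration
    float** d_table = NULL;  // device table holding one device pointer
    if (cudaMalloc((void**)&d_row, sizeof(float)) != cudaSuccess) return 1;
    if (cudaMalloc((void**)&d_table, sizeof(float*)) != cudaSuccess) {
        cudaFree(d_row);
        return 2;
    }

    // Fill the table, then copy the VALUE of d_table into the device symbol.
    cudaError_t err = cudaMemcpy(d_table, &d_row, sizeof(float*), cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpyToSymbol(devPtr, &d_table, sizeof(d_table));

    float v = 0.0f;
    if (err == cudaSuccess) {
        use_global<<<1, 1>>>(1);
        err = cudaMemcpy(&v, d_row, sizeof(float), cudaMemcpyDeviceToHost);
    }

    cudaFree(d_table);
    cudaFree(d_row);
    return (err == cudaSuccess && v == 42.0f) ? 0 : 3;
}
```

Calling `symbol_demo()` from `main` should return 0 if the symbol was set correctly. Passing the table as a kernel parameter, which already works, avoids the extra copy and is probably the simpler option.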