Hi. I would like to run Function_1 on CUDA, like this:
But when I run my_kernel, I get nothing! How do I use loop_1 and loop_2 in the kernel?
Please help me!
kernel:
#define F 20
__global__ void kernel(double* dev_fitness, double* dev_prob)
{
    int i = blockIdx.x;
    double maxfit;
    maxfit = dev_fitness[0];
    if (i < F)
    {
        if (dev_fitness[i] > maxfit)
            maxfit = dev_fitness[i];
    }
    if (i < F)
    {
        dev_prob[i] = (0.9 * (dev_fitness[i] / maxfit)) + 0.1;
    }
}
//Function_1:
void Function_1()
{
    int i;
    double maxfit;
    maxfit = fitness[0];
    //loop_1
    for (i = 1; i < F; i++)
    {
        if (fitness[i] > maxfit)
            maxfit = fitness[i];
    }
    //loop_2
    for (i = 0; i < F; i++)
    {
        prob[i] = (0.9 * (fitness[i] / maxfit)) + 0.1;
    }
}
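For context, the reason the posted kernel gives wrong results is that maxfit is a per-thread local: each thread only ever compares dev_fitness[0] against its own element, so most threads normalize by the wrong value. loop_1 is a reduction over all elements and must finish before loop_2 starts, which on the GPU usually means a separate reduction kernel (or thrust::max_element) followed by the per-element kernel. A host-side sketch of that two-pass structure in plain C++ (the function names max_fitness and compute_prob are made up for this illustration):

```cpp
const int F = 20;

// Pass 1 (loop_1): the global maximum. On the GPU this would be a
// reduction kernel or thrust::max_element, launched and completed
// before the second kernel so every thread sees the same value.
double max_fitness(const double* fitness)
{
    double maxfit = fitness[0];
    for (int i = 1; i < F; i++)
        if (fitness[i] > maxfit)
            maxfit = fitness[i];
    return maxfit;
}

// Pass 2 (loop_2): embarrassingly parallel. This loop body is exactly
// what each CUDA thread i would execute with the precomputed maxfit.
void compute_prob(const double* fitness, double* prob, double maxfit)
{
    for (int i = 0; i < F; i++)
        prob[i] = (0.9 * (fitness[i] / maxfit)) + 0.1;
}
```

The key point: maxfit has to be fully computed and visible to all threads before any dev_prob[i] is written; a single kernel with only a per-thread local cannot do both loops.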
thank you
I searched a lot about 2D arrays in CUDA and read the NVIDIA C++ guide to find something about 2D arrays, and all I found is that to allocate memory I should use cudaMallocPitch and dim3.
But I don't know how to convert this code to CUDA!
Please explain how I can do this?
int main()
{
    int F = 40;
    int D = 80;
    int i, j;
    double GlobalParams[D];
    double Foo[F][D];
    for (i = 0; i < F; i++)
        //only the second for-loop to convert to CUDA
        if (...)
        {
            for (j = 0; j < D; j++)
                GlobalParams[j] = Foo[i][j];
        }
}
If you have a 2D array, it depends on what you are using it for. In my codes it was enough to map it to a 1D array, such as a[i][j] -> dev_a[i + j*lx]. In CUDA, 2D arrays have a special meaning: they are optimized for 2D access, such as reading all the neighbours of an element [i][j]. For your code above I am not sure what you want to do.
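A small host-side illustration of that mapping in plain C++ (the helpers flat and flatten are invented for this example): with lx the extent of the i index, element [i][j] lands at flat index i + j*lx, and the kernel uses the same arithmetic to read the buffer that a single cudaMemcpy of lx*ly doubles filled.

```cpp
#include <vector>

// Map element [i][j] of an lx-by-ly array to its 1D index, as in
// a[i][j] -> dev_a[i + j*lx].
inline int flat(int i, int j, int lx) { return i + j * lx; }

// Pack a 2D host array into the 1D layout; the result is what would be
// handed to one cudaMemcpy of lx*ly*sizeof(double) bytes.
std::vector<double> flatten(const std::vector<std::vector<double>>& a)
{
    int lx = (int)a.size(), ly = (int)a[0].size();
    std::vector<double> out(lx * ly);
    for (int i = 0; i < lx; i++)
        for (int j = 0; j < ly; j++)
            out[flat(i, j, lx)] = a[i][j];
    return out;
}
```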
I got it now. I have no idea how to use cudaMallocPitch, but you can use something else which worked for me. Define the 2D matrix as an array of pointers:
double *food[F], *dev_food[F];
// now allocate the memory on host and gpu with a loop
for (int istr = 0; istr < F; istr++)
{
    cudaHostAlloc(&food[istr], sizeof(double) * D, cudaHostAllocDefault);
    cudaMalloc((void**)&dev_food[istr], D * sizeof(double));
}
// on host you can access the elements of food as usual with [i][j]
// another loop for copying the data
for (int istr = 0; istr < F; istr++)
{
    cudaMemcpy(dev_food[istr], food[istr], sizeof(double) * D, cudaMemcpyHostToDevice);
}
// now you can run the loop
for (int i = 0; i < F; i++)
{
    // something ...
    newFunc_l<<<(D - 1 + ntbp) / ntbp, ntbp>>>(dev_GlobalParams, dev_food[i], D);
}
You have to define the number of threads per block, ntbp.
The new kernel is below:
I do not understand!
I'm really confused. I think NVIDIA needs a specific compiler so that the programming gets easier.
Would you please show me another example? For example: we have two arrays on the host named h_array1 and h_array2, and we want to copy h_array1 to h_array2 with CUDA. Their size is [10][15].
You cannot define a 2D array on the GPU the same way as the CPU version. If on the CPU you define cpu_array[F][D], you cannot do gpu_array[F][D] on the GPU. Practically, there are no higher-dimensional arrays on the GPU: all arrays are mapped to a 1D array, so a matrix of size [1:F]x[1:D] will be defined as a 1D array of [1:F*D]. If you need to work line by line, as in your case, you can define F pointers, each pointing to an array of size D. My code works for what you showed so far.
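A host-only sketch of that h_array1 -> h_array2 copy under the flattened view (the names flat_buf and copy_via_flat are made up for this example): a static double array [10][15] is already one contiguous block of 10*15 doubles, so it moves as a single 1D buffer. The two memcpy calls below mark exactly where the cudaMemcpy HostToDevice / DeviceToHost pair would go, with flat_buf standing in for a device buffer from one cudaMalloc of F*D*sizeof(double) bytes.

```cpp
#include <cstring>

const int F = 10, D = 15;

// flat_buf stands in for the device buffer dev_buf that
// cudaMalloc(&dev_buf, F * D * sizeof(double)) would provide.
void copy_via_flat(const double h_array1[F][D], double h_array2[F][D])
{
    static double flat_buf[F * D];
    // would be: cudaMemcpy(dev_buf, h_array1, sizeof(flat_buf),
    //                      cudaMemcpyHostToDevice);
    std::memcpy(flat_buf, h_array1, sizeof(flat_buf));
    // would be: cudaMemcpy(h_array2, dev_buf, sizeof(flat_buf),
    //                      cudaMemcpyDeviceToHost);
    std::memcpy(h_array2, flat_buf, sizeof(flat_buf));
}
```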
Good lord, 2D array pointers are a disaster in CUDA: not only confusing, but wasteful in resources. It's a lot better to just flatten the array into 1D indexing, like so: