Hi all,
I have two device functions:
__device__ float xtheta(int x, int y, float theta){
    float xt = x*cos(theta)+y*sin(theta);
    return xt;
}
__device__ float ytheta(int x, int y, float theta){
    float yt = y*cos(theta)-x*sin(theta);
    return yt;
}
And a kernel like this:
__global__ void complex_filter_responses(float *real_filter, float lambda, float sigma_x, float sigma_y, int number_of_orientations)
{
    int tx = threadIdx.x+blockIdx.x*blockDim.x;
    int ty = threadIdx.y+blockIdx.y*blockDim.y;
    int offset = tx+ty*blockDim.x*gridDim.x;
    int filter_size=blockDim.x*gridDim.x*blockDim.y*gridDim.y;
    for(int i=0;i<number_of_orientations*2;i=i+2){
        float x_theta = xtheta(tx-(blockDim.x*gridDim.x)/2,ty-(blockDim.x*gridDim.x)/2,M_PI/3);
        float y_theta = ytheta(tx-(blockDim.x*gridDim.x)/2,ty-(blockDim.x*gridDim.x)/2,M_PI/3);
        real_filter[offset+i*filter_size]=x_theta*y_theta;
        real_filter[offset+(i+1)*filter_size]= x_theta;
    }
}
This code worked just fine.
However, I now need to add further calculations to the kernel, so I replaced
real_filter[offset+i*filter_size]=x_theta*y_theta;
with
real_filter[offset+i*filter_size]=exp(-0.5*(pow(x_theta,2)/pow(sigma_x,2)+pow(y_theta,2)/pow(sigma_y,2)))*cos(2*M_PI/lambda*x_theta+0);
Replacing this single line causes a
cudaError_enum at memory location
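The message above does not say which CUDA error occurred. A minimal sketch of how the status could be read on the host side, assuming the runtime API is used for the launch. `cuda_ok` is a hypothetical helper, and the launch line in the comment uses placeholder names (`grid`, `block`, `d_filter`, `n`) that are not from the original code:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the original code): prints a readable
// message for a CUDA runtime status and returns true when the call succeeded.
inline bool cuda_ok(cudaError_t status, const char *what)
{
    if (status != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(status));
        return false;
    }
    return true;
}

// Possible usage right after the launch (all names here are placeholders):
//   complex_filter_responses<<<grid, block>>>(d_filter, lambda, sigma_x, sigma_y, n);
//   cuda_ok(cudaGetLastError(), "kernel launch");          // launch/configuration errors
//   cuda_ok(cudaDeviceSynchronize(), "kernel execution");  // errors raised while the kernel runs
```

The printed error string should narrow down whether the launch itself is rejected or the kernel fails while running.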
My first guess was a division by zero somewhere (unlikely, because i doesn't affect the equation), so I tried running the code without the for-loop:
real_filter[offset]=exp(-0.5*(pow(x_theta,2)/pow(sigma_x,2)+pow(y_theta,2)/pow(sigma_y,2)))*cos(2*M_PI/lambda*x_theta+0);
which worked well.
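In the expression above, -0.5 and M_PI are double constants, so exp and cos are called with double arguments. Whether that is related to the error is not established here. As something to try, here is a minimal single-precision sketch of the same expression. `filter_response_real` and `kPi` are hypothetical names, the squares are written as multiplications instead of pow, and the `+0` phase term is dropped:

```cuda
#include <math.h>

// Let the same source build with a plain C++ compiler for host-side checks.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper (not in the original code): single-precision version of
// exp(-0.5*(x^2/sigma_x^2 + y^2/sigma_y^2)) * cos(2*pi/lambda * x).
__host__ __device__ inline float filter_response_real(float x_theta, float y_theta,
                                                      float lambda, float sigma_x,
                                                      float sigma_y)
{
    const float kPi = 3.14159265358979f;  // float constant instead of the double M_PI
    float ex = (x_theta * x_theta) / (sigma_x * sigma_x);
    float ey = (y_theta * y_theta) / (sigma_y * sigma_y);
    return expf(-0.5f * (ex + ey)) * cosf(2.0f * kPi / lambda * x_theta);
}
```

Inside the loop this could be used as real_filter[offset+i*filter_size] = filter_response_real(x_theta, y_theta, lambda, sigma_x, sigma_y);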
Next I checked for an index-out-of-bounds error, but that wasn't the case, because I could write to all elements from real_filter[0] to real_filter[offset+(i+1)*filter_size]
without any errors.
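For the bounds check, the loop in the kernel implies a minimum buffer size that can be worked out directly. A small sketch, assuming `width` and `height` stand for blockDim.x*gridDim.x and blockDim.y*gridDim.y. `required_filter_elements` is a hypothetical helper:

```cuda
#include <stddef.h>

// Hypothetical helper (not in the original code): number of floats that
// real_filter must hold for the loop in complex_filter_responses.
inline size_t required_filter_elements(size_t width, size_t height,
                                       size_t number_of_orientations)
{
    size_t filter_size = width * height;  // same value as filter_size in the kernel
    // i takes the values 0, 2, ..., 2*(n-1). The largest index written is
    // offset + (i+1)*filter_size with offset <= filter_size - 1,
    // which is 2*n*filter_size - 1.
    return 2 * number_of_orientations * filter_size;
}
```

For a 16x16 grid of threads and 4 orientations this gives 2048 floats, so the cudaMalloc size in bytes would need to be at least 2048 * sizeof(float).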
Also
for(int i=0;i<number_of_orientations*2;i=i+2){
    float x_theta = xtheta(tx-(blockDim.x*gridDim.x)/2,ty-(blockDim.x*gridDim.x)/2,M_PI/3);
    float y_theta = ytheta(tx-(blockDim.x*gridDim.x)/2,ty-(blockDim.x*gridDim.x)/2,M_PI/3);
    real_filter[offset+i*filter_size]=0.0;
}
worked without any errors.
[b]Summing up: the error only occurs when I recalculate the value like this:[/b]
real_filter[offset]=exp(-0.5*(pow(x_theta,2)/pow(sigma_x,2)+pow(y_theta,2)/pow(sigma_y,2)))*cos(2*M_PI/lambda*x_theta+0);
I’m still very new to CUDA and C, so please be patient. I bet I just made a stupid mistake :P
Thanks in advance!