Hi,
I'm a beginner in CUDA programming and I'm trying to implement the Radon transform on the GPU. For a given angle theta, it is the sum of the image intensities along an axis through the center of the image at that angle. I calculate the pixel indexes for that axis on the CPU and then use them on the GPU. The problem is that I'm using a conditional on a value derived from threadIdx.x inside a loop, and I suspect this is the cause of the launch failure:
cudaSafeCall() Runtime API error in file <radonKernel.cu>, line 231 : unspecified launch failure.
The code:
/*****************************************************************************************
// kernel
__global__ void radon_vrs3( float* img_In, float* radonResults, int* index, int N, int M )
{
    // Declare one row of the matrix in shared memory for speed
    __shared__ float sum[256];
    // Calculate which element this thread reads from memory
    //int index = M * blockIdx.x + threadIdx.x; for radon_vrs2
    int idx = M * blockIdx.x + threadIdx.x;
    for (int ii = 0; ii < N*M; ii++)
    {
        // index is the vector of the correct indexes to read for the given value of theta
        if ( idx == index[ii] )
            sum[threadIdx.x] = img_In[idx];
    }
    __syncthreads();
    int nTotalThreads = blockDim.x; // Total number of active threads
    while (nTotalThreads > 1)
    {
        int halfPoint = (nTotalThreads >> 1); // divide by two
        // Only the first half of the threads will be active.
        if (threadIdx.x < halfPoint)
            sum[threadIdx.x] += sum[threadIdx.x + halfPoint];
        __syncthreads();
        nTotalThreads = (nTotalThreads >> 1); // divide by two
    }
    // At this point, thread 0 of each block holds the sum of its row.
    // It's time for thread 0 of each block to write its final result.
    if (threadIdx.x == 0)
        radonResults[blockIdx.x] = sum[0];
}
/*********************************************************************************************************
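In case it helps anyone reproduce this, here is a minimal sketch of the same one-block-per-row reduction with the guards I have not ruled out yet. As far as I understand, an "unspecified launch failure" commonly comes from an out-of-bounds memory access inside the kernel, so the sketch zero-initializes every shared slot, passes the real length of the index vector instead of assuming N*M, and strides over the row so that M does not have to equal the block size. The names radon_rows_guarded, run_radon_rows, numIdx and BLOCK are made up for this illustration, and it is a sketch under those assumptions, not my real code.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 256  // assumed block size: a power of two, equal to the shared array length

// Hypothetical guarded variant of radon_vrs3: one block sums one image row.
__global__ void radon_rows_guarded(const float* img, float* out,
                                   const int* index, int numIdx, int N, int M)
{
    __shared__ float sum[BLOCK];
    if (blockIdx.x >= N) return;  // the whole block exits together, so no barrier is skipped

    float acc = 0.0f;
    // Stride over the row so M may be smaller or larger than blockDim.x.
    for (int col = threadIdx.x; col < M; col += blockDim.x) {
        int idx = M * blockIdx.x + col;           // always < N*M here
        for (int ii = 0; ii < numIdx; ++ii) {     // numIdx is the real length of index
            if (index[ii] == idx) { acc += img[idx]; break; }
        }
    }
    sum[threadIdx.x] = acc;  // every slot is written, even when nothing matched
    __syncthreads();

    // Tree reduction; valid because blockDim.x is a power of two and <= BLOCK.
    for (int half = blockDim.x >> 1; half > 0; half >>= 1) {
        if (threadIdx.x < half)
            sum[threadIdx.x] += sum[threadIdx.x + half];
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = sum[0];
}

// Hypothetical host wrapper: copies data, launches, and checks for errors.
// Returns an empty vector if the launch or the copy back fails.
std::vector<float> run_radon_rows(const std::vector<float>& img,
                                  const std::vector<int>& index, int N, int M)
{
    float *dImg = nullptr, *dOut = nullptr;
    int* dIdx = nullptr;
    std::vector<float> result(N, 0.0f);

    cudaMalloc((void**)&dImg, img.size() * sizeof(float));
    cudaMalloc((void**)&dOut, N * sizeof(float));
    cudaMalloc((void**)&dIdx, index.size() * sizeof(int));
    cudaMemcpy(dImg, img.data(), img.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dIdx, index.data(), index.size() * sizeof(int), cudaMemcpyHostToDevice);

    radon_rows_guarded<<<N, BLOCK>>>(dImg, dOut, dIdx, (int)index.size(), N, M);
    cudaError_t err = cudaGetLastError();                   // launch configuration errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // errors raised while running
    if (err == cudaSuccess)
        err = cudaMemcpy(result.data(), dOut, N * sizeof(float), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        result.clear();
    }

    cudaFree(dImg);
    cudaFree(dOut);
    cudaFree(dIdx);
    return result;
}
```

Calling run_radon_rows on a 4 x 8 image filled with ones and index = {0, 9, 18, 27} (one selected pixel per row) should return 1.0 for every row. The inner scan over the index vector is kept only to stay close to my kernel; it is slow for large images.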
Can anyone help?