I have a question about a simple kernel I am trying to write: I do not understand why it is failing. I am new to CUDA and this is my first time writing a kernel (I have only used the FFT and CUBLAS libraries so far).
Is there a way to debug a kernel?
Basically, I want to build a cuComplex matrix from a float matrix by assigning each value of the float matrix to the .x (real) part of the corresponding cuComplex element.
Currently, I copy the vector back to host memory and do the conversion on the host. This takes about 20 ms on a 1024*1024 array, which is far too long for my application. I understand that CUBLAS uses column-major order, but my code should still at least run and not just crash.
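The host-side conversion described above can be sketched roughly as follows. This is only an illustration, not the actual code from my application: Complex2f is a hypothetical stand-in for cuComplex (which is a float2 with .x and .y members), and realToComplexHost is a made-up name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cuComplex: .x is the real part, .y the imaginary part.
struct Complex2f {
    float x;
    float y;
};

// Copy every real value into the .x member and zero the .y member.
// (Illustrative helper; the name is an assumption, not from the original code.)
std::vector<Complex2f> realToComplexHost(const std::vector<float>& in) {
    std::vector<Complex2f> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i].x = in[i];
        out[i].y = 0.0f;
    }
    return out;
}
```

Because the conversion is purely elementwise, it gives the same result for row-major and column-major data as long as the input and output buffers use the same layout.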
Here is what I have been trying to do (on a 1024 x 1024 array, so ROWS = COLUMNS):
// Thread block size
#define BLOCK_SIZE 16

// setup execution parameters
dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
dim3 grid(COLUMNS / threads.x, ROWS / threads.y);

// execute the kernel
realToComplex<<< grid, threads >>>(d_image_buff, d_image_complex_buff, COLUMNS, ROWS);

__global__ void realToComplex(float* A, cuComplex* B, int Width, int Height)
{
    // Block index
    int bx = blockIdx.x;
    int by = blockIdx.y;
    // Thread index
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    // Calculating the position of the element that will be converted
    int pos = BLOCK_SIZE * bx + BLOCK_SIZE * BLOCK_SIZE * Width * by + tx + BLOCK_SIZE * Width * ty;
    B[pos].x = A[pos];
    B[pos].y = 0.0f;
}
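One way to check the index arithmetic without a GPU is to evaluate the same expression on the host for every block and thread the launch would create, and compare the largest result with the number of elements in the buffers. The sketch below does that. postedIndex copies the expression from the kernel above, while rowMajorIndex (global row times width plus global column) is included only as an assumed point of comparison for a row-major layout. The function names and the 64-bit integer type are illustrative choices, not part of the original code.

```cpp
#include <algorithm>

// Same block size as in the post.
constexpr long long BLOCK_SIZE = 16;

// The index expression exactly as written in the kernel.
long long postedIndex(long long bx, long long by, long long tx, long long ty,
                      long long width) {
    return BLOCK_SIZE * bx + BLOCK_SIZE * BLOCK_SIZE * width * by + tx
         + BLOCK_SIZE * width * ty;
}

// Assumed alternative: conventional row-major index (row * width + column).
long long rowMajorIndex(long long bx, long long by, long long tx, long long ty,
                        long long width) {
    long long col = BLOCK_SIZE * bx + tx;  // global column of this thread
    long long row = BLOCK_SIZE * by + ty;  // global row of this thread
    return row * width + col;
}

// Largest index any thread would touch for a (width / 16) x (height / 16) grid
// of 16 x 16 blocks, mirroring the launch configuration in the post.
template <typename IndexFn>
long long maxIndex(IndexFn fn, long long width, long long height) {
    const long long gridX = width / BLOCK_SIZE;
    const long long gridY = height / BLOCK_SIZE;
    long long best = 0;
    for (long long by = 0; by < gridY; ++by)
        for (long long bx = 0; bx < gridX; ++bx)
            for (long long ty = 0; ty < BLOCK_SIZE; ++ty)
                for (long long tx = 0; tx < BLOCK_SIZE; ++tx)
                    best = std::max(best, fn(bx, by, tx, ty, width));
    return best;
}
```

With width = height = 1024 the buffers hold 1024 * 1024 elements, so any index above 1048575 lies outside them. Calling maxIndex(postedIndex, 1024, 1024) gives 16761855, while maxIndex(rowMajorIndex, 1024, 1024) gives 1048575. An out-of-range access of this kind is a common cause of an "unspecified launch failure".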
No descriptive error comes back. This is what I receive:
cudaThreadSynchronize error: Kernel execution failed in file <c:/C Projects/matrixMul/matrixMul.cu>, line 235 : unspecified launch failure.
Any help debugging this would be appreciated.
Thanks for your help!