Hi All!

I am a beginner in CUDA and have been trying to write a kernel that computes the absolute value (magnitude) of every complex number in an array.

I have a 1024x1024 array and, based on the matrix multiplication examples and the CUDA videos, I am launching a 64x64 grid of blocks with a BLOCK_SIZE of 16 (i.e. 16x16 threads per block).
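For reference, a launch configuration like the one described above can be computed with ceiling division. For 1024x1024 with 16x16 blocks this gives exactly 64x64 blocks, and with a bounds check in the kernel it also covers sizes that are not multiples of 16. This is a minimal sketch assuming a row-major array; the names `abs_complex_2d` and `abs_complex_2d_host` are hypothetical and not the code from this post.

```cuda
#include <cuComplex.h>
#include <cuda_runtime.h>

#define BLOCK_SIZE 16  // assumed, matching the value in the post

// One thread per element. The bounds check lets the grid be rounded up.
__global__ void abs_complex_2d(float* out, const cuComplex* in, int width, int height)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height) {
        int pos = row * width + col;  // row-major index
        cuComplex z = in[pos];        // read the element once
        out[pos] = sqrtf(z.x * z.x + z.y * z.y);
    }
}

// Hypothetical host helper: copies `in` to the device, launches the kernel,
// and copies the magnitudes back into `out`. Returns the last CUDA error.
cudaError_t abs_complex_2d_host(float* out, const cuComplex* in, int width, int height)
{
    size_t n = (size_t)width * height;
    float* d_out = nullptr;
    cuComplex* d_in = nullptr;
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_in, n * sizeof(cuComplex));
    cudaMemcpy(d_in, in, n * sizeof(cuComplex), cudaMemcpyHostToDevice);

    dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
    // Ceiling division: (1024 + 15) / 16 = 64 blocks per dimension.
    dim3 grid((width + BLOCK_SIZE - 1) / BLOCK_SIZE,
              (height + BLOCK_SIZE - 1) / BLOCK_SIZE);
    abs_complex_2d<<<grid, threads>>>(d_out, d_in, width, height);
    cudaError_t err = cudaGetLastError();  // reports launch or earlier errors

    cudaMemcpy(out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d_in);
    cudaFree(d_out);
    return err != cudaSuccess ? err : cudaGetLastError();
}
```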

This kernel takes about 28 ms, which is rather long compared to other complex operations (for example, multiplying two 1024x1024 complex matrices takes about 1 ms).
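One thing worth checking before comparing those two numbers: kernel launches are asynchronous, so a host-side timer stopped right after a launch, without a synchronization call, may measure little more than the launch overhead. Early CUDA calls can also include one-time costs such as lazy context creation. A minimal sketch of timing GPU work with CUDA events, with an optional warm-up run; the helper name `time_gpu_ms` is hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: runs `launch` once or more as warm-up, then times one
// more run on the default stream using CUDA events. Returns milliseconds.
template <typename Launch>
float time_gpu_ms(Launch launch, int warmup_runs = 1)
{
    for (int i = 0; i < warmup_runs; ++i) launch();  // exclude one-time startup costs

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);      // enqueued on the default stream
    launch();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // block until the timed work has actually finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

For example, `time_gpu_ms([&] { abs_complex<<<grid2, threads2>>>(d_image_buff, d_result_buff, COLUMNS); })` would time only the kernel, excluding host-device copies. Timing the matrix multiplication the same way makes the two figures directly comparable.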

This is the kernel I have so far:

```
// setup execution parameters
dim3 threads2(BLOCK_SIZE, BLOCK_SIZE);
dim3 grid2(COLUMNS / threads2.x, ROWS / threads2.y);

// execute the kernel
abs_complex<<< grid2, threads2 >>>(d_image_buff, d_result_buff, COLUMNS);

__global__ void
abs_complex(float* A, cuComplex* B, int Width)
{
    // Block index
    int bx = blockIdx.x;
    int by = blockIdx.y;

    // Thread index
    int tx = threadIdx.x;
    int ty = threadIdx.y;

    // Calculating the position of the element that will be converted
    // (row-major ordering: row * Width + column)
    int pos = (by * BLOCK_SIZE + ty) * Width + (bx * BLOCK_SIZE + tx);
    A[pos] = sqrtf(B[pos].x * B[pos].x + B[pos].y * B[pos].y);
}

#endif // #ifndef _REAL_TO_COMPLEX_KERNEL_H_
```
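Because the magnitude of each element does not depend on its neighbours, the 2D block/thread indexing is not strictly needed. A common alternative is a flat 1D launch with a grid-stride loop. The sketch below also uses `cuCabsf` from `cuComplex.h`, which scales its inputs to avoid overflow for large magnitudes at the cost of a few extra instructions (the plain `sqrtf` form above is cheaper). The names `abs_complex_1d` and `abs_complex_1d_host` and the block size of 256 are illustrative assumptions.

```cuda
#include <cuComplex.h>
#include <cuda_runtime.h>

// Element-wise |z| over a flat array using a grid-stride loop, so any grid
// size covers all n elements.
__global__ void abs_complex_1d(float* out, const cuComplex* in, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        out[i] = cuCabsf(in[i]);  // magnitude helper from cuComplex.h
    }
}

// Hypothetical host helper: copy in, launch, copy out. Returns the last CUDA error.
cudaError_t abs_complex_1d_host(float* out, const cuComplex* in, int n)
{
    float* d_out = nullptr;
    cuComplex* d_in = nullptr;
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_in, n * sizeof(cuComplex));
    cudaMemcpy(d_in, in, n * sizeof(cuComplex), cudaMemcpyHostToDevice);

    const int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (blocks > 65535) blocks = 65535;  // the grid-stride loop covers any remainder
    abs_complex_1d<<<blocks, threads>>>(d_out, d_in, n);
    cudaError_t err = cudaGetLastError();

    cudaMemcpy(out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d_in);
    cudaFree(d_out);
    return err != cudaSuccess ? err : cudaGetLastError();
}
```

Either way, a kernel like this mostly moves memory: 8 bytes read and 4 bytes written per element, or about 12 MB in total for 1024x1024. That amount of traffic should normally take well under a millisecond, so a 28 ms measurement is more likely to include something other than the kernel itself.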

I do not know how to make it faster. Any hints, advice, or ideas would be greatly appreciated.

Thanks a lot in advance!