How to convert this complex nested for loop to GPU code?

Hello everyone,

I want to parallelize and convert this nested for loop to GPU code:

for (row = (minRow - m_nWidth); row<maxRow; row++)
	{
		for (col = (minCol - m_nWidth); col<maxCol; col++)
		{
			
                        ....

		}
	}

I wrote this code and it’s not working:

int idx = blockIdx.x * blockDim.x + threadIdx.x;
	int idy = blockIdx.y * blockDim.y + threadIdx.y;
	
	row = idx;
	col = idy;

	if (row >= (minRow - m_nWidth) && row < maxRow)
	{
		if (col >= (minCol - m_nWidth) && col < maxCol)

		{
			.....

		

		}
	}
}

// Call

void CurveDetParA::parallel_CDA(int row, int col, int minRow, int minCol, int maxRow, int maxCol, int *m_ngI, int *m_ngpI, int *m_ngppI, int counter, 
	int m_nWidth, int nIndexl, int nIndexu, int lPixel, int uPixel, int nG, int nGP, int nGPP, int *m_ng, int *m_ngp, int *m_ngpp, IMAGEDATA *m_pImage)
{
dim3 dimBlock(1, 1, 1);
dim3 dimGrid(m_nWidth, m_nWidth, 1);

parallelCD1 << <dimGrid, dimBlock >> > (row, col, minRow, minCol, maxRow, maxCol, m_ngI, m_ngpI, m_ngppI, counter, m_nWidth, nIndexl, nIndexu, lPixel, uPixel, nG, nGP, nGPP, m_ng, m_ngp, m_ngpp, m_pImage);

}

Any help will be appreciated. TIA!