Double loop in a kernel

Hi,
I would like to implement the equivalent of a double loop in one of my kernels, for a Hilbert transform process.
My kernel:
__global__ void HilbertEnvelope_kernel (float* rfData_d, float* rfData_i)
{
    long samples = blockIdx.y*blockDim.y + threadIdx.x;
    long idx = blockIdx.x*blockDim.x + threadIdx.x;
    if(idx<nmlne)
    {
        if((samples>HILBERT_WINDOW-1)&&(samples<nmpts-HILBERT_WINDOW))
        {
            int k;
            float s0;
            s0=0.0;
            for(k=1;k<HILBERT_WINDOW;k+=2)
            {
                s0 += (rfData_d[idx*nmpts +samples +k]-rfData_d[idx*nmpts +samples -k])/k;
            }
            rfData_i[samples]= 2*(s0/PI);
            rfData_d[idx*nmpts +samples] = sqrt((rfData_i[samples]*rfData_i[samples])+(rfData_d[idx*nmpts +samples]*rfData_d[idx*nmpts +samples]));
        }
        else
        {
            rfData_d[idx*nmpts +samples] = sqrt(rfData_d[idx*nmpts +samples]*rfData_d[idx*nmpts +samples]);
        }
        __syncthreads();
    }
}

The idx index behaves correctly, but samples does not seem to cover the right number of iterations.
I just want to know whether I can turn a double loop into something that runs inside a kernel.
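
For what it's worth, a double loop usually maps onto a 2D grid, with one thread per (line, sample) pair. Here is a minimal sketch of that idea, not a drop-in replacement. It rests on a few assumptions that are not in the post: nmlne and nmpts are passed as the parameters n_lines and n_pts, HILBERT_WINDOW is given the arbitrary value 16, PI_F stands in for PI, and the helper envelopeOnDevice is a hypothetical name. In the kernel above, samples is built from blockIdx.y and blockDim.y but uses threadIdx.x, which may be why it does not cover the expected range; the sketch uses threadIdx.y for the second index. It also writes to a separate output buffer, since updating rfData_d in place lets one thread overwrite a value that a neighbouring thread still needs to read, and it keeps the imaginary part in a local variable instead of rfData_i[samples], which every line would otherwise share.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define HILBERT_WINDOW 16            // assumed value, not given in the post
#define PI_F 3.14159265358979f       // stands in for the PI constant of the post

// One thread per (line, sample) pair: x indexes the sample, y indexes the line.
__global__ void hilbertEnvelope(const float* in, float* out, int n_lines, int n_pts)
{
    int sample = blockIdx.x * blockDim.x + threadIdx.x;
    int line   = blockIdx.y * blockDim.y + threadIdx.y;
    if (line >= n_lines || sample >= n_pts) return;   // guard both dimensions

    int base = line * n_pts;
    float re = in[base + sample];
    float im = 0.0f;                                  // stays 0 near the borders
    if (sample > HILBERT_WINDOW - 1 && sample < n_pts - HILBERT_WINDOW) {
        float s0 = 0.0f;
        for (int k = 1; k < HILBERT_WINDOW; k += 2)
            s0 += (in[base + sample + k] - in[base + sample - k]) / k;
        im = 2.0f * s0 / PI_F;
    }
    // Read from in, write to out: no thread reads a value another thread modified.
    out[base + sample] = sqrtf(re * re + im * im);
}

// Hypothetical host helper: copies the data over, launches a 2D grid, copies back.
// Error checking is omitted to keep the sketch short.
std::vector<float> envelopeOnDevice(const std::vector<float>& h_in, int n_lines, int n_pts)
{
    size_t bytes = h_in.size() * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(32, 8);                                // 256 threads per block
    dim3 grid((n_pts + block.x - 1) / block.x,        // round up in both dimensions
              (n_lines + block.y - 1) / block.y);
    hilbertEnvelope<<<grid, block>>>(d_in, d_out, n_lines, n_pts);

    std::vector<float> h_out(h_in.size());
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling envelopeOnDevice(data, n_lines, n_pts) with data stored line by line (n_pts samples per line) returns the envelope in the same layout. The __syncthreads() call is left out: no thread depends on another thread's result here, and calling it inside a branch that not every thread of the block reaches is not safe.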