I just noticed that cuda-gdb didn’t follow the path of normal execution as I was debugging a cuda kernel. It seems to me that the out-of-order statements were only shown, rather than executed. Is it a known issue?
Hello Ziqi - this is not a known issue with CUDA GDB. Did you compile your CUDA kernel with the “-G” option to the compiler? If the CUDA kernel is optimized by the compiler, you can see some surprising and counterintuitive behavior in the debugger while stepping through the optimized code.
If you’re compiling with -G and still observing unexpected behavior, it would be helpful to describe your issue in greater detail, so that we can review your issue and attempt to reproduce it.
Hi Steveu,
I indeed built with -G.
I can only share the code where I observed the problem (as the whole kernel involves IP).
while (1) // loop over distances of integer pitches
{
distToCell = range * pitch;
if (y + distToCell > paddedCA.bottom + MIN_ALIGN_THRESHOLD && y - distToCell < paddedCA.top - MIN_ALIGN_THRESHOLD) break;
if (range == 0)
{
pInterp[(threadIdx.y * blockDim.x + threadIdx.x) * maxNumCandidates + numValidCandidate] = pCurrFrame[y * width + x];
numValidCandidate++;
if (numValidCandidate == maxNumCandidates) break;
range++;
continue;
}
// search in the negative direction
candidateLoc = y - distToCell;
itpCtr = floor(candidateLoc);
if (candidateLoc > paddedCA.top - MIN_ALIGN_THRESHOLD && candidateLoc + 1 < paddedCA.bottom + MIN_ALIGN_THRESHOLD)
{
if (candidateLoc - itpCtr < MIN_ALIGN_THRESHOLD)
{
pInterp[(threadIdx.y * blockDim.x + threadIdx.x) * maxNumCandidates + numValidCandidate] = pCurrFrame[itpCtr * width + x];
}
else
{
if (ceil(candidateLoc) - candidateLoc < MIN_ALIGN_THRESHOLD)
{
pInterp[(threadIdx.y * blockDim.x + threadIdx.x) * maxNumCandidates + numValidCandidate] = pCurrFrame[(itpCtr + 1) * width + x];
}
else
{
itpdValue = 0.0;
for (int k = -radius; k <= radius; k++)
{
itpdValue += pCurrFrame[(itpCtr + k) * width + x] * pHamSincCoeff_c[(-range + maxNumCandidates) * wlen + (k + radius)];
}
pInterp[(threadIdx.y * blockDim.x + threadIdx.x) * maxNumCandidates + numValidCandidate] = int16_t(itpdValue + 0.5);
}
}
numValidCandidate++;
if (numValidCandidate == maxNumCandidates) break;
}
candidateLoc = y + distToCell;
itpCtr = floor(candidateLoc);
if (candidateLoc > paddedCA.top - MIN_ALIGN_THRESHOLD && candidateLoc + 1 < paddedCA.bottom + MIN_ALIGN_THRESHOLD)
{
if (candidateLoc - itpCtr < MIN_ALIGN_THRESHOLD)
{
pInterp[(threadIdx.y * blockDim.x + threadIdx.x) * maxNumCandidates + numValidCandidate] = pCurrFrame[itpCtr * width + x];
}
else
{
if (ceil(candidateLoc) - candidateLoc < MIN_ALIGN_THRESHOLD)
{
pInterp[(threadIdx.y * blockDim.x + threadIdx.x) * maxNumCandidates + numValidCandidate] = pCurrFrame[(itpCtr + 1) * width + x];
}
else
{
itpdValue = 0.0;
for (int k = -radius; k <= radius; k++)
{
itpdValue += pCurrFrame[(itpCtr + k) * width + x] * pHamSincCoeff_c[(-range + maxNumCandidates) * wlen + (k + radius)];
}
pInterp[(threadIdx.y * blockDim.x + threadIdx.x) * maxNumCandidates + numValidCandidate] = int16_t(itpdValue + 0.5);
}
}
numValidCandidate++;
if (numValidCandidate == maxNumCandidates) break;
}
range++;
}
I used n
to debug the chunk of code step-by-step. The continue
under if (range == 0)
should only be hit once. Further range++
at the end of the while loop should only be hit once in each loop. However, I observed the continue
under if (range == 0)
was hit repetitively, although it was not executed. The order of range++
at the end was not right either. It seems to me that it was only a display issue, which didn’t influence result. But it indeed impacts debug experience.
Hi @Ziqi
Can you share the debug session log?
- Use
n
to debug the code (as you done before) - When you hit the breakpoint, please enter
info cuda warps
Multiple breakpoint hits might be caused by divergence (i.e. different warps are hitting the same breakpoint).
Additional details can also be found here: No matter how many warps there are, only one warp hits the breakpoint, right? - #9 by AKravets