Cuda-gdb bug?

I just noticed that cuda-gdb didn’t follow the path of normal execution as I was debugging a cuda kernel. It seems to me that the out-of-order statements were only shown, rather than executed. Is it a known issue?

Hello Ziqi - this is not a known issue with CUDA GDB. Did you compile your CUDA kernel with the “-G” option to the compiler? If the CUDA kernel is optimized by the compiler, you can see some surprising and counterintuitive behavior in the debugger while stepping through the optimized code.

If you’re compiling with -G and still observing unexpected behavior, it would be helpful to describe your issue in greater detail, so that we can review your issue and attempt to reproduce it.

Hi Steveu,

I indeed built with -G.

I can only share the code where I observed the problem (as the whole kernel involves IP).

while (1) // loop over distances of integer pitches
{
    distToCell = range * pitch;

    if (y + distToCell > paddedCA.bottom + MIN_ALIGN_THRESHOLD && y - distToCell < paddedCA.top - MIN_ALIGN_THRESHOLD) break;

    if (range == 0)
    {
        pInterp[(threadIdx.y * blockDim.x + threadIdx.x) * maxNumCandidates + numValidCandidate] = pCurrFrame[y * width + x];
        numValidCandidate++;

        if (numValidCandidate == maxNumCandidates) break;

        range++;

        continue;
    }
    
    // search in the negative direction
    candidateLoc = y - distToCell;
    itpCtr = floor(candidateLoc);

    if (candidateLoc > paddedCA.top - MIN_ALIGN_THRESHOLD && candidateLoc + 1 < paddedCA.bottom + MIN_ALIGN_THRESHOLD)
    {
        if (candidateLoc - itpCtr < MIN_ALIGN_THRESHOLD)
        {
            pInterp[(threadIdx.y * blockDim.x + threadIdx.x) * maxNumCandidates + numValidCandidate] = pCurrFrame[itpCtr * width + x];
        }
        else
        {
            if (ceil(candidateLoc) - candidateLoc < MIN_ALIGN_THRESHOLD)
            {
                pInterp[(threadIdx.y * blockDim.x + threadIdx.x) * maxNumCandidates + numValidCandidate] = pCurrFrame[(itpCtr + 1) * width + x];
            }
            else
            {
                itpdValue = 0.0;
                for (int k = -radius; k <= radius; k++)
                {
                    itpdValue += pCurrFrame[(itpCtr + k) * width + x] * pHamSincCoeff_c[(-range + maxNumCandidates) * wlen + (k + radius)];
                }
                pInterp[(threadIdx.y * blockDim.x + threadIdx.x) * maxNumCandidates + numValidCandidate] = int16_t(itpdValue + 0.5);
            }
        }

        numValidCandidate++;
        if (numValidCandidate == maxNumCandidates) break;
    }

    candidateLoc = y + distToCell;
    itpCtr = floor(candidateLoc);

    if (candidateLoc > paddedCA.top - MIN_ALIGN_THRESHOLD && candidateLoc + 1 < paddedCA.bottom + MIN_ALIGN_THRESHOLD)
    {
        if (candidateLoc - itpCtr < MIN_ALIGN_THRESHOLD)
        {
            pInterp[(threadIdx.y * blockDim.x + threadIdx.x) * maxNumCandidates + numValidCandidate] = pCurrFrame[itpCtr * width + x];
        }
        else
        {
            if (ceil(candidateLoc) - candidateLoc < MIN_ALIGN_THRESHOLD)
            {
                pInterp[(threadIdx.y * blockDim.x + threadIdx.x) * maxNumCandidates + numValidCandidate] = pCurrFrame[(itpCtr + 1) * width + x];
            }
            else
            {
                itpdValue = 0.0;
                for (int k = -radius; k <= radius; k++)
                {
                    itpdValue += pCurrFrame[(itpCtr + k) * width + x] * pHamSincCoeff_c[(-range + maxNumCandidates) * wlen + (k + radius)];
                }
                pInterp[(threadIdx.y * blockDim.x + threadIdx.x) * maxNumCandidates + numValidCandidate] = int16_t(itpdValue + 0.5);
            }
        }

        numValidCandidate++;
        if (numValidCandidate == maxNumCandidates) break;
    }

    range++;
}

I used n to debug the chunk of code step-by-step. The continue under if (range == 0) should only be hit once. Further range++ at the end of the while loop should only be hit once in each loop. However, I observed the continue under if (range == 0) was hit repetitively, although it was not executed. The order of range++ at the end was not right either. It seems to me that it was only a display issue, which didn’t influence result. But it indeed impacts debug experience.

Hi @Ziqi

Can you share the debug session log?

  • Use n to debug the code (as you done before)
  • When you hit the breakpoint, please enter info cuda warps

Multiple breakpoint hits might be caused by divergence (i.e. different warps are hitting the same breakpoint).

Additional details can also be found here: No matter how many warps there are, only one warp hits the breakpoint, right? - #9 by AKravets