Hello,
I’m trying to debug my CUDA program, but it looks very impractical to me: because the code is multithreaded you have to select the warps, and you need two debuggers if you want to debug both the CPU and the GPU at the same time. I also searched for a single-threaded debugging mode, but it doesn’t seem to exist, so I decided to modify the functions when I am in the debug configuration by adding #ifdef _DEBUG lines.
Like this:
#ifndef _DEBUG
__global__
#endif
void add(int n, float *x, float *y)
{
    int index = threadIdx.x;
    int stride = blockDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}
And:
int blockSize = 256;
int numBlocks = (N + blockSize - 1) / blockSize;
#ifdef _DEBUG
gridDim.x = numBlocks;
blockDim.x = blockSize;
for (threadIdx.x = 0; threadIdx.x < blockSize; threadIdx.x++)
    for (blockIdx.x = 0; blockIdx.x < numBlocks; blockIdx.x++)
        add(N, x, y);
#else
add<<<numBlocks, blockSize>>>(N, x, y);
cudaDeviceSynchronize();
#endif // _DEBUG
It could work, but the compiler complains that gridDim.x, blockDim.x, threadIdx.x and blockIdx.x are not modifiable.
I then tried this:
#ifdef _DEBUG
#define __global__
uint3 threadIdx;
uint3 blockIdx;
uint3 blockDim;
uint3 gridDim;
#else
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#endif
But the compiler doesn’t accept this either: it reports that a declaration is incompatible with “const uint3 threadIdx”. My C++ knowledge is not extensive enough to find the solution.
I could change all my kernels like this:
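For reference, here is a minimal sketch of one way the attempt above might be made to compile. As far as I understand, nvcc pre-includes the CUDA runtime headers for .cu files, so threadIdx, blockIdx, blockDim and gridDim are already declared as read-only built-ins before the #ifdef block is reached. The sketch therefore declares writable variables under different names and redirects the built-in names to them with macros. HOST_EMULATION, KERNEL, EmuIdx3, the emu_* variables and launch_add are hypothetical names, not part of CUDA. It also assumes that x and y are readable from the host in debug builds (for example managed memory), that the kernel does not use __syncthreads() or shared memory (one thread is emulated at a time), and that no CUDA header is included after the macros. The kernel uses the grid-wide index so that several blocks do not repeat the same work.

```cpp
#include <cassert>

// Hypothetical switch: emulate on the host in debug builds, or whenever the
// file is not compiled by nvcc at all.
#if defined(_DEBUG) || !defined(__CUDACC__)
#define HOST_EMULATION 1
#endif

#ifdef HOST_EMULATION
// Writable stand-ins under different names, so they cannot clash with the
// read-only built-ins declared by the CUDA headers.
struct EmuIdx3 { unsigned x, y, z; };
static EmuIdx3 emu_threadIdx = {0, 0, 0}, emu_blockIdx = {0, 0, 0};
static EmuIdx3 emu_blockDim = {1, 1, 1}, emu_gridDim = {1, 1, 1};
// From here on, the built-in names refer to the stand-ins.
#define threadIdx emu_threadIdx
#define blockIdx  emu_blockIdx
#define blockDim  emu_blockDim
#define gridDim   emu_gridDim
#define KERNEL              // plain host function
#else
#define KERNEL __global__   // real kernel
#endif

// Kernel body is identical in both configurations.
KERNEL void add(int n, const float *x, float *y)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}

// The only place that differs between debug and release builds.
void launch_add(int numBlocks, int blockSize, int n, const float *x, float *y)
{
    assert(numBlocks > 0 && blockSize > 0);
#ifdef HOST_EMULATION
    gridDim.x = static_cast<unsigned>(numBlocks);
    blockDim.x = static_cast<unsigned>(blockSize);
    // Run every (block, thread) pair one after another on the CPU.
    for (blockIdx.x = 0; blockIdx.x < gridDim.x; ++blockIdx.x)
        for (threadIdx.x = 0; threadIdx.x < blockDim.x; ++threadIdx.x)
            add(n, x, y);
#else
    add<<<numBlocks, blockSize>>>(n, x, y);
    cudaDeviceSynchronize();
#endif
}
```

With this layout the call site would become launch_add(numBlocks, blockSize, N, x, y); in both configurations, and a normal CPU debugger can step through add() in the debug build.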
void add(int n, float *x, float *y)
{
#ifdef _DEBUG
    int index=mythreadidxx;
    int stride=myblockdimx;
#else
    int index = threadIdx.x;
    int stride = blockDim.x;
#endif
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}
But that would make the code even heavier.
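A possible way to limit that overhead, as a rough sketch only: put the #ifdef in two small helper functions once, so each kernel body stays free of preprocessor lines. thread_index, thread_stride and KERNEL are hypothetical names; mythreadidxx and myblockdimx are the debug variables from the kernel above, which the host loop would set before each call. The sketch assumes the same single-block indexing as that kernel.

```cpp
// The configuration-dependent part lives here, once, instead of in each kernel.
#if defined(__CUDACC__) && !defined(_DEBUG)
#define KERNEL __global__
__device__ inline int thread_index()  { return threadIdx.x; }
__device__ inline int thread_stride() { return blockDim.x; }
#else
#define KERNEL   // plain host function in debug or non-nvcc builds
static int mythreadidxx = 0;  // emulated threadIdx.x, set by the host loop
static int myblockdimx = 1;   // emulated blockDim.x, set by the host loop
inline int thread_index()  { return mythreadidxx; }
inline int thread_stride() { return myblockdimx; }
#endif

// The kernel itself no longer contains any #ifdef.
KERNEL void add(int n, const float *x, float *y)
{
    int index = thread_index();
    int stride = thread_stride();
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}
```

In the debug configuration the caller would set myblockdimx, then loop mythreadidxx from 0 to myblockdimx - 1 and call add(N, x, y) on each iteration.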
Thank you in advance.