Hi,
I want to analyse my CUDA kernel. Basically, I would like to check the values computed for some variables in my kernel, per block and per thread. For example, I would like to know what values are calculated by, say, threadIdx.x=67 in block 3, and so on. Is there a helpful tool for this?
Thank you very much
There is a source level debugger, cuda-gdb, supplied in all recent versions of the CUDA toolkits for Linux.
For Windows you can use Nexus; Linux has cuda-gdb.
Other than that (for release mode, i.e. running on the GPU), I usually use something like this (fastest and ugliest :) ):
__global__ void MyKernel(....)
{
...
#ifdef CUDA_DEBUG
// Only the one thread of interest writes its values to the debug buffer.
if ( ( blockIdx.x == 3 ) && ( threadIdx.x == 67 ) )
{
pDummyOutput[ 0 ] = fSomeValue;
pDummyOutput[ 1 ] = fSomeOtherValue;
....
}
#endif
....
}
Then, in the host code, copy pDummyOutput back to the host and print the values.
Fastest and ugliest ;)
Eyal
Thanks a lot for your replies!
Could you tell me whether I need to pass "pDummyOutput" as one of the parameters in the kernel call in my host code, like this:
float *pDummyOutput;
cudaMalloc((void **) &pDummyOutput, ....);
MyKernel<<< ....>>> ( pDummyOutput, x, y , etc );
Yes, of course :)
Pass the dummy output buffer to the kernel, and once the kernel has finished, copy it to a host array and just print it.
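To make the whole round trip concrete, here is a minimal sketch of the approach described above, not anyone's actual code. The kernel body (doubling each element), the launch configuration of 8 blocks of 128 threads, the helper name probeThread and the buffers dIn, dOut and hIn are all assumptions made up for illustration; only the CUDA_DEBUG block and the pDummyOutput name come from the posts above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define CUDA_DEBUG  // remove this line to compile the debug writes out

// Hypothetical kernel: doubles every input element. Only the CUDA_DEBUG
// section follows the pattern from the thread; the rest is filler.
__global__ void MyKernel(float *pDummyOutput, const float *pIn, float *pOut, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float fSomeValue = pIn[i];
    float fSomeOtherValue = 2.0f * fSomeValue;
    pOut[i] = fSomeOtherValue;

#ifdef CUDA_DEBUG
    // Exactly one thread writes, so there is no race on pDummyOutput.
    if (blockIdx.x == 3 && threadIdx.x == 67) {
        pDummyOutput[0] = fSomeValue;
        pDummyOutput[1] = fSomeOtherValue;
    }
#endif
}

// Hypothetical helper: runs the kernel and copies the two debug values
// into hDebug. Returns false if any CUDA call fails.
bool probeThread(float hDebug[2])
{
    const int threads = 128, blocks = 8, n = threads * blocks;
    static float hIn[n];
    for (int i = 0; i < n; ++i) hIn[i] = (float)i;

    float *dIn = nullptr, *dOut = nullptr, *pDummyOutput = nullptr;
    bool ok =
        cudaMalloc((void **)&dIn, n * sizeof(float)) == cudaSuccess &&
        cudaMalloc((void **)&dOut, n * sizeof(float)) == cudaSuccess &&
        cudaMalloc((void **)&pDummyOutput, 2 * sizeof(float)) == cudaSuccess &&
        cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        // The debug buffer is passed like any other kernel argument.
        MyKernel<<<blocks, threads>>>(pDummyOutput, dIn, dOut, n);
        // This copy waits for the kernel to finish before reading the buffer.
        ok = cudaMemcpy(hDebug, pDummyOutput, 2 * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(pDummyOutput);
    return ok;
}

int main()
{
    float hDebug[2] = {0.0f, 0.0f};
    if (!probeThread(hDebug)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("block 3, thread 67: fSomeValue=%f fSomeOtherValue=%f\n",
           hDebug[0], hDebug[1]);
    return 0;
}
```

With this made-up input (element i holds the value i), block 3 / thread 67 handles element 3*128+67 = 451, so the two values copied back should be 451 and 902. If you want to watch several threads at once, one option is to give each of them its own slot in pDummyOutput, so that no two threads write to the same location.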
Thanks. This is a very simple and straightforward way. Thanks once again.