cuPrintf limited to grids of up to 2048 threads

While trying to debug some other code, I ran into a limitation of the cuPrintf library. Consider the following example:

#include <stdio.h>
#include <cuPrintf.cu>
#include <conio.h>

__global__ void test()
{
    // Global thread index across the 1D grid
    unsigned idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx % 100 == 0) // this condition makes no difference
        cuPrintf("ThreadIndex: %u\n", idx);
}

int main( int argc, char ** argv )
{
    cudaPrintfInit(); // default buffer size
    test<<< 8, 256 >>>(); // test<<< 9, 256 >>>(); produces no output
    cudaPrintfDisplay(stdout, true);
    cudaPrintfEnd();
    printf("Error state: %d\n", cudaGetLastError());
    getch(); // wait for a key press (Windows only)
    return 0;
}

It would seem that cuPrintf-style debugging only works for small grids. Now I have to either find a small input file or generate one :D

Does anyone know whether this is hardware dependent? I have a GTX 280 and am using cuPrintf from SDK 3.2.

Dženan

You can pass cudaPrintfInit() a size argument: the number of bytes to reserve for the output buffer.
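For picking that number, here is a rough host-side sketch. It rests on assumptions recalled from the SDK's cuPrintf.cu that you should check against your own copy: the default buffer is 1 MB, each record is a fixed 256 bytes (CUPRINTF_MAX_LEN), and in a build targeting sm_10 the buffer is split evenly between all threads, with each thread needing room for at least two records. If those hold, they would account for the 2048-thread cutoff. The helper names are made up for illustration and are not part of cuPrintf.

```cpp
#include <cstddef>

// Assumed values, recalled from the SDK's cuPrintf.cu; verify against your copy.
constexpr std::size_t kRecordBytes = 256;             // assumed CUPRINTF_MAX_LEN
constexpr std::size_t kSlotsPerThread = 2;            // assumed minimum records per thread
constexpr std::size_t kDefaultBufferBytes = 1048576;  // assumed default of cudaPrintfInit()

// Hypothetical helper: bytes to reserve so that every thread in the launch
// gets its minimum share, under the even per-thread split assumed above.
constexpr std::size_t cuprintfBufferBytes(std::size_t totalThreads)
{
    return totalThreads * kSlotsPerThread * kRecordBytes;
}

// Hypothetical helper: largest launch a buffer of the given size can serve
// under the same assumptions.
constexpr std::size_t maxThreadsForBuffer(std::size_t bufferBytes)
{
    return bufferBytes / (kSlotsPerThread * kRecordBytes);
}

// Usage in the example from the first post (9 blocks of 256 threads):
//     cudaPrintfInit(cuprintfBufferBytes(9 * 256));
```

Under these assumptions maxThreadsForBuffer(kDefaultBufferBytes) comes out to 2048, which matches the limit in the thread title. A thread that prints more than once would need a correspondingly larger share.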

Increasing the reserved buffer size does the trick, thanks a lot.