Not so simple kernel

  1. Does this kernel have any problems, such as too many control statements or being too complex?
    It crashes when the iteration count of the first 'for' loop (ITERATIONS) exceeds ~2000 on a GTX 280,
    with CHUNKS=256. If ITERATIONS is less than ~2000, it works exactly as desired.
    Any clues? My intention is to perform the same operation (as implemented in the kernel) on different chunks of data.

  2. Are character arrays properly supported in CUDA? Or is it better to use 32-bit integer arrays?

    int ChunkNo = blockIdx.x * blockDim.x + threadIdx.x;

    unsigned int TempCode,Length;
    unsigned int Flag,OutChars;
    unsigned int i,k,p,q;
    unsigned char j;
    unsigned char TempChar,Mask;

    if (ChunkNo < CHUNKS)
    {
    TempCode=0;
    Length =0;
    Flag=0;
    q=0;
    OutChars=0;
    for(i=0;i<ITERATIONS;i++)
    {
    j=idata[ChunkNo * LENGTH + i];
    Mask=0x80;
    k=0;
    while(k<8)
    {
    if(Flag)
    {
    TempChar=j&Mask;
    if(TempChar==Mask)
    {
    TempCode=TempCode<<1;
    TempCode=TempCode|0x00000001;
    }
    else
    {
    TempCode=TempCode<<1;
    }
    Length++;
    Mask=Mask>>1;
    k++;
    }
    else
    {
    while((q<MinCodeLength)&&(k<8))
    {
    TempChar=j&Mask;
    if(TempChar==Mask)
    {
    TempCode=TempCode<<1;
    TempCode=TempCode|0x00000001;
    }
    else
    {
    TempCode=TempCode<<1;
    }
    Length++;
    Mask=Mask>>1;
    k++;
    q++;
    }
    }

     		if(q>=MinCodeLength)
     		{
     			Flag=1;
     			for(p=0;p<TOTAL_SYMBOLS;p++)
     			{  
     				if(TempCode == d_CodeWordArray[p] && Length == d_CodeLengthArray[p])
     				{
     					TempCode=0;
     					Length =0;
     					odata[ChunkNo * LENGTH + OutChars] = p;
     					OutChars++;
     					Flag=0;
     					q=0;
     					break;
     				}
     			}
     		}
     	}
     }
     DChunkSizeArray[ChunkNo]=OutChars;
    

    }

What operating system are you on? Is there a screen attached to the device? How long does the kernel take to execute?
You might just be running into the 5 sec runtime limit.
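
Both questions can be answered from code. The sketch below is only an illustration: `dummyKernel`, `watchdogEnabled` and `timeLaunchMs` are made-up names, and the dummy kernel merely stands in for the decoder above. It reads the `kernelExecTimeoutEnabled` field of `cudaDeviceProp` to see whether a runtime limit applies to the device, and times one launch with CUDA events.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: each thread loops `iterations` times.
__global__ void dummyKernel(unsigned int *out, unsigned int iterations)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int acc = 0;
    for (unsigned int i = 0; i < iterations; i++)
        acc += i ^ tid;
    out[tid] = acc;
}

// Returns 1 if a runtime limit applies to kernels on `device`,
// 0 if it does not, and -1 if the device could not be queried.
int watchdogEnabled(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return -1;
    return prop.kernelExecTimeoutEnabled ? 1 : 0;
}

// Launches the dummy kernel once (1 block of 256 threads) and returns the
// elapsed time in milliseconds, or a negative value on failure.
float timeLaunchMs(unsigned int iterations)
{
    const int threads = 256;
    unsigned int *d_out = NULL;
    if (cudaMalloc(&d_out, threads * sizeof(unsigned int)) != cudaSuccess)
        return -1.0f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    dummyKernel<<<1, threads>>>(d_out, iterations);
    cudaEventRecord(stop, 0);

    // Wait for the kernel to finish; a failed or killed kernel is reported here.
    float ms = -1.0f;
    if (cudaEventSynchronize(stop) == cudaSuccess)
        cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return ms;
}
```

If `watchdogEnabled(0)` returns 1 and the time measured for the real kernel approaches the limit as ITERATIONS grows, the runtime limit is the likely cause of the crash.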

It's on Windows 7. I am not sure what you mean by a screen attached to the device. I am using a GTX 280 dedicated graphics card.

A 5-second runtime limit? Is there something like that? I want the kernel to operate on a long 8-bit array.

The main problem with the code is that:

			{

				   printf("Not enough indentation and readability\n");

				   abort(-1);

			}

The runtime limit on Windows 7 is actually closer to 2 seconds. To change the delay or turn off the watchdog, read http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx
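
To confirm that the watchdog is what kills the kernel, it may help to check the error codes after the launch. This is a minimal sketch, assuming a reasonably recent CUDA runtime that provides `cudaDeviceSynchronize`; `isWatchdogTimeout` and `checkKernel` are hypothetical helper names. A launch stopped by the watchdog is typically reported as `cudaErrorLaunchTimeout`, although a driver reset can also surface as a different error.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// True if the error code is the one the runtime uses for a kernel
// that exceeded the runtime limit.
bool isWatchdogTimeout(cudaError_t err)
{
    return err == cudaErrorLaunchTimeout;
}

// Hypothetical helper: call it right after `kernel<<<grid, block>>>(...)`.
// It reports launch errors first, then errors raised while the kernel ran.
cudaError_t checkKernel()
{
    cudaError_t err = cudaGetLastError();   // invalid launch configuration etc.
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();      // waits for the kernel to finish

    if (isWatchdogTimeout(err))
        fprintf(stderr, "Kernel was stopped by the runtime limit (watchdog).\n");
    else if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return err;
}
```

If `checkKernel()` reports the watchdog message only for large ITERATIONS, the kernel itself is probably fine and simply runs too long for a GPU that also drives the display.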