Correct output with emulation mode, wrong with GPU/Execution

deepti · March 10, 2010, 6:38am

Hello Everyone,

          I have written a kernel which reflects correct result to host with emulation mode, but when i execute on device it returns all zero bits.
          Please suggest, what is going wrong with this case.

Here is my kernel
const int BLOCK_SIZE= 8;

global void CUDA_substitution_kernel(int *d_sub,int *xor1key,int s1[16],int s2[16],int s3[16],int s4[16],int s5[16],int s6[16],int s7[16],int s8[16])
{
int index,i,j,k,lb,ub;

    index=blockIdx.x * blockDim.x + threadIdx.x;

//**/ printf(“index=%d\n”,index);
// shared int Stemp[8][6];
int Stemp[8][6];
lb=((4index)+1)-1;
ub=(4(index+1))-1;
k=0;

//**/ printf(“\nSubstituion Box Temporary:\n”);

// if(index==0)
for(i=0;i<8;i++)
for(j=0;j<6;j++)
{
Stemp[i][j]=xor1key[k++]; //putting 48 bit data into Stemp[8][6]
//**/ printf(“%d\t”,Stemp[i][j]);

            }

    int p,q,multi,l,Boxnum,cross,s;
    int c,a[4],r;   //a[4] to conatain final 4 bit from each SBox and appended to SubChoice[32]
 

    q=0;
    multi=1;
    j=4;
    c=3;

    p=(Stemp[index][0]*2)+(Stemp[index][5]*1);

//**/ printf(“\nindex=%d p=%d\n”,index,p);

     while(j>0) //target to get decimal of 4 bits among six
     {
        k=Stemp[index][j]; //accepts 4 bits one by one reversely
        l=k*multi;    //
        q=q+l;

//**/ printf(“\nFor index=%d k=%d l=%d q=%d multi=%d j=%d\n”,index,k,l,q,multi,j);
multi=multi*2;
j–;

//**/ printf(“\nindex=%d q=%d\n”,index,q);

    //we get final num in q
    Boxnum=index+1;
                    switch(Boxnum)
                    {
                            case 1: cross=s1[p][q]; break;
                            case 2: cross=s2[p][q]; break;
                            case 3: cross=s3[p][q]; break;
                            case 4: cross=s4[p][q]; break;
                            case 5: cross=s5[p][q]; break;
                            case 6: cross=s6[p][q]; break;
                            case 7: cross=s7[p][q]; break;
                            case 8: cross=s8[p][q]; break;
                     }

//**/ printf(“\n index=%d and cross=%d \n”,index,cross);
while(cross>0)
{
r=cross%2;
a[c–]=r;
cross=cross/2;
}
__syncthreads();
while(c>=0)
{
a[c–]=0;
}

       for(l=lb,s=0;l<=ub;l++)
       {
              d_sub[l]=a[s];

//**/ printf(“\n index= %d , d_sub[%d]=%d and s= %d \n”,index,l,d_sub[l],s);
s++;
}

            __syncthreads();

/* printf(“\n\n\nFinal:d_sub\n”);
for(i=0;i<32;i++)
{
printf(“d_sub[%d]=%d\t”,i,d_sub[i]);
}
*/

    return;

}

And this how i call my kernel from main

int xor1key[48]=
{ 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0 };

    dim3 dimblock(BLOCK_SIZE);
    dim3 dimgrid(8/BLOCK_SIZE);

    cudaMallocHost((void **)&SubChoice,32 * sizeof(int));
    memset(SubChoice,0,32 * sizeof(int));

    int* d_sub;
    cudaMalloc((void **)&d_sub,32 * sizeof(int));

    cudaMemcpy(d_sub,SubChoice,32 * sizeof(int),cudaMemcpyHostToDevice);

            printf("\ncuda work\n");
            CUDA_substitution_kernel<<<dimgrid,dimblock>>>(d_sub,xor1key,s1,s2,s3,s4,s5,s6,s7,s8);
            CUT_CHECK_ERROR("kernel error");
            cudaMemcpy(SubChoice,d_sub,32 * sizeof(int),cudaMemcpyDeviceToHost);
            printf("\ncuda done\n");

    printf("\nSubstitution-choice-32-final\n");
    for(i=0;i<32;i++)
    {
            printf("%d\t",SubChoice[i]);
    }
    printf("\n\n");

    cudaFree(d_sub);

This is just the part of encryption algorithm, all the arrays from s1 to s8 are two dimensional arrayof 4*16.

with emulation code i get correct bits as
Substitution-choice-32-final
0 0 0 1 1 0 1 1 1 1 0 0 0 1 0 0 0 1 1 0 1 1 1 1 0 1 0 1 1 1 0 0

but on device i get all zeros.

Thanks in advance.

Regards,
Deepti

Cliff_Woolley · March 10, 2010, 8:01am

Have you tried running this through CUDA-GDB?

–Cliff

deepti · March 11, 2010, 12:15pm

Thanks Cliff.

My next question is the basic one, as first time I am going through this.

After inserting break points it shows this warning.

(cuda-gdb) l 99

94

95 int* d_sub;

96 cudaMalloc((void **)&d_sub,32 * sizeof(int));

97

98 cudaMemcpy(d_sub,SubChoice,32 * sizeof(int),cudaMemcpyHostToDevice);

99

100 printf(“\ncuda work\n”);

101 CUDA_substitution_kernel<<<dimgrid,dimblock>>>(d_sub,xor1key,s1,s2,s3,s4,s5,s6,s7,s8);

102 CUT_CHECK_ERROR(“kernel error”);

103 cudaMemcpy(SubChoice,d_sub,32 * sizeof(int),cudaMemcpyDeviceToHost);

(cuda-gdb) b 101

(cuda-gdb) r

Starting program: /data4/deepti/NVIDIA_GPU_Computing_SDK/C/bin/linux/debug/demo

[Thread debugging using libthread_db enabled]

[New process 32607]

[New Thread 47844348397056 (LWP 32607)]

Warning: a GPU was made unavailable to the application due to debugging

constraints. This may change the application behaviour!

Using device 0: Tesla C1060

cuda work

[Switching to Thread 47844348397056 (LWP 32607)]

Breakpoint 1, fnDemoEncrypt () at demo.cu:101

101 CUDA_substitution_kernel<<<dimgrid,dimblock>>>(d_sub,xor1key,s1,s2,s3,s4,s5,s6,s7,s8);

Current language: auto; currently c++

(cuda-gdb) continue

Continuing.

Cuda error: kernel error in file ‘demo.cu’ in line 102 : unspecified launch failure.

Program exited with code 01.[/b]

Many people get the same warning, but i didn’t find the solution , other than this line on nvidia forum itself - “The next version of the CUDA tools, whose beta version will be available next month, fixes the issue.”

Here are vesion for cuda-gdb and nvcc.

cuda-gdb --version

NVIDIA ® CUDA Debugger

BETA release

GNU gdb 6.6

GDB is free software, covered by the GNU General Public License, and you are

welcome to change it and/or distribute copies of it under certain conditions.

Type “show copying” to see the conditions.

There is absolutely no warranty for GDB. Type “show warranty” for details.

This GDB was configured as “x86_64-unknown-linux-gnu”.

nvcc --version

nvcc: NVIDIA ® Cuda compiler driver

Built on Thu_Jul_30_09:24:36_PDT_2009

Cuda compilation tools, release 2.3, V0.2.1221

Please suggest me, whether it is the version issue or I am going wrong because at line 102 launch failure is coming.

avidday · March 11, 2010, 12:35pm

This has nothing to do with cuda-gdb. I would guess your kernel is never launching because invalid arguments or excessive argument length. But it is impossible to say because the code you posted is incomplete.

deepti · March 12, 2010, 9:37am

I reduce the argument length by defining all the arrays from s1 to s8 within the kernel. But I got the same error - launch failure. External Media

Actually after its successful run, I will merge this code with my actual program. I don’t want to keep array definition in kernel, because I want to call same kernel multiple times also.This time I am attaching my code.

Thanks in advance.

My Makefile contents:

EXECUTABLE := demo

CUFILES := demo.cu

include …/…/common/common.mk

demo.cu (3.11 KB)

demo_kernel.cu (2.68 KB)

Deepti

Cliff_Woolley · March 12, 2010, 7:56pm

Hmm… I’m not seeing a kernel launch failure. Also one thing I notice right off is that it looks like your arrays are declared in CPU mem rather than in GPU mem (as in with the device qualifier), so the pointer to the arrays are not valid from the GPU’s point of view.

deepti · March 25, 2010, 8:00am

Thanks Everyone,

      The problem is solved. I was passing an array 'xor1key' without allocating memory on device. Since this array is required just for initializing, so didnt gave the memory and thats why kernel is not launching.

Deepti

Topic		Replies	Views
CUDA-GDB captured "Illegal access to address" exception when I invoke child kernel, but the result is correct when free run CUDA Programming and Performance	6	1702	March 20, 2017
Cuda-gdb crash when trying to debug kernel launched through `cudaLaunchCooperativeKernel` CUDA-GDB cuda-gdb	11	2351	April 29, 2024
Cuda-gdb aborted CUDA-GDB	7	119	November 24, 2024
Always got this warning when nvprof cuda file "This can happen if device ran out of memory or if a device kernel was stopped due to an assertion" on just HellowWorld GPU CUDA Programming and Performance	9	2552	January 31, 2019
CUDA compile trouble CUDA Programming and Performance	47	5104	November 8, 2010
Problems with CUDA CUDA Programming and Performance	8	2825	December 3, 2012
Cuda-gdb internal-error of copy_type on basic fortran example CUDA-GDB cuda-gdb	5	963	January 2, 2024
Cuda-gdb script execution CUDA-GDB	6	1866	January 29, 2024
Unable to compile with nvcc, gcc-4.3.3 support for exception handling is disabled CUDA Programming and Performance	14	13845	June 9, 2009
Silent kernel failure CUDA Programming and Performance	25	8237	May 18, 2020

Correct output with emulation mode, wrong with GPU/Execution

Related topics