Inconsistency between NVCC and MS-Compiler

Adel_Ahmed · December 9, 2010, 8:32am

I wrote a simple program that allocates two arrays and initializes them in a loop, then compiled it separately with NVCC and with the MS-VS compiler. The build produced with NVCC gives unexpected results. The code is pure C, with no CUDA-related statements whatsoever.

Details:

Using NVCC

1. Created an empty project in VS2008 and configured it for CUDA compiling.

2. Created a .CU file and put the following code in it:

#include <stdio.h>

#include <cuda.h>

int main(void)

{

   int *a_h, *b_h;

   const int N = 10;

   size_t size = N+1 * sizeof(int);

   size_t i;

   // allocate the host arrays

   a_h = (int *) malloc(size);

   b_h = (int *) malloc(size);

   // populate host arrays

   for (i = 0; i < size; i++)

   {

   	printf("i=%2d: ");

   	a_h[i] = i;

   	printf("a_h[%d] = %2d, since i is still %2d, so ", i, a_h[i], i, i);

   	b_h[i] = i;

   	printf("b_h[%d] = %2d\n", i, b_h[i], i);

   }

   // DEBUG DATA

   printf("=====A======\n");

   for (i = 0; i < size; i++) printf("a_h[%d] = %2d\n", i, a_h[i]);

   printf("\n");

   printf("=====B======\n");

   for (i = 0; i < size; i++) printf("b_h[%d] = %2d\n", i, b_h[i]);

   printf("\n+++++++++++++++++++++++\n");

}

3. The output was:

i= 0: a_h[0] =  0 and i is still  0, so b_h[0] =  0

i= 0: a_h[1] =  1 and i is still  1, so b_h[1] =  1

i= 0: a_h[2] =  2 and i is still  2, so b_h[2] =  2

i= 0: a_h[3] =  3 and i is still  3, so b_h[3] =  3

i= 0: a_h[4] =  4 and i is still  4, so b_h[4] =  4

i= 0: a_h[5] =  5 and i is still  5, so b_h[5] =  5

i= 0: a_h[6] =  6 and i is still  6, so b_h[6] =  6

i= 0: a_h[7] =  7 and i is still  7, so b_h[7] =  7

i= 0: a_h[8] =  8 and i is still  8, so b_h[8] =  8

i= 0: a_h[9] =  9 and i is still  9, so b_h[9] =  9

i= 0: a_h[10] = 10 and i is still 10, so b_h[10] = 10

i= 0: a_h[11] = 11 and i is still 11, so b_h[11] = 11

i= 0: a_h[12] = 12 and i is still 12, so b_h[12] = 12

i= 0: a_h[13] = 13 and i is still 13, so b_h[13] = 13

=====A======

a_h[0] =  0

a_h[1] =  1

a_h[2] =  2

a_h[3] =  3

a_h[4] =  4

a_h[5] =  5

a_h[6] =  6

a_h[7] =  7

a_h[8] =  8

a_h[9] =  9

a_h[10] = 10

a_h[11] = 11

a_h[12] = 12

a_h[13] = 13

=====B======

b_h[0] =  6

b_h[1] =  7

b_h[2] =  8

b_h[3] =  9

b_h[4] = 10

b_h[5] = 11

b_h[6] = 12

b_h[7] = 13

b_h[8] =  8

b_h[9] =  9

b_h[10] = 10

b_h[11] = 11

b_h[12] = 12

b_h[13] = 13

+++++++++++++++++++++++

Note the unexpected values of b_h[0] through b_h[7]. They should be 0, 1, … 7 respectively, but they are 6, 7, … 13 instead.

Using MS-VS2008 Compiler

1. I then created another project in VS2008 with a CPP program.

2. The .CPP file contained the exact same code:

#include "stdafx.h"

#include <stdio.h>

#include <stdlib.h>

int _tmain(int argc, _TCHAR* argv[])

{

   int *a_h, *b_h;

   const int N = 10;

   size_t size = N+1 * sizeof(int);

   size_t i;

   // allocate the host arrays

   a_h = (int *) malloc(size);

   b_h = (int *) malloc(size);

   // populate host arrays

   for (i = 0; i < size; i++)

   {

   	printf("i=%2d: ");

   	a_h[i] = i;

   	printf("a_h[%d] = %2d, since i is still %2d, so ", i, a_h[i], i, i);

   	b_h[i] = i;

   	printf("b_h[%d] = %2d\n", i, b_h[i], i);

   }

   // DEBUG DATA

   printf("=====A======\n");

   for (i = 0; i < size; i++) printf("a_h[%d] = %2d\n", i, a_h[i]);

   printf("\n");

   printf("=====B======\n");

   for (i = 0; i < size; i++) printf("b_h[%d] = %2d\n", i, b_h[i]);

   printf("\n+++++++++++++++++++++++\n");

}

3. The only difference in syntax is:

   - in the .CU file I include <cuda.h>

   - in the .CPP file I include <stdlib.h>

4. The output was:

i= 0: a_h[0] =  0, since i is still  0, so b_h[0] =  0

i= 0: a_h[1] =  1, since i is still  1, so b_h[1] =  1

i= 0: a_h[2] =  2, since i is still  2, so b_h[2] =  2

i= 0: a_h[3] =  3, since i is still  3, so b_h[3] =  3

i= 0: a_h[4] =  4, since i is still  4, so b_h[4] =  4

i= 0: a_h[5] =  5, since i is still  5, so b_h[5] =  5

i= 0: a_h[6] =  6, since i is still  6, so b_h[6] =  6

i= 0: a_h[7] =  7, since i is still  7, so b_h[7] =  7

i= 0: a_h[8] =  8, since i is still  8, so b_h[8] =  8

i= 0: a_h[9] =  9, since i is still  9, so b_h[9] =  9

i= 0: a_h[10] = 10, since i is still 10, so b_h[10] = 10

i= 0: a_h[11] = 11, since i is still 11, so b_h[11] = 11

i= 0: a_h[12] = 12, since i is still 12, so b_h[12] = 12

i= 0: a_h[13] = 13, since i is still 13, so b_h[13] = 13

=====A======

a_h[0] =  0

a_h[1] =  1

a_h[2] =  2

a_h[3] =  3

a_h[4] =  4

a_h[5] =  5

a_h[6] =  6

a_h[7] =  7

a_h[8] =  8

a_h[9] =  9

a_h[10] = 10

a_h[11] = 11

a_h[12] = 12

a_h[13] = 13

=====B======

b_h[0] =  0

b_h[1] =  1

b_h[2] =  2

b_h[3] =  3

b_h[4] =  4

b_h[5] =  5

b_h[6] =  6

b_h[7] =  7

b_h[8] =  8

b_h[9] =  9

b_h[10] = 10

b_h[11] = 11

b_h[12] = 12

b_h[13] = 13

+++++++++++++++++++++++

5. Here the values of b_h[0] through b_h[7] are as expected.

Can somebody please tell me why I am getting erroneous behavior from NVCC?

My System Configuration:

1. CPU: Intel Core2 CPU 6600 @ 2.0GHz

2. RAM: 4.00GB

3. OS: Windows 7 Professional 64Bit

4. Display: NVIDIA Quadro FX5800

5. Display Bios: Version 62.0.3a.0.3

6. NVIDIA Driver Version: 8.17.12.5981

7. NVIDIA CUDA SDK: 3.2

8. nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2010 NVIDIA Corporation

Built on Thu_Nov__4_13:45:48_PDT_2010

Cuda compilation tools, release 3.2, V0.2.1221

Attached is the build log for the NVCC compilation.
BuildLog.htm (12.8 KB)

avidday · December 9, 2010, 8:42am

The first point to make is that nvcc isn’t a compiler. It is a compiler driver: it just preprocesses code to split off device code from host code. In both your examples the host code is compiled by VS2008. Any device code found is passed to nvopencc for compilation, but there is none in this case, so that is irrelevant.

What you are seeing has nothing to do with the compiler and everything to do with out of bounds memory access in your code. If you fix your code, the compiler problem you think you have will magically disappear.

Adel_Ahmed · December 9, 2010, 9:07am

I agree, NVCC is not a compiler but a compiler driver. So how come the same code produces different results when it is built through NVCC than when it is built without NVCC?

My background is Java and I am not very familiar with C. Can you please point out the problem with my code and where the “out of bounds memory access” is happening in this simple code?

-Adel

avidday · December 9, 2010, 9:23am

It looks like your intention is to assign and loop through arrays of length (N+1)=11. So why should the code only allocate size = N + sizeof(int) = 10 + 4 = 14 bytes, and then perform 14 trips through the assignment and debug output loops?

Edited for the second bug I missed the first time I read the code.

cbuchner1 · December 9, 2010, 11:05am

N+1*sizeof(int) != (N+1)*sizeof(int)

Mind the operator precedence.

Adel_Ahmed · December 10, 2010, 1:19pm

Thank you all for your replies. It turned out to be a pure C issue. Nothing to do with CUDA and its SDK.

Just for the record, here are my remarks. Please keep in mind that I have a Java background and my coding style assumes that many things are taken care of by the compiler and the run-time system, which is in fact not the case when working with C.

Knowing that an array of integers can be allocated using:

int *a_h, *b_h;

int N = 10;

int i;

a_h = (int *) malloc(N * sizeof(int)); // allocate space for N integers

b_h = (int *) malloc(N * sizeof(int)); // allocate space for another N integers

Since sizeof(int) = 4, each array consists of 40 bytes. What I realized is that these arrays should be iterated using N, not N*sizeof(int). I used to declare a “size” variable:

int size = N * sizeof(int);

then iterate the array using this “size” variable, like:

for (i = 0; i < size; i++)

   a_h[i] = i;

This would iterate beyond the boundary of the array. The correct way to do it is:

for (i = 0; i < N; i++)

   a_h[i] = i;

something I found out the hard way.

Thanks avidday for pointing out the “boundary problem”, and thanks cbuchner1 for pointing out the operator precedence (that was a silly mistake on my part).

The one thing I still don’t understand is why I was getting two different behaviors when the same code is compiled in two different VS2008 projects. I will not invest time in finding the answer, as it is not my main concern at the moment.

Once again, thanks guys,

-Adel
