Error in calculation with unsigned chars


I have a problem with calculations on unsigned char arguments. I was able to write a small example to reproduce it.

The code below allocates 3 values of INPUT_TYPE (unsigned chars or unsigned ints, depending on the define), copies them to the device, calculates the result on the device, reads back the result, and executes the same function on the host. When abc[2] != 0 and INPUT_TYPE is unsigned char, the results differ, and the value I get from the device is wrong.
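For reference, the arithmetic the kernel should reproduce relies on C's integer promotions: an unsigned char argument is promoted to int before the multiplications, so both input types must yield the same result. A minimal host-side sketch (plain C, mirroring the calc() function in the repro below):

```c
/* Same arithmetic as the calc() function in the repro program below. */
static int calc_host(int a, int b, int c) {
    return 28784 * a - 24103 * b - 4681 * c;
}

/* unsigned char inputs are promoted to int before the multiplies,
   so the result must match the unsigned int case exactly. */
static int calc_from_uchar(unsigned char a, unsigned char b, unsigned char c) {
    return calc_host(a, b, c);
}
```

With abc = {0, 0, 1} this gives 28784*0 - 24103*0 - 4681*1 = -4681, which is exactly the swRes the host computes.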

If I compile the code with


the results are correct.

If I compile the code with INPUT_TYPE = unsigned int, the result is also correct.
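Since the unsigned int variant works, one possible host-side workaround (my assumption; it only sidesteps the bug rather than fixing it) is to widen the bytes to ints on the host before uploading, so the device only ever performs 32-bit loads:

```c
#include <stddef.h>

/* Hypothetical helper: widen unsigned char inputs to int on the host,
   so the kernel can work entirely with 32-bit values. */
static void widen_u8_to_i32(const unsigned char *src, int *dst, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = (int)src[i];   /* zero-extends: values stay in 0..255 */
}
```

The widened int array would then be uploaded with cudaMemcpy in place of the unsigned char buffer.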

I added a readback of the cast arguments abc, and they are always correct (as seen in the output).

When looking at the generated ptx files in both cases (chars and ints), the only difference is in how the arguments are loaded: the two versions use different load instructions (which is correct, since I cast the values to int).
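As an illustration of why the load instruction matters (a host-side C sketch, not the actual PTX semantics): a byte load zero-extends a single byte, while a 32-bit load at the same address also picks up the three neighbouring bytes:

```c
#include <stdint.h>
#include <string.h>

/* Correct byte load: reads exactly one byte and zero-extends it. */
static uint32_t load_u8(const unsigned char *p) {
    return (uint32_t)p[0];
}

/* A 32-bit load at the same address also reads the three
   neighbouring bytes (done via memcpy here to stay portable). */
static uint32_t load_u32(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

If the neighbouring bytes are nonzero, the two loads produce different values, which is the kind of discrepancy a wrong load width would cause.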


#include <stdio.h>
#include <cutil.h>

#define INPUT_TYPE unsigned char
//#define INPUT_TYPE unsigned int

__device__ __host__ int calc(int a, int b, int c) {
    return 28784 * a - 24103 * b - 4681 * c;
}

__global__ void kernel(const void* iData, void* oData) {
    const INPUT_TYPE *iPtr = (const INPUT_TYPE *)iData;
    int *oPtr = (int*)oData;

    int a = (int)(unsigned)iPtr[0];
    int b = (int)(unsigned)iPtr[1];
    int c = (int)(unsigned)iPtr[2];

    int res = calc(a, b, c);
    oPtr[0] = res;

    // testing cast
    oPtr[1] = a;
    oPtr[2] = b;
    oPtr[3] = c;
}

int main() {
    INPUT_TYPE abc[3] = {0, 0, 1};

    void* iDataGPU;
    CUDA_SAFE_CALL(cudaMalloc(&iDataGPU, 3 * sizeof(INPUT_TYPE)));
    CUDA_SAFE_CALL(cudaMemcpy(iDataGPU, abc, 3 * sizeof(INPUT_TYPE), cudaMemcpyHostToDevice));

    void* oDataGPU;
    CUDA_SAFE_CALL(cudaMalloc(&oDataGPU, 4 * sizeof(int)));

    kernel<<< dim3(1), dim3(1) >>>(iDataGPU, oDataGPU);

    int res;
    CUDA_SAFE_CALL(cudaMemcpy(&res, oDataGPU, sizeof(int), cudaMemcpyDeviceToHost));

    int abcout[3];
    CUDA_SAFE_CALL(cudaMemcpy(abcout, (int*)oDataGPU + 1, 3 * sizeof(int), cudaMemcpyDeviceToHost));

    int swRes = calc(abc[0], abc[1], abc[2]);

    printf("abc=[%d,%d,%d]; abcout=[%d,%d,%d], res=%d, swRes=%d\n",
           abc[0], abc[1], abc[2], abcout[0], abcout[1], abcout[2], res, swRes);

    return 0;
}



Output:

abc=[0,0,1]; abcout=[0,0,1], res=306769335, swRes=-4681


> nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2006 NVIDIA Corporation

Built on Wed_Jun_20_18:50:03_PDT_2007

Cuda compilation tools, release 1.0, V0.2.1221
CUDA SDK 1.0 Linux, Version 1.00.0625.0025
> gcc -v

Using built-in specs.

Target: i686-pc-linux-gnu

Configured with: /var/tmp/portage/sys-devel/gcc-4.1.2/work/gcc-4.1.2/configure --prefix=/usr --bindir=/usr/i686-pc-linux-gnu/gcc-bin/4.1.2 --includedir=/usr/lib/gcc/i686-pc-linux-gnu/4.1.2/include --datadir=/usr/share/gcc-data/i686-pc-linux-gnu/4.1.2 --mandir=/usr/share/gcc-data/i686-pc-linux-gnu/4.1.2/man --infodir=/usr/share/gcc-data/i686-pc-linux-gnu/4.1.2/info --with-gxx-include-dir=/usr/lib/gcc/i686-pc-linux-gnu/4.1.2/include/g++-v4 --host=i686-pc-linux-gnu --build=i686-pc-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --enable-secureplt --disable-libunwind-exceptions --disable-multilib --enable-libmudflap --disable-libssp --disable-libgcj --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu

Thread model: posix

gcc version 4.1.2 (Gentoo 4.1.2)

Installed video card: G80 (GeForce 8800 GTX rev a2)

Installed memory: 2GB

Processor type: Intel Core2 6300 @ 1.86GHz

Chipset: nVidia MCP55

Thanks for reporting this.

It does appear to be a genuine bug, and the compiler team is working on a fix. The fix should make it into the next release.

Other users, take note: this is a great example of a clear, easy-to-reproduce bug report!