strange behavior with device emulation

twister1 · May 15, 2008, 10:50am

Hi there CUDA people!

I’ve got a problem with the following test code:

static int * d_data = NULL;

...

void cuda_test() {
    // allocate storage for 3 integers
    int size = 3*sizeof(int);
    if (!d_data) {
        CUT_SAFE_MALLOC(cudaMalloc((void**) &d_data, size));
        CUT_CHECK_ERROR("alloc error");
        cudaThreadSynchronize();
    }

    // fill host array and transfer it to the device
    int * dataBefore = new int[3];
    for (int i = 0; i < 3; ++i) {
        dataBefore[i] = i;
        printf("dataBefore[%d] = %d\n", i, dataBefore[i]);
    }
    cudaMemcpy(d_data, dataBefore, size, cudaMemcpyHostToDevice);
    CUT_CHECK_ERROR("copy error");
    cudaThreadSynchronize();
    delete [] dataBefore;
    dataBefore = NULL;

    // prepare readback array and initialize with error values (-1234)
    int * dataAfter = new int[3];
    for (int i = 0; i < 3; ++i) {
        dataAfter[i] = -1234;
    }

    // transfer data back (data should not be changed by the device)
    cudaMemcpy(dataAfter, d_data, size, cudaMemcpyDeviceToHost);
    CUT_CHECK_ERROR("readback error");
    cudaThreadSynchronize();
    for (int i = 0; i < 3; ++i) {
        printf("dataAfter[%d] = %d\n", i, dataAfter[i]);
    }
    delete [] dataAfter;
    dataAfter = NULL;

    if (d_data) {
        cudaFree(d_data);
        CUT_CHECK_ERROR("dealloc error");
        cudaThreadSynchronize();
    }
}

If I turn device emulation off I get the expected output:

dataBefore[0] = 0
dataBefore[1] = 1
dataBefore[2] = 2
dataAfter[0] = 0
dataAfter[1] = 1
dataAfter[2] = 2

However, when device emulation is on, the second cudaMemcpy command seems to have no effect:

dataBefore[0] = 0
dataBefore[1] = 1
dataBefore[2] = 2
dataAfter[0] = -1234
dataAfter[1] = -1234
dataAfter[2] = -1234

Can someone confirm this behavior or tell me what I’m doing wrong?

Thanks in advance!

MisterAnderson42 · May 15, 2008, 1:01pm

You have CUT_CHECK_ERRORs in there, which is good. But have you compiled in debug mode so that the error checking is actually enabled? At a glance, I don’t see any problems with your code. I can only guess that there is a CUDA initialization error or something similar.

redpill · May 15, 2008, 2:52pm


Well, you’re not actually calling a kernel, so perhaps you’ve hit an obscure compiler bug that optimizes away the second cudaMemcpy. What version of CUDA are you using, on what platform, and with which card?

Try calling a kernel that does as little as possible: e.g. read the first byte from global memory, then write it back to the same location. (If you do any less, it might be optimized away itself.)

See if that works as expected under device emulation.

twister1 · May 16, 2008, 7:07am

Hmm, that could really be a problem since compiler optimizations are turned on. I’ve now inserted a simple kernel…

__global__ void incrementKernel(int * data) {
    data[threadIdx.x] += 1;
}

…and its corresponding launch command:

dim3 dimGrid(1,1);
dim3 dimBlock(3,1);
incrementKernel<<<dimGrid,dimBlock>>>(d_data);
cudaThreadSynchronize();

The dataAfter array still remains unchanged in device emulation mode, while all values are incremented without emulation (as expected). I’m using CUDA 1.1 on Debian Linux 3.1 (32-bit) with a GeForce 8800 GTX.

MisterAnderson42 · May 16, 2008, 1:11pm

The compiler will never optimize away a cudaMemcpy.

You never said if you were building in debug mode or not. You could just call cudaMemcpy like so to check for errors:

cudaError_t error;
error = cudaMemcpy(...);
if (error != cudaSuccess)
    printf("Error: %s\n", cudaGetErrorString(error));
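To apply the same check to every call without repeating the if-statement, the snippet above could be wrapped in a macro that stays active in release builds too. The following is only a sketch: the name CHECK_CUDA is made up for illustration and is not part of cutil, and cudaDeviceSynchronize() is used because cudaThreadSynchronize() is deprecated on newer toolkits (on CUDA 1.1 you would keep cudaThreadSynchronize()).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of cutil): evaluates a runtime API call,
// and on failure prints the location and error string, then exits.
// Unlike CUT_CHECK_ERROR it is not compiled out in release builds.
#define CHECK_CUDA(call)                                         \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s:%d: %s failed: %s\n",            \
                    __FILE__, __LINE__, #call,                   \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

// Same kernel as in the thread: each thread increments one element.
__global__ void incrementKernel(int *data) {
    data[threadIdx.x] += 1;
}

int main() {
    const int n = 3;
    const size_t size = n * sizeof(int);
    int before[n] = {0, 1, 2};
    int after[n] = {-1234, -1234, -1234};  // sentinel values, as in the thread
    int *d_data = NULL;

    CHECK_CUDA(cudaMalloc((void **)&d_data, size));
    CHECK_CUDA(cudaMemcpy(d_data, before, size, cudaMemcpyHostToDevice));

    incrementKernel<<<1, n>>>(d_data);
    CHECK_CUDA(cudaGetLastError());       // errors from the launch itself
    CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CHECK_CUDA(cudaMemcpy(after, d_data, size, cudaMemcpyDeviceToHost));
    CHECK_CUDA(cudaFree(d_data));

    for (int i = 0; i < n; ++i) {
        printf("dataAfter[%d] = %d\n", i, after[i]);
    }
    return 0;
}
```

A kernel launch does not return an error code itself, which is why the sketch queries cudaGetLastError() right after the launch and checks the synchronize call separately.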

twister1 · May 20, 2008, 1:40pm

The problem occurs in debug as well as release mode…

Indeed, with MisterAnderson42’s helpful error check snippet I get an “invalid argument” error, but I can’t figure out what it means here. Moreover, even a simple cudaThreadSynchronize() produces an “invalid argument” error (where is that invalid argument??).
Additionally, I wonder why CUT_CHECK_ERROR("…") keeps quiet (also in debug mode).

I then added the error check snippet after every CUDA call in my code and found that all CUDA API calls in the constructor of my GPGPU class succeed, while every later API call fails. Furthermore, if I disable all API calls in the constructor, all later API calls work well…

At least I now have a workaround: postponing the initializing CUDA API calls.

Thanks very much for your help!!!
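For anyone who runs into the same thing, the workaround of postponing the initializing calls might look roughly like the sketch below. It does not explain why the constructor calls caused the later failures; it only shows one way to keep CUDA API calls out of a constructor. The class name LazyGpuBuffer and its methods are hypothetical and are not the GPGPU class from this thread.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical wrapper: the constructor makes no CUDA API calls.
// Device memory is allocated on first use instead.
class LazyGpuBuffer {
public:
    explicit LazyGpuBuffer(size_t count) : d_data_(NULL), count_(count) {}

    // Allocates device memory on the first call; later calls do nothing.
    cudaError_t ensureAllocated() {
        if (d_data_) return cudaSuccess;
        return cudaMalloc((void **)&d_data_, count_ * sizeof(int));
    }

    // Copies count_ ints from host to device, allocating first if needed.
    cudaError_t upload(const int *src) {
        cudaError_t err = ensureAllocated();
        if (err != cudaSuccess) return err;
        return cudaMemcpy(d_data_, src, count_ * sizeof(int),
                          cudaMemcpyHostToDevice);
    }

    // Copies count_ ints from device to host, allocating first if needed.
    cudaError_t download(int *dst) {
        cudaError_t err = ensureAllocated();
        if (err != cudaSuccess) return err;
        return cudaMemcpy(dst, d_data_, count_ * sizeof(int),
                          cudaMemcpyDeviceToHost);
    }

    // Explicit release, so that no CUDA call is hidden in a destructor.
    cudaError_t release() {
        cudaError_t err = cudaFree(d_data_);  // cudaFree(NULL) is a no-op
        d_data_ = NULL;
        return err;
    }

private:
    int *d_data_;
    size_t count_;
};

int main() {
    int before[3] = {0, 1, 2};
    int after[3] = {-1234, -1234, -1234};

    LazyGpuBuffer buffer(3);  // no CUDA API call happens here

    cudaError_t err = buffer.upload(before);  // first CUDA API call
    if (err == cudaSuccess) err = buffer.download(after);
    if (err != cudaSuccess) {
        printf("Error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < 3; ++i) {
        printf("dataAfter[%d] = %d\n", i, after[i]);
    }

    err = buffer.release();
    if (err != cudaSuccess) {
        printf("Error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Every method returns the cudaError_t of the call that failed, so the caller can print it with cudaGetErrorString() as suggested earlier in the thread.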
