Crash in kernel function

Hello, I’m using a 9800GT and a 9500GT as my GPUs on Windows, and I’m currently coding in Visual Studio 2008. My kernel somehow crashes, for what look like two basic reasons. First, pointers to different variables point to the same address even though they should not. Second, it seems the variables are not properly transferred to the graphics card. Mostly the calculations come out as 0 or as not-a-number (-1.#QNA…). The following is part of the code for the kernel function. Can anybody find the reason? Thank you.

__global__ void kernel( int a, cuComplex* d, float e, int* f, float* g ){
	int xx = threadIdx.x;
	if(xx < XDIM){
		for(int yy = 0; yy < YDIM; yy++){
			int i = int(g[yy*2]);
			d[yy].x = (function of a,e,f,g)/e;
			d[yy].y = 0;
		}
	}
}

I have calculated the memory needed, and it is far below 512MB, the total memory of the 9800GT, so there should be no problem with memory allocation. Somehow the value of e is going to 0. I’m setting the value of the variable e as

float e_host = somevalue…

cudaMalloc((void**)&e, sizeof(float));

cudaMemcpy(e,&e_host, sizeof(float), cudaMemcpyHostToDevice);

Can there be any errors in that code? Thank you.

What’s the error message you get? Or does it just crash? Could you post the relevant actual code you’re using on the host side? Also, what’s your

int i = int(g[yy*2]);

meant for? And for the line

d[yy].x = (function of a,e,f,g)/e;

what manner of function is this?

It says

Unhandled exception at 0x013e1795 in TesterForKernel.exe: 0xC0000005: Access violation reading location 0x00110000.

The integer i is the index that locates the relevant pre-calculated value stored in another array. I cannot post the function, but it is just a simple function that takes values from the arrays f and g and divides by the value e. Another related problem is that sometimes the value stored in e seems to become 0, and I’m not sure why cudaMemcpy does not copy the right value for it.

Below is the main function; I hope this helps. Once again, thank you for your kindness and have a nice day.

int main(void)
{
	float k_resampledspacing=0.109, *dev_k, *k_device;
	bool fft = true; bool resample = true;
	int frame=0, *dev_frame;
	int b[10], *dev_b;
	float g[10], *dev_g;
	cuComplex *h_in, *d_in;

	cudaMalloc( (void**)&dev_k, sizeof(float));
	cudaMalloc( (void**)&dev_frame, sizeof(int));
	cudaMalloc( (void**)&dev_b, sizeof(int)*10);
	cudaMalloc( (void**)&dev_g, sizeof(float)*10);
	cudaMalloc( (void**)&d_in, sizeof(cuComplex)*YDIM);

	h_in = (cuComplex *) malloc(sizeof(cuComplex)*YDIM);

	cudaMemcpy(dev_k, &k_resampledspacing, sizeof(float), cudaMemcpyHostToDevice);
	cudaMemcpy(dev_frame, &frame, sizeof(int), cudaMemcpyHostToDevice);
	cudaMemcpy(dev_b, b, sizeof(int)*10, cudaMemcpyHostToDevice);
	cudaMemcpy(dev_g, g, sizeof(float)*10, cudaMemcpyHostToDevice);

	k_device = (float *) malloc(sizeof(float));
	cudaMemcpy(k_device, dev_k, sizeof(float), cudaMemcpyDeviceToHost);

	printf("k value in device is = %f\n", *k_device);

	for(int i =0;i<5;i++){
		g[2*i] = 1;
		g[2*i+1] = 3.3;
		b[2*i] = 2;
		b[2*i+1] = 2;
	}

	for(int i=0;i<10;i++){
		printf("dev_resamp[%d] = %f\n",i,g[i]);
		printf("dev_b[%d] = %d\n", i,b[i]);
	}

	for(int yy =0; yy<YDIM;yy++)
	{
		int i = int(g[yy*2]);
		h_in[yy].x = (float)b[frame*YDIM*XDIM+0*YDIM+i]-g[yy*2+1]*((float)b[frame*YDIM*XDIM+0*YDIM+i+1]-(float)b[frame*YDIM*XDIM+0*YDIM+i])/k_resampledspacing;
		h_in[yy].y = 0;
	}

	for(int i=0;i<YDIM;i++){
		printf("h_in[%d].x = %f\n", i,h_in[i].x);
	}

	if(!(fft&&resample)) goto breakpoint;

	doResample<<<1, XDIM>>>(*dev_frame, d_in,*dev_k, dev_b, dev_g);

	cudaMemcpy(h_in, d_in, sizeof(cuComplex)*YDIM, cudaMemcpyDeviceToHost);

	for(int i=0;i<YDIM;i++){
		printf("d_in[%d].x = %f\n", i,h_in[i].x);
	}

	return 0;
}

dev_k points to device memory; you cannot dereference it in host code.

Then should I use an array to store the value instead of an integer or float type? How can I change it to work? Thank you.


Jaehong YOon

You first memcpy your host data to the device and only then set the host array values; those later changes are not visible in the device memory you already copied to. Remember, the address space of host memory (i.e., your RAM) and the device address space (i.e., the GPU’s global memory) are separate (unless you explicitly set them up to be shared; look into unified virtual addressing if you’re interested in this). Thus, in your current code, if your kernel were to execute, it would be operating on uninitialized data in dev_b and dev_g.
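To make the ordering concrete, here is a minimal sketch of "initialize on the host first, then copy". The kernel doubleValues, the function initThenCopy, and the macro N are hypothetical names made up for this illustration, not part of the posted code; the array length and fill values mirror the g[10] array from the post.

```cuda
#include <cuda_runtime.h>

#define N 10  // same length as the g[10] array in the post

// Hypothetical kernel: doubles each element so the host can verify
// that the device really received the initialized values.
__global__ void doubleValues(float* g) {
    int idx = threadIdx.x;
    if (idx < N) g[idx] *= 2.0f;
}

// Returns true if every CUDA call succeeded; out must hold N floats.
bool initThenCopy(float* out) {
    float g[N];
    float* dev_g = NULL;

    // 1. Initialize the host array BEFORE any copy to the device.
    for (int i = 0; i < N / 2; i++) {
        g[2 * i] = 1.0f;
        g[2 * i + 1] = 3.3f;
    }

    // 2. Allocate, then copy; the device now holds the initialized values.
    if (cudaMalloc((void**)&dev_g, sizeof(float) * N) != cudaSuccess) return false;
    bool ok = cudaMemcpy(dev_g, g, sizeof(float) * N,
                         cudaMemcpyHostToDevice) == cudaSuccess;

    // 3. Launch, then copy back. A blocking cudaMemcpy on the default
    //    stream waits for the preceding kernel to finish.
    if (ok) {
        doubleValues<<<1, N>>>(dev_g);
        ok = cudaMemcpy(out, dev_g, sizeof(float) * N,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev_g);
    return ok;
}
```

Called as initThenCopy(out) with a float out[N] buffer, this should leave the doubled values (2 and roughly 6.6, alternating) in out. If the fill loop were moved after the first cudaMemcpy, as in the posted main, the device would instead see whatever happened to be in g at copy time.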

As it stands, your kernel won’t execute, because as LSChien stated, you’re dereferencing a device pointer. The simple answer is to just leave it as a pointer in your kernel invocation and dereference it in your kernel.
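A rough sketch of the two ways to get a scalar such as e into a kernel without dereferencing a device pointer on the host. All names here (scaleByPointer, scaleByValue, runBoth) and the small YDIM value are assumptions for illustration, and the 1/e computation is only a stand-in for the unposted function. Option 1 changes the kernel parameter to a pointer, as suggested above; option 2 keeps the by-value parameter from the original kernel signature and simply passes the host variable.

```cuda
#include <cuda_runtime.h>

#define YDIM 4  // placeholder size for this sketch, not the value from the post

// Option 1: the kernel receives a device pointer and dereferences it on the device.
__global__ void scaleByPointer(float* d, const float* e) {
    int idx = threadIdx.x;
    if (idx < YDIM) d[idx] = 1.0f / *e;
}

// Option 2: the kernel receives the scalar by value; the launch copies it for us.
__global__ void scaleByValue(float* d, float e) {
    int idx = threadIdx.x;
    if (idx < YDIM) d[idx] = 1.0f / e;
}

// Runs both variants; outPtr and outVal must each hold YDIM floats.
bool runBoth(float e_host, float* outPtr, float* outVal) {
    float* dev_e = NULL;
    float* dev_d = NULL;
    const size_t bytes = sizeof(float) * YDIM;

    bool ok = cudaMalloc((void**)&dev_e, sizeof(float)) == cudaSuccess
           && cudaMalloc((void**)&dev_d, bytes) == cudaSuccess
           && cudaMemcpy(dev_e, &e_host, sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        scaleByPointer<<<1, YDIM>>>(dev_d, dev_e);  // pass dev_e itself, never *dev_e
        ok = cudaMemcpy(outPtr, dev_d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    if (ok) {
        scaleByValue<<<1, YDIM>>>(dev_d, e_host);   // host value passed as an argument
        ok = cudaMemcpy(outVal, dev_d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev_d);  // cudaFree(NULL) is a no-op, so this is safe on early failure
    cudaFree(dev_e);
    return ok;
}
```

For single scalars like frame and k_resampledspacing, option 2 is usually the simpler choice, since it needs no cudaMalloc or cudaMemcpy for the scalar at all. Arrays such as dev_b and dev_g still have to be passed as device pointers.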

Hope that made sense
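One more suggestion, as a sketch: checking the return code of every CUDA call makes failures like a bad copy or a failed launch visible instead of silently producing zeros. The helper checkCuda, the kernel writeOne, and the function checkedLaunch are hypothetical names for this example. Note that return-code checks would not catch the access violation itself, since that is an ordinary invalid pointer read on the CPU, and that cudaDeviceSynchronize assumes CUDA 4.0 or later.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reports a failed CUDA call and returns false.
static bool checkCuda(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Trivial kernel so there is something to launch: one thread per element.
__global__ void writeOne(float* d) {
    d[threadIdx.x] = 1.0f;
}

// n must not exceed the device's maximum threads per block.
bool checkedLaunch(float* out, int n) {
    float* dev_d = NULL;
    bool ok = checkCuda(cudaMalloc((void**)&dev_d, sizeof(float) * n), "cudaMalloc");
    if (ok) {
        writeOne<<<1, n>>>(dev_d);
        ok = checkCuda(cudaGetLastError(), "kernel launch")          // bad launch configuration
          && checkCuda(cudaDeviceSynchronize(), "kernel execution")  // faults inside the kernel
          && checkCuda(cudaMemcpy(out, dev_d, sizeof(float) * n,
                                  cudaMemcpyDeviceToHost), "cudaMemcpy");
    }
    cudaFree(dev_d);
    return ok;
}
```

Wrapping the cudaMalloc and cudaMemcpy calls in the posted main the same way should show whether the copy of e actually succeeded.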