Mixing Types

MasterKitten · August 11, 2011, 3:25pm

Hello, I’m using a 9800GT and a 9500GT on Windows. I’m curious why mixing types when calling a kernel actually crashes the function. I mean, if I call

kernel(int, int, float,…) like this,
somehow on my computer the calculated values become useless even though they are deterministic. Can somebody help me with this?
Thank you.

alrikai · August 11, 2011, 5:24pm

I’m not sure I follow your question; you mean that whenever you call a kernel with more than 1 data type in the parameter list, it’ll crash? Are your parameters in device memory?

MasterKitten · August 12, 2011, 1:06am

Yes. My kernel crashes whenever there is more than 1 data type. I also set my parameters to be in device memory, but it is still not working.

It gives deterministic values which are wrong. Does anybody recognize the problem? Thank you.

Regards,

Jaehong Yoon

tera · August 12, 2011, 8:09am

The problem most likely is elsewhere. Please post a complete, self-contained example so that we can help with it.

MasterKitten · August 12, 2011, 4:14pm

__global__ void doResample( int* a, cuComplex* d, float* e, int* f, float* g ){
/*
   a = dev_frame   b = dev_fft   c = dev_resample
   d = d_in        e = k_resampledspacing
   f = dev_b       g = dev_resamp
*/
	int xx = threadIdx.x;

//General structure of kernel for parallel computing
//Do computing while xx and yy are smaller than XDIM and YDIM and boolean
//dev_resample and dev_fft are both true.
	if(xx < XDIM)
	{
		for(int yy =0; yy<YDIM;yy++)
		{
			int i = int(g[yy*2]);
			d[yy].x = f[a[0]*YDIM*XDIM+xx*YDIM+i]-g[yy*2+1]*(float)(f[a[0]*YDIM*XDIM+xx*YDIM+i+1]
			-f[a[0]*YDIM*XDIM+xx*YDIM+i])/(float)e[0];
			d[yy].y = 0;
		}
	}
}

When I change a and f to float type, it works perfectly. Does anybody know why this happens? Also, I’m not sure why I have to cast all the variables that are already float type in the first place to make the code work right. Thank you

tera · August 12, 2011, 5:50pm

When you change a and f to float, do you also change the calling code? Since a and f are pointers, obviously the kernel and its caller need to agree on what they point to.

MasterKitten · August 12, 2011, 6:33pm

Yes, I do cast (int) in front of a whenever I call it. I also cast (float) in front of f, which does not make sense to me but somehow makes the code work as it should.

tera · August 12, 2011, 6:50pm

Casting a pointer to a different type will not change the type of the data it points to (i.e., the binary data remains unchanged); it will only change how the code interprets this binary data. Thus casting a float* to an int* will not work.

How do you call the kernel?
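
A small host-side sketch of the point about pointer casts: it takes the bytes of an int and reads them back as a float, next to an ordinary value conversion for comparison. The helper names readBitsAsFloat and convertValueToFloat are made up for illustration and do not come from the code in this thread; memcpy stands in for the pointer cast, and 32-bit int and float are assumed.

```cuda
#include <cstring>

// Hypothetical helper: reinterpret the bytes of an int as a float.
// This is what reading an int buffer through a float* amounts to:
// the bits stay the same, only their interpretation changes.
float readBitsAsFloat(int value) {
    static_assert(sizeof(int) == sizeof(float),
                  "sketch assumes 32-bit int and float");
    float result;
    std::memcpy(&result, &value, sizeof(result));
    return result;
}

// Hypothetical helper: an ordinary value conversion, for comparison.
// Here the compiler produces a new bit pattern that represents the
// same number as a float.
float convertValueToFloat(int value) {
    return static_cast<float>(value);
}
```

With these helpers, convertValueToFloat(5) gives 5.0f, while readBitsAsFloat(5) gives a value so close to zero that casting it back to int yields 0 rather than 5. That is why a cast on the pointer, or on the value read through the wrong pointer type, cannot recover the original number.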

MasterKitten · August 15, 2011, 3:09pm

__global__ void doResample( float* a, cuComplex* d, float* f){
	int tid = threadIdx.x;
	if(tid < XDIM)
	{
		for(int yy =0; yy<YDIM;yy++)
		{
			d[yy].x = f[(int)a[0]*YDIM*XDIM+tid*YDIM+yy];
			d[yy].y = 0;
		}
	}
}

I actually changed the code as above and call the kernel as

doResample<<<1, XDIM>>>(values…)

Now I’m getting a bunch of 0s. The size of the data I’m transferring is 1024*1000*20 (for f), and the size of d is 1024 in float2. Can the size of such an array cause an error? Thank you

MasterKitten · August 15, 2011, 6:46pm

When I do CPU debugging it says

First-chance exception at 0x75999673 in Resampling2.exe: Microsoft C++ exception: [rethrow] at memory location 0x00000000…
The program ‘[2604] Resampling2.exe: Native’ has exited with code 0 (0x0).

and when I try to use Parallel Nsight it says

The thread ‘CUDA Default Context’ (0x0) has exited with code 0 (0x0).
The thread ‘’ (0xcb6970) has exited with code 0 (0x0).
The program ‘[3276] Resampling2.exe: CUDA’ has exited with code 0 (0x0).

Does this mean anything? Thank you

alrikai · August 15, 2011, 7:48pm

Dunno if I’m telling you stuff you already know, but

memory location 0x00000000 → null pointer, and the Parallel Nsight messages are just telling you your program exited (and returned 0)

tera · August 16, 2011, 12:34pm

The part I marked in bold was what my question was about. I strongly suspect that the types of your arguments do not match the types of your kernel, and you are just hiding this with an incompatible type cast.

MasterKitten · August 16, 2011, 4:36pm

Below is the main code

int main(int argc, char* argv[]) {
	int frame = 0; float f_frame;
	f_frame = (float)frame;
	readData( );
	dc_subtract();
	h_in = (cuComplex *)malloc(sizeof(cuComplex)*YDIM);
	HANDLE_ERROR( cudaMalloc((void**)&dev_frame, sizeof(float)));
	HANDLE_ERROR( cudaMalloc((void**)&dev_b, sizeof(float)*FRAMES*XDIM*YDIM));
	HANDLE_ERROR( cudaMalloc((void**)&d_in,sizeof(cuComplex)*YDIM));

	HANDLE_ERROR(cudaMemcpy(dev_b, b_1D, sizeof(float)*FRAMES*XDIM*YDIM, cudaMemcpyHostToDevice));
	HANDLE_ERROR(cudaMemcpy(dev_frame, &frame, sizeof(float), cudaMemcpyHostToDevice));
	cudaThreadSynchronize();
	doResample<<<1, XDIM>>>(dev_frame, d_in, dev_b);

	HANDLE_ERROR(cudaMemcpy(h_in, d_in, sizeof(cuComplex)*YDIM, cudaMemcpyDeviceToHost));
	cudaThreadSynchronize();
	for(int i =0; i<10;i++){
		printf("%f\n",h_in[i].x);
	}
	cudaFree(dev_frame);
	cudaFree(dev_fft);
	cudaFree(dev_k_resampledspacing);
	cudaFree(dev_resample);
	cudaFree(dev_b);
	cudaFree(dev_resamp);
	cudaFree(d_in);
	free(h_in);
}

I’m not sure if the types are wrong… And about the null pointer, what does that mean? Does it mean the program is just shutting down without doing anything? Thank you

tera · August 17, 2011, 2:24am

You managed to again leave out the declarations of many of the relevant variables. However, from the parts you posted I can already see that you are mixing different pointer types: frame is of type integer, copied into dev_frame of unknown type but size sizeof(float), then used as a float kernel parameter, whose content is later cast back to integer. None of these casts is necessary. Copying an int into a float* is just wrong and cannot be undone by casting the resulting float back to int.
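
One way to keep the types consistent is to use the same type at every step: the host variable, the device pointer, the copy size, and the kernel parameter. The following is only a sketch under that assumption; readFrame and roundTripFrame are hypothetical names, not functions from the posted program, and the kernel does nothing but hand the value back so the round trip can be checked.

```cuda
#include <cuda_runtime.h>

// Kernel reads the frame index through an int*, matching how it was written.
__global__ void readFrame(const int* frame, int* out) {
    *out = *frame;
}

// Hypothetical helper: copy an int to the device, read it in a kernel,
// and copy the result back. Every step uses int and sizeof(int),
// so no casts are needed anywhere.
cudaError_t roundTripFrame(int frame, int* result) {
    int* dev_frame = nullptr;
    int* dev_out = nullptr;

    cudaError_t err = cudaMalloc((void**)&dev_frame, sizeof(int));
    if (err != cudaSuccess) return err;

    err = cudaMalloc((void**)&dev_out, sizeof(int));
    if (err == cudaSuccess)
        err = cudaMemcpy(dev_frame, &frame, sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        readFrame<<<1, 1>>>(dev_frame, dev_out);
        err = cudaGetLastError();   // reports a failed launch
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(result, dev_out, sizeof(int), cudaMemcpyDeviceToHost);

    // cudaFree accepts a null pointer, so this is safe on every path.
    cudaFree(dev_frame);
    cudaFree(dev_out);
    return err;
}
```

If roundTripFrame(7, &value) returns cudaSuccess, value holds 7. Changing only one of the pieces to float (for example the kernel parameter) while leaving the others as int reproduces the kind of mismatch described above.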

MasterKitten · August 17, 2011, 4:04pm

__global__ void doResample(int* a, cuComplex* d, float* f){
/*
   a = dev_frame   b = dev_fft   c = dev_resample
   d = d_in        e = k_resampledspacing
   f = dev_b       g = dev_resamp
*/
	int tid = threadIdx.x;
	int b = *a;
	if(tid < XDIM)
	{
		for(int y =0; y<YDIM;y++)
		{
			d[y].x = f[b*YDIM*XDIM+tid*YDIM+y];
			d[y].y = 0;
		}
	}
}

int main(int argc, char* argv[]) {
	int frame = 0;
	readData( );
	dc_subtract();
	float *cpu_b;
	cpu_b = (float *)malloc(sizeof(float)*FRAMES*XDIM*YDIM);
	h_in = (cuComplex *)malloc(sizeof(cuComplex)*YDIM);
	HANDLE_ERROR( cudaMalloc((void**)&dev_frame, sizeof(int)));
	HANDLE_ERROR( cudaMalloc((void**)&dev_b, sizeof(float)*FRAMES*XDIM*YDIM));
	HANDLE_ERROR( cudaMalloc((void**)&d_in,sizeof(cuComplex)*YDIM));

//	Allocation of Host memory (if goes to main becomes faster than this version)
//	still having black screen which have no idea
	HANDLE_ERROR(cudaMemcpy(dev_b, b_1D, sizeof(float)*FRAMES*XDIM*YDIM, cudaMemcpyHostToDevice));
	HANDLE_ERROR(cudaMemcpy(dev_frame, &frame, sizeof(int), cudaMemcpyHostToDevice));
	cudaMemcpy(cpu_b, dev_b, sizeof(float)*FRAMES*XDIM*YDIM, cudaMemcpyDeviceToHost);
	for(int i=0;i<YDIM;i++){
		printf("%f\n",cpu_b[19*XDIM*YDIM+999*YDIM+i]);
	}
	cudaThreadSynchronize();
	doResample<<<1, XDIM>>>(dev_frame, d_in, dev_b);
	cudaThreadSynchronize();
	HANDLE_ERROR(cudaMemcpy(h_in, d_in, sizeof(cuComplex)*YDIM, cudaMemcpyDeviceToHost));
	cudaThreadSynchronize();
	for(int i =0; i<10;i++){
		printf("%f\n",h_in[i].x);
	}
	cudaFree(dev_frame);
	cudaFree(dev_fft);
	cudaFree(dev_k_resampledspacing);
	cudaFree(dev_resample);
	cudaFree(dev_b);
	cudaFree(dev_resamp);
	cudaFree(d_in);
	free(h_in);
}

I just saw the mistake and changed the code as shown above. The output values are still 0, and the same message

The thread ‘CUDA Default Context’ (0x0) has exited with code 0 (0x0).
The thread ‘’ (0xb36970) has exited with code 0 (0x0).
The program ‘[3964] Resampling2.exe: CUDA’ has exited with code 0 (0x0).

keeps appearing. I’m using 1000 threads (XDIM is 1000). Does the GPU exit the program with code 0 when the number of threads is greater than it can handle? Thank you

alrikai · August 17, 2011, 4:09pm

If you run the Device Query SDK program, what does it say your max thread dimensions are? (It’s probably 512 on the 9800/9500.) Instead of having 1000 threads just in the x-dimension, why not split it up to use both the x and y dimensions? Or have multiple blocks?
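
A launch that asks for more threads per block than the device supports fails without running the kernel, and the failure only shows up if the error status is checked after the launch. Below is a minimal sketch of the multiple-blocks suggestion with that check added. The names blocksFor, fillIndex, and launchFill are illustrative, device 0 is assumed, and 256 threads per block is an arbitrary choice rather than a recommendation for this particular program.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks needed to cover n elements
// (integer division rounded up).
int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Each thread handles one element. The guard skips the unused threads
// in the last block when n is not a multiple of the block size.
__global__ void fillIndex(int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Launch over n elements and report any failure instead of letting
// it pass silently.
cudaError_t launchFill(int* dev_out, int n) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  // device 0 assumed
    if (err != cudaSuccess) return err;

    // Never request more threads per block than the device allows.
    int threads = prop.maxThreadsPerBlock < 256 ? prop.maxThreadsPerBlock : 256;

    fillIndex<<<blocksFor(n, threads), threads>>>(dev_out, n);
    err = cudaGetLastError();        // e.g. an invalid launch configuration
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();  // errors raised while the kernel ran
}
```

For 1000 elements and a limit of 512 threads per block, blocksFor(1000, 512) gives 2 blocks, with the guard in the kernel masking off the 24 spare threads. The sketch uses cudaDeviceSynchronize, which replaces the older cudaThreadSynchronize call seen in the code above.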

MasterKitten · August 18, 2011, 6:49pm

Thanks, that worked beautifully!!! Thank you
