Mixing Types

Hello, I’m using a 9800 GT and a 9500 GT on Windows. I’m curious why mixing types when calling a kernel actually crashes
the function. I mean, if I call

kernel(int, int, float,…) like this,
somehow on my computer the calculated values become useless even though they are deterministic. Can somebody help me with this?
Thank you.

I’m not sure I follow your question; you mean that whenever you call a kernel with more than 1 data type in the parameter list, it’ll crash? Are your parameters in device memory?

Yes. My kernel crashes whenever there is more than 1 data type. I also set my parameters to be in device memory but still it is not working.

It gives deterministic values which are wrong. Does anybody recognize the problem? Thank you.

Regards,

Jaehong Yoon

The problem most likely is elsewhere. Please post a complete, self-contained example so that we can help with it.

__global__ void doResample( int* a, cuComplex* d, float* e, int* f, float* g ){
/*
   a = dev_frame   b = dev_fft   c = dev_resample
   d = d_in        e = k_resampledspacing
   f = dev_b       g = dev_resamp
*/
    int xx = threadIdx.x;

//General structure of kernel for parallel computing
//Do computing while xx and yy are smaller than XDIM and YDIM and boolean
//dev_resample and dev_fft are both true.
    if(xx < XDIM)
    {
        for(int yy =0; yy<YDIM;yy++)
        {
            int i = int(g[yy*2]);
            d[yy].x = f[a[0]*YDIM*XDIM+xx*YDIM+i]-g[yy*2+1]*(float)(f[a[0]*YDIM*XDIM+xx*YDIM+i+1]
            -f[a[0]*YDIM*XDIM+xx*YDIM+i])/(float)e[0];
            d[yy].y = 0;
        }
    }
}

When I change a and f to float type it works perfectly. Does anybody know why this happens? Also, I’m not sure why I have to cast all the variables that are float type in the first place in order to make the code work right. Thank you.

When you change a and f to float, do you also change the calling code? Since a and f are pointers, the kernel and its caller obviously need to agree on what they point to.

Yes, I do cast (int) in front of a whenever I call it. I also cast (float) in front of f, which does not make sense to me but somehow makes the code work as it should.

Casting a pointer to a different type will not change the type of the data it points to (i.e., the binary data remains unchanged); it will only change how the code interprets this binary data. Thus casting a float* to an int* will not work.
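To illustrate the difference between converting a value and reinterpreting its bits, here is a minimal host-side sketch. The helper name bitsAsFloat is made up for this example, and the code assumes int and float are both 32 bits wide (checked by the static_assert). It only demonstrates the principle; it is not taken from the code in this thread.

```cuda
#include <cstdio>
#include <cstring>

static_assert(sizeof(int) == sizeof(float), "sketch assumes 32-bit int and float");

// Hypothetical helper: returns the float whose bit pattern equals that of v.
// This is what reading an int through a float* amounts to.
float bitsAsFloat(int v) {
    float f;
    std::memcpy(&f, &v, sizeof f);  // copy the raw bytes, no numeric conversion
    return f;
}

int main() {
    int frame = 1;
    // Value conversion: the number 1 becomes the float 1.0.
    printf("converted:     %f\n", (float)frame);
    // Reinterpretation: the same 32 bits read as a float give a tiny denormal, not 1.0.
    printf("reinterpreted: %g\n", bitsAsFloat(frame));
    return 0;
}
```

One special case is worth knowing: the int 0 and the float 0.0f share the same all-zero bit pattern, so a type mismatch can go unnoticed as long as the value happens to be zero.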

How do you call the kernel?

__global__ void doResample( float* a, cuComplex* d, float* f){
    int tid = threadIdx.x;
    if(tid < XDIM)
    {
        for(int yy =0; yy<YDIM;yy++)
        {
            d[yy].x = f[(int)a[0]*YDIM*XDIM+tid*YDIM+yy];
            d[yy].y = 0;
        }
    }
}

I actually changed the code as above and call the kernel as

doResample<<<1, XDIM>>>(values…)

Now I’m getting a bunch of 0s. The size of the data I’m transferring is 1024*1000*20 (for f), and the size of d is 1024 in float2. Can arrays of this size cause an error? Thank you.

When I do CPU debugging it says

First-chance exception at 0x75999673 in Resampling2.exe: Microsoft C++ exception: [rethrow] at memory location 0x00000000…
The program ‘[2604] Resampling2.exe: Native’ has exited with code 0 (0x0).

and when I try to use Parallel Nsight it says

The thread ‘CUDA Default Context’ (0x0) has exited with code 0 (0x0).
The thread ‘’ (0xcb6970) has exited with code 0 (0x0).
The program ‘[3276] Resampling2.exe: CUDA’ has exited with code 0 (0x0).

Does this mean anything? Thank you

Dunno if I’m telling you stuff you already know, but

memory location 0x00000000 --> null pointer, and the Parallel Nsight messages are just telling you your program exited (and returned 0)

The part I marked in bold was what my question was about. I strongly suspect that the types of your arguments do not match the types of your kernel, and you are just hiding this with an incompatible type cast.

Below is the main code

int main(int argc, char* argv[]) {
    int frame = 0; float f_frame;
    f_frame = (float)frame;
    readData( );
    dc_subtract();
    h_in = (cuComplex *)malloc(sizeof(cuComplex)*YDIM);
    HANDLE_ERROR( cudaMalloc((void**)&dev_frame, sizeof(float)));
    HANDLE_ERROR( cudaMalloc((void**)&dev_b, sizeof(float)*FRAMES*XDIM*YDIM));
    HANDLE_ERROR( cudaMalloc((void**)&d_in,sizeof(cuComplex)*YDIM));

    HANDLE_ERROR(cudaMemcpy(dev_b, b_1D, sizeof(float)*FRAMES*XDIM*YDIM, cudaMemcpyHostToDevice));
    HANDLE_ERROR(cudaMemcpy(dev_frame, &frame, sizeof(float), cudaMemcpyHostToDevice));
    cudaThreadSynchronize();
    doResample<<<1, XDIM>>>(dev_frame, d_in, dev_b);

    HANDLE_ERROR(cudaMemcpy(h_in, d_in, sizeof(cuComplex)*YDIM, cudaMemcpyDeviceToHost));
    cudaThreadSynchronize();
    for(int i =0; i<10;i++){
        printf("%f\n",h_in[i].x);
    }
    cudaFree(dev_frame);
    cudaFree(dev_fft);
    cudaFree(dev_k_resampledspacing);
    cudaFree(dev_resample);
    cudaFree(dev_b);
    cudaFree(dev_resamp);
    cudaFree(d_in);
    free(h_in);
}

I’m not sure if the types are wrong… And about the null pointer, what does that mean? Does it mean the program is just shutting down without doing anything? Thank you.

You managed to again leave out the declarations of many of the relevant variables. However, from the parts you posted I can already see that you are mixing different pointer types: frame is of type int, copied into dev_frame of unknown type but size sizeof(float), then used as a float kernel parameter, whose content is later cast back to int. None of these casts is necessary. Copying an int into a float* is just wrong and cannot be undone by casting the resulting float back to int.
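As a minimal sketch of keeping the types consistent from the host variable through to the kernel parameter: the names readFrame and roundTripFrame below are invented for this illustration and are not part of the posted program. Error checking is left out to keep the example short, which real code should not do.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the frame index arrives as the int it really is.
__global__ void readFrame(const int* frame, float* out) {
    out[0] = (float)(*frame);  // value conversion applied to an actual int
}

// Hypothetical helper: copies an int to the device, lets the kernel read it,
// and returns what the kernel wrote. Error checks omitted for brevity.
float roundTripFrame(int frame) {
    int* dev_frame = nullptr;   // int on the host, so int* on the device
    float* dev_out = nullptr;
    float result = -1.0f;

    cudaMalloc((void**)&dev_frame, sizeof(int));   // size matches the type
    cudaMalloc((void**)&dev_out, sizeof(float));
    cudaMemcpy(dev_frame, &frame, sizeof(int), cudaMemcpyHostToDevice);

    readFrame<<<1, 1>>>(dev_frame, dev_out);
    // This blocking copy on the default stream waits for the kernel to finish.
    cudaMemcpy(&result, dev_out, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dev_frame);
    cudaFree(dev_out);
    return result;
}

int main() {
    printf("%f\n", roundTripFrame(3));
    return 0;
}
```

With the host variable, the device pointer, the sizeof in cudaMalloc and cudaMemcpy, and the kernel parameter all using int, no pointer cast is needed anywhere.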

__global__ void doResample(int* a, cuComplex* d, float* f){
/*
   a = dev_frame   b = dev_fft   c = dev_resample
   d = d_in        e = k_resampledspacing
   f = dev_b       g = dev_resamp
*/
    int tid = threadIdx.x;
    int b = *a;
    if(tid < XDIM)
    {
        for(int y =0; y<YDIM;y++)
        {
            d[y].x = f[b*YDIM*XDIM+tid*YDIM+y];
            d[y].y = 0;
        }
    }
}

int main(int argc, char* argv[]) {
    int frame = 0;
    readData( );
    dc_subtract();
    float *cpu_b;
    cpu_b = (float *)malloc(sizeof(float)*FRAMES*XDIM*YDIM);
    h_in = (cuComplex *)malloc(sizeof(cuComplex)*YDIM);
    HANDLE_ERROR( cudaMalloc((void**)&dev_frame, sizeof(int)));
    HANDLE_ERROR( cudaMalloc((void**)&dev_b, sizeof(float)*FRAMES*XDIM*YDIM));
    HANDLE_ERROR( cudaMalloc((void**)&d_in,sizeof(cuComplex)*YDIM));

//	Allocation of Host memory (if goes to main becomes faster than this version)
//	still having black screen which have no idea
    HANDLE_ERROR(cudaMemcpy(dev_b, b_1D, sizeof(float)*FRAMES*XDIM*YDIM, cudaMemcpyHostToDevice));
    HANDLE_ERROR(cudaMemcpy(dev_frame, &frame, sizeof(int), cudaMemcpyHostToDevice));
    cudaMemcpy(cpu_b, dev_b, sizeof(float)*FRAMES*XDIM*YDIM, cudaMemcpyDeviceToHost);
    for(int i=0;i<YDIM;i++){
        printf("%f\n",cpu_b[19*XDIM*YDIM+999*YDIM+i]);
    }
    cudaThreadSynchronize();
    doResample<<<1, XDIM>>>(dev_frame, d_in, dev_b);
    cudaThreadSynchronize();
    HANDLE_ERROR(cudaMemcpy(h_in, d_in, sizeof(cuComplex)*YDIM, cudaMemcpyDeviceToHost));
    cudaThreadSynchronize();
    for(int i =0; i<10;i++){
        printf("%f\n",h_in[i].x);
    }
    cudaFree(dev_frame);
    cudaFree(dev_fft);
    cudaFree(dev_k_resampledspacing);
    cudaFree(dev_resample);
    cudaFree(dev_b);
    cudaFree(dev_resamp);
    cudaFree(d_in);
    free(h_in);
}

I just saw the mistake and changed the code this way. The values coming out are still 0, and the same error

The thread ‘CUDA Default Context’ (0x0) has exited with code 0 (0x0).

The thread ‘’ (0xb36970) has exited with code 0 (0x0).

The program ‘[3964] Resampling2.exe: CUDA’ has exited with code 0 (0x0).

is appearing. I’m using 1000 threads (XDIM is 1000). Does the GPU exit the program with 0 when the thread count is greater than it can handle? Thank you.

If you run the Device Query SDK program, what does it say your max thread dimensions are? (It’s probably 512 on the 9800/9500.) Instead of having 1000 threads just in the x-dimension, why not split it up to use both the x and y dimensions? Or have multiple blocks?
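A minimal sketch of the multiple-blocks approach, together with a check that makes a failed launch visible. The kernel fillIndex, the helper blocksFor, and the block size of 256 are assumptions chosen for illustration; only the element count of 1000 comes from this thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Blocks needed to cover n elements with threadsPerBlock threads each (ceiling division).
int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical kernel: each thread writes its own global index.
__global__ void fillIndex(float* out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;  // index across all blocks
    if (tid < n) out[tid] = (float)tid;               // guard the partial last block
}

int main() {
    const int n = 1000;       // same element count as XDIM in this thread
    const int threads = 256;  // assumed block size, below the 512 limit mentioned above

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("maxThreadsPerBlock = %d\n", prop.maxThreadsPerBlock);

    float* dev_out = nullptr;
    cudaMalloc((void**)&dev_out, n * sizeof(float));

    fillIndex<<<blocksFor(n, threads), threads>>>(dev_out, n);

    // An invalid launch configuration does not abort the program; the kernel
    // simply never runs, and the error is visible only if it is queried.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    err = cudaDeviceSynchronize();  // surfaces errors raised while the kernel ran
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float last = 0.0f;
    cudaMemcpy(&last, dev_out + (n - 1), sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[%d] = %f\n", n - 1, last);

    cudaFree(dev_out);
    return 0;
}
```

Note that the per-block limit applies to the product of the block's x, y and z dimensions, so rearranging 1000 threads into two dimensions within a single block would still exceed a limit of 512. Spreading the work over several blocks avoids that, and calling cudaGetLastError right after the launch reports the problem instead of leaving an output buffer that was never written.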

Thanks, that worked beautifully! Thank you.