type conversion (float -> uint16) and outputting to MATLAB


I started CUDA programming two weeks ago, mainly writing a mex file to speed up some MATLAB code. I have successfully processed my data using float, and now I want to return the result as uint16 to MATLAB. However, I can’t seem to get the type conversion right, and the result as seen in MATLAB is all garbage.

My kernel is as follows:

[codebox]__global__ void scale(unsigned int* odata, float* idata, float factor, unsigned int Size)
{
	int i = blockIdx.x * blockDim.x + threadIdx.x;

	if (threadIdx.x < Size)
		odata[i] = __float2uint_rn(idata[i] / factor * 65535 / 2);
}[/codebox]


And my host code is as follows:

[codebox] /* declare variables */

unsigned int* d_VTD;

imageSize = LENGTH/2 * COL;

/* Allocate memory */

cudaMalloc( (void**) &d_VTD, imageSize*sizeof(unsigned int));

out_dimensions[0] = Step;

out_dimensions[1] = COL;

plhs[2] = mxCreateNumericArray(2, out_dimensions, mxUINT16_CLASS, mxREAL);

unsigned char* h_VTD = (unsigned char*) mxGetPr(plhs[2]);

/* scale and cast */

numThreads = 512;

numBlocks = COL;

scale<<<numBlocks, numThreads>>>(d_VTD, d_Vtemp, factor, Step);

int bytes_to_copy = Step * COL * mxGetElementSize(plhs[2]);

cudaMemcpy(h_VTD, d_VTD, bytes_to_copy, cudaMemcpyDeviceToHost);[/codebox]

d_Vtemp is an array of floats.

The MATLAB parts of the code are copied from the mex-file programming guide.

I am not sure if I have used the right type for declaring d_VTD.

Does anyone know what the problem might be? Or is there a better way to do type conversion/casting from float to uint16?

Just to give an example of how bad the result looks, I have attached a jpg file containing two images. The top image is the correct image, whereas the bottom is the jumbled image.

Any help will be much appreciated.

Thank you,


unsigned int is normally 32 bits, uint16 is 16 bits…

The easiest solution is to copy the data to MATLAB as floats, and then simply use myvariable = uint16(myvariable) in MATLAB.

Is there no easy way to do this in CUDA? I really want the code to be as independent of MATLAB as possible.

I’m not sure if it is efficient to do it in a kernel, but

(unsigned short int) my_value

should do it. Using

(unsigned int) my_value

will normally cast to uint32.