OK, I’m looking to undistort some images using texture fetches. I capture each image with OpenCV into an RGBA IplImage data structure; the pixel data is stored as a char*.
I then copy the image data to the device, where imData is the char* to the IplImage data, RES_H and RES_V are the horizontal and vertical resolution, and POINTS is RES_H*RES_V*4 (where 4 is the number of channels). I then create a texture binding.
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(8, 8, 8, 8, cudaChannelFormatKindSigned);
cudaArray* cu_array;
CUDA_SAFE_CALL(cudaMallocArray( &cu_array, &channelDesc, RES_H, RES_V ));
CUDA_SAFE_CALL( cudaMemcpyToArray( cu_array, 0, 0, imData, POINTS, cudaMemcpyHostToDevice));
// set texture parameters
tex_Image.addressMode[0] = cudaAddressModeClamp;
tex_Image.addressMode[1] = cudaAddressModeClamp;
tex_Image.filterMode = cudaFilterModePoint;
tex_Image.normalized = false;
// Bind the array to the texture
CUDA_SAFE_CALL( cudaBindTextureToArray( tex_Image, cu_array, channelDesc));
My kernel follows:
texture <char4, 2, cudaReadModeElementType> tex_Image;
__global__ void
cuUndistort(char* imData, float* Mapx, float* Mapy, int* RES_H, int* RES_V, char4* ResultData)
{
    int x = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;
    int y = __mul24(blockIdx.y, blockDim.y) + threadIdx.y;
    int loc = __mul24(y, RES_H[0]) + x;
    if (x < RES_H[0] && y < RES_V[0])
    {
        ResultData[loc] = tex2D(tex_Image, Mapx[loc], Mapy[loc]);
    }
}
This all works just fine, but my problem is with using char4. I need to copy the resulting data back into an interleaved char*, which alone takes ~15ms. I tried allocating char types instead of char4 and using a char texture reference, but the image didn’t come out right. Even then, I would still have the problem of not being able to use cudaFilterModeLinear as my filter mode.
Does anybody have any suggestions on how to go from my char* image to a texture reference that can use bilinear interpolation, and still be able to transfer the result to the host as a char*?
I would also like to be able to do 32-bit memory reads/writes, as I have heard these are much faster than 8-bit ones.
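For reference, here is a minimal sketch of one direction that may cover all three points. It has not been tested against the setup above and rests on several assumptions: the texels are treated as unsigned (uchar4 rather than char4), the texture is read with cudaReadModeNormalizedFloat (linear filtering needs a texture that returns floats), the maps hold pixel-centre coordinates, and the image rows are tightly packed. The names tex_ImageF, toByte, cuUndistortLinear and undistortLinear are hypothetical, and error checking is left out for brevity. It uses the same legacy texture-reference API as the code above, so it needs a toolkit that still provides it.

```cuda
#include <cuda_runtime.h>

// Assumption: uchar4 texels returned as float4 in [0, 1], which is what
// permits cudaFilterModeLinear. tex_ImageF is a hypothetical name.
texture<uchar4, 2, cudaReadModeNormalizedFloat> tex_ImageF;

// Convert a normalized [0, 1] channel back to 0..255 with rounding.
__device__ unsigned char toByte(float v)
{
    return (unsigned char)(v * 255.0f + 0.5f);
}

__global__ void cuUndistortLinear(const float* Mapx, const float* Mapy,
                                  int width, int height, uchar4* ResultData)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
    {
        int loc = y * width + x;
        // Assumption: the maps hold pixel-centre coordinates, so +0.5f moves
        // them onto texel centres for unnormalized linear filtering.
        float4 p = tex2D(tex_ImageF, Mapx[loc] + 0.5f, Mapy[loc] + 0.5f);
        // One 32-bit store per pixel instead of four 8-bit stores.
        ResultData[loc] = make_uchar4(toByte(p.x), toByte(p.y),
                                      toByte(p.z), toByte(p.w));
    }
}

// Hypothetical host wrapper. src and dst are interleaved RGBA buffers of
// width * height * 4 bytes; mapx and mapy are host arrays of width * height.
void undistortLinear(const unsigned char* src, unsigned char* dst,
                     const float* mapx, const float* mapy,
                     int width, int height)
{
    size_t pixels = (size_t)width * height;

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<uchar4>();
    cudaArray* cu_array;
    cudaMallocArray(&cu_array, &desc, width, height);
    cudaMemcpy2DToArray(cu_array, 0, 0, src, width * 4, width * 4, height,
                        cudaMemcpyHostToDevice);

    tex_ImageF.addressMode[0] = cudaAddressModeClamp;
    tex_ImageF.addressMode[1] = cudaAddressModeClamp;
    tex_ImageF.filterMode = cudaFilterModeLinear;
    tex_ImageF.normalized = false;
    cudaBindTextureToArray(tex_ImageF, cu_array, desc);

    float* d_mapx;
    float* d_mapy;
    uchar4* d_result;
    cudaMalloc(&d_mapx, pixels * sizeof(float));
    cudaMalloc(&d_mapy, pixels * sizeof(float));
    cudaMalloc(&d_result, pixels * sizeof(uchar4));
    cudaMemcpy(d_mapx, mapx, pixels * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_mapy, mapy, pixels * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    cuUndistortLinear<<<grid, block>>>(d_mapx, d_mapy, width, height, d_result);

    // uchar4 has the same byte layout as interleaved RGBA, so the result is
    // copied straight into the byte buffer with no repacking pass.
    cudaMemcpy(dst, d_result, pixels * sizeof(uchar4), cudaMemcpyDeviceToHost);

    cudaUnbindTexture(tex_ImageF);
    cudaFreeArray(cu_array);
    cudaFree(d_mapx);
    cudaFree(d_mapy);
    cudaFree(d_result);
}
```

If this holds up, the ~15ms repacking step should not be needed, since the device result is already in interleaved RGBA order and can be copied directly into the IplImage buffer.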
Thanks for your time.