Textures, cudaArrays and RGBA channels: filling, binding, and reading back textures and cudaArrays

Hi everybody! First of all, sorry if this topic has already been posted in another forum, but I haven't been able to find the information I need.

I more or less understand how to fill, bind and read back data stored in textures and/or cudaArrays, but I can't figure out how to work with these data structures when they use RGBA channels.

Here is a simple program that fills a cudaArray, binds it to a texture and reads the data back:

#include <stdlib.h>
#include <stdio.h>

#define width	16
#define height	16

// declare texture reference for 2D float texture
texture<float, 2, cudaReadModeElementType> tex;

__global__ void TextureKernel(float* data){
	// calculate texture coordinates
	unsigned int x = (blockIdx.x*blockDim.x + threadIdx.x);
	unsigned int y = (blockIdx.y*blockDim.y + threadIdx.y);

	// read from texture (at the texel center) and write to global memory
	data[y * width + x] = tex2D(tex, x + 0.5f, y + 0.5f);
}

int main(int argc, char **argv){
	int i, j;
	size_t size = width * height * sizeof(float);

	// host input and output data
	float* h_data = (float*) malloc(size);
	float* result_data = (float*) malloc(size);

	// allocate device memory for result
	float* d_data = NULL;
	cudaMalloc((void**) &d_data, size);

	// fill the input with 0, 1, 2, ...
	for(i=0;i<height;i++)
		for(j=0;j<width;j++)
			h_data[i*width + j] = (float) (i*width + j);

	// allocate array (one 32-bit float channel) and copy input data
	cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
	cudaArray* cu_array;
	cudaMallocArray( &cu_array, &channelDesc, width, height);
	cudaMemcpyToArray( cu_array, 0, 0, h_data, size, cudaMemcpyHostToDevice);

	// set texture parameters
	// (wrap addressing is only supported with normalized coordinates, so clamp is used here)
	tex.addressMode[0] = cudaAddressModeClamp;
	tex.addressMode[1] = cudaAddressModeClamp;
	tex.filterMode = cudaFilterModeLinear;
	tex.normalized = false;	// access with unnormalized (texel) coordinates

	// Bind the array to the texture
	cudaBindTextureToArray( tex, cu_array, channelDesc);

	// 2x2 blocks of 8x8 threads cover the 16x16 texture
	dim3 dimBlock(8, 8);
	dim3 dimGrid(2,2);

	// launch the kernel and copy the result back to the host
	TextureKernel<<< dimGrid, dimBlock >>>(d_data);
	cudaMemcpy(result_data, d_data, size, cudaMemcpyDeviceToHost);

	for(i=0;i<height;i++){
		for(j=0;j<width;j++)
			printf("%f ", result_data[i*width + j]);
		printf("\n");
	}

	// release resources
	cudaUnbindTexture(tex);
	cudaFreeArray(cu_array);
	cudaFree(d_data);
	free(h_data);
	free(result_data);
	return 0;
}

But all of this works with only one channel. I understand that when we create a channel description we can do something like this:

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(rBits, gBits, bBits, aBits, datatype);

where rBits, gBits, bBits and aBits are the number of bits used to store data in the R, G, B and A channels respectively. In my code there is

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);

which means (I guess) that I'm only using the red channel.
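To check my reading of the descriptor, a small host-only sketch like the one below could print the per-channel bit counts. It assumes the templated helper cudaCreateChannelDesc<float4>() from the CUDA runtime (not used in my code above) as the four-channel counterpart, and that the descriptor fields x, y, z, w correspond to the R, G, B, A bit counts.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // One 32-bit float component per element (what my code above uses).
    cudaChannelFormatDesc one =
        cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);

    // Four 32-bit float components per element; the template form
    // derives the bit counts from the float4 type.
    cudaChannelFormatDesc four = cudaCreateChannelDesc<float4>();

    // x, y, z, w hold the bit counts of the R, G, B, A channels.
    printf("1 channel : x=%d y=%d z=%d w=%d\n", one.x, one.y, one.z, one.w);
    printf("4 channels: x=%d y=%d z=%d w=%d\n", four.x, four.y, four.z, four.w);

    // Size of one array element in each case.
    printf("bytes per element: %zu vs %zu\n", sizeof(float), sizeof(float4));
    return 0;
}
```

If this reading is right, a four-channel element is simply a float4, four times the size of the single float used in the one-channel case.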

[b]I also understand that cudaArrays are made up of elements, each of which has 1, 2 or 4 components; I guess that means one component per channel (tell me if I'm wrong!).

The question is: if I do something like this...

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 32, 32, 32, cudaChannelFormatKindFloat);

how should I allocate the array and copy my input data into it, and afterwards, how can I read the output data back? In other words, how do I copy data to my cudaArray so that each of the RGBA components gets filled, and then, after binding my texture to the cudaArray, how can I access the data stored in the RGBA channels from the texture inside a compute kernel? I want to do the same thing my code above does, but using all four RGBA channels.

[/b]
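For reference, here is a minimal sketch of how I would expect the four-channel version to look; please treat it as a set of assumptions, not a confirmed answer. It assumes that each array element is a float4 with the R, G, B, A values interleaved in host memory. It also swaps in the texture object API (cudaCreateTextureObject, tex2D<float4>) and cudaMemcpy2DToArray in place of the texture reference and cudaMemcpyToArray from my code above, since as far as I know the older calls are deprecated or removed in recent toolkits. The names TextureKernelRGBA, WIDTH, HEIGHT and the fill pattern are made up for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define WIDTH  16
#define HEIGHT 16

// Each thread fetches one float4 texel and writes all four components back.
__global__ void TextureKernelRGBA(cudaTextureObject_t tex, float4* out)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= WIDTH || y >= HEIGHT) return;

    // texel.x / .y / .z / .w are the R / G / B / A components.
    float4 texel = tex2D<float4>(tex, x + 0.5f, y + 0.5f);
    out[y * WIDTH + x] = texel;
}

int main()
{
    const size_t count = WIDTH * HEIGHT;
    const size_t size  = count * sizeof(float4);

    // Host data is interleaved: one float4 (R, G, B, A) per texel.
    float4* h_in  = (float4*)malloc(size);
    float4* h_out = (float4*)malloc(size);
    for (size_t i = 0; i < count; ++i)
        h_in[i] = make_float4((float)i, i + 0.25f, i + 0.5f, i + 0.75f);

    // Four 32-bit float channels, i.e. (32, 32, 32, 32, cudaChannelFormatKindFloat).
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float4>();
    cudaArray_t cu_array;
    cudaMallocArray(&cu_array, &desc, WIDTH, HEIGHT);

    // Row pitch and row width are given in bytes; the height is in rows.
    cudaMemcpy2DToArray(cu_array, 0, 0, h_in,
                        WIDTH * sizeof(float4), WIDTH * sizeof(float4), HEIGHT,
                        cudaMemcpyHostToDevice);

    // Describe the backing array and the sampling state.
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = cu_array;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0]   = cudaAddressModeClamp;
    texDesc.addressMode[1]   = cudaAddressModeClamp;
    texDesc.filterMode       = cudaFilterModePoint;      // return stored texels unchanged
    texDesc.readMode         = cudaReadModeElementType;
    texDesc.normalizedCoords = 0;                        // texel (unnormalized) coordinates

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);

    float4* d_out = NULL;
    cudaMalloc((void**)&d_out, size);

    dim3 block(8, 8);
    dim3 grid(WIDTH / block.x, HEIGHT / block.y);
    TextureKernelRGBA<<<grid, block>>>(tex, d_out);

    cudaError_t err = cudaMemcpy(h_out, d_out, size, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Compare every component that came back with what was uploaded.
    int mismatches = 0;
    for (size_t i = 0; i < count; ++i)
        if (h_out[i].x != h_in[i].x || h_out[i].y != h_in[i].y ||
            h_out[i].z != h_in[i].z || h_out[i].w != h_in[i].w)
            ++mismatches;
    printf("texel 5 = (%f, %f, %f, %f), mismatches = %d\n",
           h_out[5].x, h_out[5].y, h_out[5].z, h_out[5].w, mismatches);

    cudaDestroyTextureObject(tex);
    cudaFreeArray(cu_array);
    cudaFree(d_out);
    free(h_in);
    free(h_out);
    return mismatches != 0;
}
```

The point of the sketch is that nothing is copied "per channel": the host buffer, the cudaArray element and the value returned by the texture fetch are all float4, so the four channels travel together as one element.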

I hope I made myself clear. Any help is welcome, and thanks in advance!