Problem with texture memory

Hi all,

I am trying to use texture memory in a simple example, but I am having some trouble.

Here is my code:

#include <stdio.h>
#include <stdlib.h>

extern texture<float, cudaTextureType1D, cudaReadModeElementType> tex;

/** Kernel function **/
__global__ void KernelTest(float* tab1_D, float* tab2_D)
{
	// Thread index
	int idx = threadIdx.x;

	// Global memory
	tab2_D[idx] = tab1_D[idx];

	// Reading texture memory
	//tab2_D[idx] = tex1Dfetch(tex, idx);
}

void test(){

	/** Initialisation **/
	int N = 128;
	float* tab1_H;
	float* tab1_D;
	float* tab2_H;
	float* tab2_D;
	cudaError_t erreur;

	tab1_H = (float*) malloc( N*sizeof(*(tab1_H)));
	if( tab1_H == NULL ){
		printf("ERROR: malloc of tab1_H failed\n");
		exit(1);
	}

	tab2_H = (float*) malloc( N*sizeof(*(tab2_H)));
	if( tab2_H == NULL ){
		printf("ERROR: malloc of tab2_H failed\n");
		exit(1);
	}

	erreur = cudaMalloc((void**)&(tab1_D), N*sizeof(*(tab1_D)));
	if( erreur != cudaSuccess ){
		printf("ERROR: cudaMalloc of tab1_D failed\n");
		exit(1);
	}

	erreur = cudaMalloc((void**)&(tab2_D), N*sizeof(*tab2_D));
	if( erreur != cudaSuccess ){
		printf("ERROR: cudaMalloc of tab2_D failed\n");
		exit(1);
	}

	for( int i=0; i<N; i++ )
		tab1_H[i] = i;

	/** Copy the input array to the device **/
	erreur = cudaMemcpy(tab1_D, tab1_H, N*sizeof(*(tab1_D)), cudaMemcpyHostToDevice);
	if( erreur != cudaSuccess ){
		printf( "ERROR: copy to tab1_D failed\n");
		printf( "Error detail: %s\n",cudaGetErrorString(erreur) );
		exit(1);
	}

	/** Initialisation of the texture memory **/
	erreur = cudaBindTexture(NULL, tex, tab1_D, N*sizeof(float));
	if( erreur != cudaSuccess ){
		printf( "ERROR: cudaBindTexture failed in test\n");
		printf( "Error detail: %s\n",cudaGetErrorString(erreur) );
		exit(1);
	}

	dim3 blockSize(N,1);
	KernelTest<<<1, blockSize>>>(tab1_D, tab2_D );

	/** Copy the result back to the host **/
	erreur = cudaMemcpy(tab2_H, tab2_D, N*sizeof(*(tab2_D)), cudaMemcpyDeviceToHost);
	if( erreur != cudaSuccess ){
		printf( "ERROR: copy from tab2_D failed\n");
		printf( "Error detail: %s\n",cudaGetErrorString(erreur) );
		exit(1);
	}

	/** Final result **/
	for( int i=0; i<N; i++ ){
		printf("tab1_H[%d] = %f\n",i,tab1_H[i]) ;
		printf("tab2_H[%d] = %f\n",i,tab2_H[i]) ;
	}

	cudaUnbindTexture(tex);
	cudaFree(tab1_D);
	cudaFree(tab2_D);
	free(tab1_H);
	free(tab2_H);
}

int main(){
	test();
}

As a result, all the values of tab2_D are equal to zero.

EDIT: I had this problem; it works now. Magic??

Do you have any idea how to correct my code?

Thanks.

I can’t see anything wrong with the code you gave, but a few parts are missing to make an executable program.
How is tex defined? How are you combining host.cu and device.cu into one file?

I am sorry for the incomplete previous post. I have changed it. I created a single new file with only the relevant code, and now it works. I don’t know why. It is still incomprehensible to me.

Texture declarations in different compilation units refer to different objects even if they have the same name.

Ok and thanks. I will be careful with it.

From my first tests, it seems that reading the data from global memory in the usual way (that is, with only a cudaMalloc) is faster. I thought texture memory was faster for read-only data. Is that always true?

To find the fastest approach, I switch between the global memory version and the texture memory version by commenting out one or the other in the kernel, then measure the execution time with “time ./MyProgramme”. Is that the correct way to do it?
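Note that “time ./MyProgramme” measures the whole process (CUDA start-up, allocations, copies, and the printf loop), so for a 128-element kernel the kernel itself is probably a very small part of that figure. A minimal sketch of timing only the kernel with CUDA events follows. The names copyKernel and timeCopyKernel, the array size, and the repeat count are assumptions chosen for illustration and do not come from the code above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, used only to have something to time.
__global__ void copyKernel(const float* in, float* out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = in[idx];
}

// Returns the average time of one launch in milliseconds, or -1.0f on error.
float timeCopyKernel(const float* in_D, float* out_D, int n, int repeats)
{
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return -1.0f;
    }

    int threads = 128;
    int blocks = (n + threads - 1) / threads;

    // Warm-up launch so that one-time start-up costs are not measured.
    copyKernel<<<blocks, threads>>>(in_D, out_D, n);
    cudaDeviceSynchronize();

    // Events are recorded in the same stream as the kernels, so the
    // elapsed time covers only the launches between them.
    cudaEventRecord(start);
    for (int r = 0; r < repeats; r++)
        copyKernel<<<blocks, threads>>>(in_D, out_D, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    bool ok = (cudaGetLastError() == cudaSuccess);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ok ? ms / repeats : -1.0f;
}

int main()
{
    const int N = 1 << 20;  // assumed size; large enough to be measurable
    float *in_D = NULL, *out_D = NULL;
    if (cudaMalloc((void**)&in_D, N * sizeof(float)) != cudaSuccess ||
        cudaMalloc((void**)&out_D, N * sizeof(float)) != cudaSuccess) {
        printf("ERROR: cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(in_D, 0, N * sizeof(float));

    float ms = timeCopyKernel(in_D, out_D, N, 100);
    printf("average kernel time: %f ms\n", ms);

    cudaFree(in_D);
    cudaFree(out_D);
    return ms < 0.0f ? 1 : 0;
}
```

To compare the two approaches, the same harness could be run once with the global memory read and once with the texture fetch in the kernel body, keeping everything else identical.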

The L1 and L2 caches are faster than the texture, so it will depend on the problem. With some random access patterns, texture might give a speedup.

I am running some tests to find the fastest way. Thanks, pasoleatis, for your remarks.

But in my tests, all the values read from tex are equal to zero. In fact, I am working on a big project and I need to use the same texture in two different files: the first one, where I call cudaBindTexture() (host.cu, for host functions), and the other one, where I read the texture (device.cu, for device functions). So I declared the texture in a file common.h as follows:

extern texture<float, cudaTextureType1D, cudaReadModeElementType> tex;

and I included common.h in both host.cu and device.cu.

I guess the problem comes from the compilation units that tera mentioned, but I don’t know how to solve it. Do you have any suggestions that may help me?

I hope this is clear enough; please ask for more explanation if not.

There is no way around it - they just have to be in the same compilation unit.
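The restriction above concerns texture references such as tex. As a possible alternative, and assuming a toolkit and GPU that support texture objects (CUDA 5.0 or later, compute capability 3.0 or higher), the texture can be passed to the kernel as an ordinary argument, so the host code and the kernel no longer need to share a file-scope variable. This is only a sketch: KernelTexObj and copyThroughTexture are hypothetical names, and whether texture objects are available in the setup discussed here is an assumption.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// The texture object is a plain kernel argument, so this kernel could live
// in a different .cu file from the host code that creates the object.
__global__ void KernelTexObj(cudaTextureObject_t texObj, float* out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = tex1Dfetch<float>(texObj, idx);
}

// Hypothetical helper: copies n floats from in_H to out_H through a
// texture fetch on the device. Returns true on success.
bool copyThroughTexture(const float* in_H, float* out_H, int n)
{
    float *in_D = NULL, *out_D = NULL;
    cudaTextureObject_t texObj = 0;
    size_t bytes = n * sizeof(float);

    bool ok = cudaMalloc((void**)&in_D, bytes) == cudaSuccess
           && cudaMalloc((void**)&out_D, bytes) == cudaSuccess
           && cudaMemcpy(in_D, in_H, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        // Describe the linear device buffer that backs the texture.
        cudaResourceDesc resDesc;
        memset(&resDesc, 0, sizeof(resDesc));
        resDesc.resType = cudaResourceTypeLinear;
        resDesc.res.linear.devPtr = in_D;
        resDesc.res.linear.desc = cudaCreateChannelDesc<float>();
        resDesc.res.linear.sizeInBytes = bytes;

        // Return the stored floats unchanged.
        cudaTextureDesc texDesc;
        memset(&texDesc, 0, sizeof(texDesc));
        texDesc.readMode = cudaReadModeElementType;

        ok = cudaCreateTextureObject(&texObj, &resDesc, &texDesc, NULL) == cudaSuccess;
    }

    if (ok) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        KernelTexObj<<<blocks, threads>>>(texObj, out_D, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(out_H, out_D, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    if (texObj) cudaDestroyTextureObject(texObj);
    cudaFree(in_D);
    cudaFree(out_D);
    return ok;
}

int main()
{
    const int N = 128;
    float tab1_H[N], tab2_H[N];
    for (int i = 0; i < N; i++)
        tab1_H[i] = (float)i;

    if (!copyThroughTexture(tab1_H, tab2_H, N)) {
        printf("ERROR: texture copy failed\n");
        return 1;
    }
    for (int i = 0; i < N; i++)
        printf("tab2_H[%d] = %f\n", i, tab2_H[i]);
    return 0;
}
```

In a split project, the host side (host.cu here) would create the object and pass it to the kernel defined in device.cu, and only the kernel's prototype would need to go in common.h.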

I read a little on the internet about compilation units, but I don’t know exactly how they work. So, in other words, the declaration and the use of the texture have to be in the same file, at least.

Thanks again for your answer.