Device Function Pointers and Texture References: Will pointers to fetch functions stay the same?

All right, so I managed to figure out a way to mostly abstract texture fetches. I know that it works and I have tested it extensively.

The method that I use to do this revolves around device function pointers, cudaMemcpyFromSymbol, and a bunch of texture references and their fetch functions declared in a header file.

The top level of the code where the texture references and the fetch functions are declared looks like this:

int nextTex = 0;

texture<float,cudaTextureType2D,cudaReadModeElementType> texref0;

texture<float,cudaTextureType2D,cudaReadModeElementType> texref1;

texture<float,cudaTextureType2D,cudaReadModeElementType> texref2;

texture<float,cudaTextureType2D,cudaReadModeElementType> texref3;

texture<float,cudaTextureType2D,cudaReadModeElementType> texref4;

texture<float,cudaTextureType2D,cudaReadModeElementType> texref5;

texture<float,cudaTextureType2D,cudaReadModeElementType> texref6;

typedef float (*pt2Func)(float,float);


__device__ float fetchtexref0(float x,float y)
{
	return tex2D(texref0,x,y);
}

__device__ float fetchtexref1(float x,float y)
{
	return tex2D(texref1,x,y);
}

__device__ float fetchtexref2(float x,float y)
{
	return tex2D(texref2,x,y);
}

__device__ float fetchtexref3(float x,float y)
{
	return tex2D(texref3,x,y);
}

__device__ float fetchtexref4(float x,float y)
{
	return tex2D(texref4,x,y);
}

__device__ float fetchtexref5(float x,float y)
{
	return tex2D(texref5,x,y);
}

__device__ float fetchtexref6(float x,float y)
{
	return tex2D(texref6,x,y);
}

__device__ pt2Func fetchPointer0 = 	&fetchtexref0;

__device__ pt2Func fetchPointer1 = 	&fetchtexref1;

__device__ pt2Func fetchPointer2 = 	&fetchtexref2;

__device__ pt2Func fetchPointer3 = 	&fetchtexref3;

__device__ pt2Func fetchPointer4 = 	&fetchtexref4;

__device__ pt2Func fetchPointer5 = 	&fetchtexref5;

__device__ pt2Func fetchPointer6 = 	&fetchtexref6;

A ‘grid’ object can be created and used to manage accesses to a particular texture reference simply by assigning a function pointer as one of its members. Texture fetches are then carried out through that function pointer. The naming structure and the copy to/from symbol make setting these things up pretty easy. Every time a new ‘grid’ object is allocated, it uses the ‘nextTex’ global variable to figure out the number of the next available texture reference. Once it binds that texture to its own cudaArray, it increments the ‘nextTex’ counter so that the next ‘grid’ object will take the next reference in line. It’s a bit clunky, but a lot easier to toss around than trying to manage the 50+ texture references that I am using in my main code.
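For anyone who wants to see the idea end to end, here is a minimal sketch of a single texture reference being accessed through a device function pointer stored in a small struct. The `Grid` struct, the `sampleRow` kernel, and the tiny 8x1 test array are hypothetical stand-ins and not the actual ‘grid’ class from my code. It also assumes a toolkit that still supports texture references (they were removed in CUDA 12), a device of compute capability 2.0 or newer for function pointers, and the form of cudaMemcpyFromSymbol that takes the symbol itself rather than its name as a string.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef float (*pt2Func)(float, float);

texture<float, cudaTextureType2D, cudaReadModeElementType> texref0;

__device__ float fetchtexref0(float x, float y) { return tex2D(texref0, x, y); }

// Device-side variable holding the device address of the fetch function.
__device__ pt2Func fetchPointer0 = fetchtexref0;

// Hypothetical stand-in for the 'grid' object: it only carries the pointer.
struct Grid {
    pt2Func fetch;
};

// The kernel never names a texture reference; it goes through the pointer.
__global__ void sampleRow(Grid g, float* out, int nx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nx) out[i] = g.fetch(i + 0.5f, 0.5f);  // texel centers of row 0
}

int main()
{
    const int nx = 8, ny = 1;
    float h_in[nx];
    for (int i = 0; i < nx; ++i) h_in[i] = 10.0f * i;

    // Back the texture with a cudaArray and bind it.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray* arr = 0;
    cudaMallocArray(&arr, &desc, nx, ny);
    cudaMemcpy2DToArray(arr, 0, 0, h_in, nx * sizeof(float),
                        nx * sizeof(float), ny, cudaMemcpyHostToDevice);
    cudaBindTextureToArray(texref0, arr);

    // Copy the function's device address into the host-side Grid.
    Grid g;
    cudaMemcpyFromSymbol(&g.fetch, fetchPointer0, sizeof(pt2Func));

    float* d_out = 0;
    cudaMalloc(&d_out, nx * sizeof(float));
    sampleRow<<<1, nx>>>(g, d_out, nx);

    float h_out[nx];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    for (int i = 0; i < nx; ++i) printf("%g ", h_out[i]);
    printf("\n");

    cudaUnbindTexture(texref0);
    cudaFree(d_out);
    cudaFreeArray(arr);
    return 0;
}
```

The point of the design is that the kernel signature stays the same no matter which texture a given grid ends up using; only the pointer value stored in the struct changes.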

This is how a ‘grid’ object is assigned a texture reference and how it binds its data to that reference:


	void fill2D(cudaMatrixf data_in) // cudaMatrixf is just a front end for 3D device arrays
	{
		int nx = 1;
		int ny = 1;
		char* texrefstring = (char*)malloc(sizeof(char)*25);
		char* texfetchstring = (char*)malloc(sizeof(char)*25);
		int itemp = nextTex;

		// Assumed: build the symbol names from the counter, following the
		// naming scheme above (e.g. "texref3" and "fetchPointer3").
		sprintf(texrefstring,"texref%i",itemp);
		sprintf(texfetchstring,"fetchPointer%i",itemp);

		printf("%s \n",texrefstring); // This line was just for debugging
		printf("%s \n",texfetchstring); // This line was just for debugging

		symbol = texrefstring;

		nx = griddims[0];
		ny = griddims[1];

		cudaError status;
		cudaExtent extent = make_cudaExtent(nx,ny,0); // depth 0 gives a 2D array
		cudaMemcpy3DParms params = {0};
		params.kind = cudaMemcpyDeviceToDevice;
		cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();

		status = cudaMalloc3DArray(&cuArray,&desc,extent);

		params.srcPtr = data_in.getptr();
		params.dstArray = cuArray;
		params.extent = make_cudaExtent(nx,ny,1);

		// Assumed: the copy itself, since params is only useful once it is
		// passed to cudaMemcpy3D.
		status = cudaMemcpy3D(&params);

		// Look up the texture reference by name and bind it to this grid's array
		const textureReference* texRefPtr;
		cudaGetTextureReference(&texRefPtr, symbol);
		printf("%s \n",symbol);

		cudaChannelFormatDesc channelDesc;
		cudaGetChannelDesc(&channelDesc, cuArray);
		CUDA_SAFE_CALL(cudaBindTextureToArray(texRefPtr, cuArray, &channelDesc));
	}


I’ve checked this code pretty extensively and it works fine when the declaration, assignment, and access all take place within the same header file chain, but I’m not sure what will happen when I assign in one file, and then access the data in a different linked file.

Now, on to my main question: If I set up my ‘grids’ in one .cu file, store them all as an external variable, and then launch a separate .cu file that does a bunch of work using the ‘grids’ set up in setup_fields, will the function pointers still point to my original texture fetch functions? Both of the .cu files use the same header files and contain the texture reference declarations. Will the compiler generate two sets of texture references, one for each .cu file, so that my function pointers end up pointing to fetches of the wrong textures, or will it all be okay? Note that both of these .cu files are being combined into a single library.

I’m thinking that I need to declare all of the texture references as ‘extern’ and possibly all of the fetch functions as well. I’m just not entirely sure how the compiler will handle the declaration of the references in two separate instances that are going to be combined into a single library.

If anyone can straight up tell me what will happen, that would be great. If not, I’m going to look at the PTX output to determine whether it gives me the same functions for each of the texture fetch functions. I’ll also just check what happens when I access it from my second, linked file.

Either way I’ll post my results here. If you would like a copy of the full source code just let me know and I’ll upload a copy of it.

Texture references are implicitly declared static. Thus, the “same” references in two different compilation units are different. The cudaArray may be shared across compilation units, however. Binding is cheap, so the typical solution to this problem is to handle cudaArrays in your main code and bind the textures needed for a kernel call just prior to making that call.
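A minimal sketch of that bind-just-before-launch pattern, under a few assumptions: `fieldTex`, `copyField`, and `launchCopyField` are hypothetical names, the 4x4 test data is made up, and the toolkit still supports texture references (they were removed in CUDA 12). The texture reference and the kernel that reads it live in the same .cu file; the only thing another compilation unit would hand over is the cudaArray.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// File-scope texture reference: private to this .cu file.
texture<float, cudaTextureType2D, cudaReadModeElementType> fieldTex;

__global__ void copyField(float* out, int nx, int ny)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < nx && y < ny)
        out[y * nx + x] = tex2D(fieldTex, x + 0.5f, y + 0.5f);
}

// Hypothetical entry point that other compilation units would call.
// Only the cudaArray crosses the file boundary; the binding happens here.
cudaError_t launchCopyField(cudaArray* fieldArray, float* d_out, int nx, int ny)
{
    cudaError_t status = cudaBindTextureToArray(fieldTex, fieldArray);
    if (status != cudaSuccess) return status;

    dim3 block(16, 16);
    dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
    copyField<<<grid, block>>>(d_out, nx, ny);
    status = cudaDeviceSynchronize();

    cudaUnbindTexture(fieldTex);
    return status;
}

int main()
{
    const int nx = 4, ny = 4;
    float h_in[nx * ny];
    for (int i = 0; i < nx * ny; ++i) h_in[i] = (float)i;

    // In a real project this array would be created by the "main code".
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray* arr = 0;
    cudaMallocArray(&arr, &desc, nx, ny);
    cudaMemcpy2DToArray(arr, 0, 0, h_in, nx * sizeof(float),
                        nx * sizeof(float), ny, cudaMemcpyHostToDevice);

    float* d_out = 0;
    cudaMalloc(&d_out, nx * ny * sizeof(float));

    cudaError_t status = launchCopyField(arr, d_out, nx, ny);
    printf("launch status: %s\n", cudaGetErrorString(status));

    cudaFree(d_out);
    cudaFreeArray(arr);
    return 0;
}
```

Because the bind happens inside the same file as the kernel, it does not matter that a header declaring the “same” reference would produce a separate copy in every other compilation unit.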

Ok, so basically as long as I pass the cudaArray and my reference number across the compilation units, all will be fine. I just have to make sure that those cudaArrays are rebound to their textures prior to actually using them. Cool.

Well, it turns out that using multiple compilation units in my code would be more trouble than it is worth anyway, due to the lack of a link stage for device code. That means it is just going to be easier to have everything in one big header file chain, which isn’t pretty considering the code is on the order of 10,000+ lines.