[Solved] Physical address of texture memory

texture.cu below is a typical example of texture memory usage.

  • Question 1
  • Before copying data from host memory to device (global) memory, we perform the texture binding operation:

    cudaBindTexture( NULL, tex_a, d_a, bytes );
    

    My question is: what exactly does cudaBindTexture do here? Does it involve copying data from global memory to texture memory?

  • Question 2
  • When I fetch data from texture memory, what is the "real physical address" of tex_a? Is it still equal to the address of the global memory?
    c[id] = tex1Dfetch(tex_a,id) + tex1Dfetch(tex_b,id);
    
  • texture.cu
  • nvcc -o texture texture.cu
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #ifndef N
    	#define N (1024)
    #endif
    texture<float> tex_a;
    texture<float> tex_b;
    texture<float> tex_c;
    
    // CUDA kernel. Each thread takes care of one element of c
    __global__ void vecAdd(float *c)
    {
        // Get our global thread ID
        int id = blockIdx.x*blockDim.x+threadIdx.x;
        // Make sure we do not go out of bounds
        if (id < N) {
            c[id] = tex1Dfetch(tex_a,id) + tex1Dfetch(tex_b,id);
        }
            
    }
    
    int main( int argc, char* argv[] )
    {
        // Size of vectors
        //int n = 10000;
    	
        // Host input vectors
        float *h_a;
        float *h_b;
        //Host output vector
        float *h_c;
    	
        // Device input vectors
        float *d_a;
        float *d_b;
        //Device output vector
        float *d_c;
    	
        // Size, in bytes, of each vector
        size_t bytes = N*sizeof(float);
    	
        // Allocate memory for each vector on host
        h_a = (float*)malloc(bytes);
        h_b = (float*)malloc(bytes);
        h_c = (float*)malloc(bytes);
    	// Allocate memory for each vector on GPU
        cudaMalloc(&d_a, bytes);
        cudaMalloc(&d_b, bytes);
        cudaMalloc(&d_c, bytes);
    	
        int i;
        // Initialize vectors on host
        for( i = 0; i < N; i++ ) {
            h_a[i] = sin(i)*sin(i);
            h_b[i] = cos(i)*cos(i);
    		//h_c[i] = 0.0f;
        }
    	// bind to texture memory
    	cudaBindTexture( NULL, tex_a,
    					 d_a,
    					 bytes );
    	cudaBindTexture( NULL, tex_b,
    					 d_b,
    					 bytes );
    	// Note: tex_c is never fetched in the kernel (c[] is written
    	// directly), so this binding is not strictly needed
    	cudaBindTexture( NULL, tex_c,
    					 d_c,
    					 bytes );
        // Copy host vectors to device
        cudaMemcpy( d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy( d_b, h_b, bytes, cudaMemcpyHostToDevice);
    	
    	
        int blockSize, gridSize;
    	
        // Number of threads in each thread block
        blockSize = 1024;
    	
        // Number of thread blocks in grid
        gridSize = (int)ceil((float)N/blockSize);
    	
        // Execute the kernel
        vecAdd<<<gridSize, blockSize>>>(d_c);
    	
        // Copy array back to host
        cudaMemcpy( h_c, d_c, bytes, cudaMemcpyDeviceToHost );
    	
        // Sum up vector c and print result divided by n, this should equal 1 within error
        float sum = 0;
        for(i=0; i<N; i++)
            sum += h_c[i];
        printf("final result: %f\n", sum/N);
    	
        // Release device memory
        cudaFree(d_a);
        cudaFree(d_b);
        cudaFree(d_c);
    	
        // Release host memory
        free(h_a);
        free(h_b);
        free(h_c);
    	return 0;
    }
    

    There seems to be a conceptual misunderstanding here: binding a texture and copying data are orthogonal operations. One binds a texture to a piece of global memory when the intention is that the code subsequently accesses that global memory via texture functions (to allow for automatic interpolation on data retrieval, for example). How that global memory was initialized prior to the binding operation is entirely up to the programmer.

    It could have been initialized via a host-side memset or memcpy operation, or the global memory could have been written to by the current or a previous kernel.

    Note that the binding (as well as possibly unbinding and rebinding) of textures to a piece of global memory typically occurs after initialization of that chunk of global memory, not prior to it.
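
    For example, a minimal sketch reusing the same cudaBindTexture/cudaMemcpy calls as the code above: the two statements can be issued in either order with the same end result.

```cuda
// Binding only records the address range; it does not snapshot the contents.
// Initializing first and then binding ...
cudaMemcpy( d_a, h_a, bytes, cudaMemcpyHostToDevice );
cudaBindTexture( NULL, tex_a, d_a, bytes );

// ... behaves the same as binding first and then initializing,
// as long as the copy completes before any kernel fetches through tex_b:
cudaBindTexture( NULL, tex_b, d_b, bytes );
cudaMemcpy( d_b, h_b, bytes, cudaMemcpyHostToDevice );
```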

    AFAIK, the texture reference is just a small descriptor object of about 16-64 bytes. It contains the address of the memory buffer as well as the other parameters required for full-featured textures (such as the addressing mode and filtering mode). tex1Dfetch just ignores most of these parameters.
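
    That "small descriptor" is explicit in the newer texture object API (the texture reference API used in texture.cu is deprecated, and was removed in CUDA 12). A sketch of the same linear-memory binding with a texture object, assuming d_a and bytes as in the code above:

```cuda
// Describe the resource: a plain linear buffer of floats at d_a
cudaResourceDesc resDesc = {};
resDesc.resType = cudaResourceTypeLinear;
resDesc.res.linear.devPtr = d_a;                          // buffer address
resDesc.res.linear.desc = cudaCreateChannelDesc<float>();
resDesc.res.linear.sizeInBytes = bytes;

// Describe how reads behave (addressing/filtering parameters live here)
cudaTextureDesc texDesc = {};
texDesc.readMode = cudaReadModeElementType;

cudaTextureObject_t texObj = 0;
cudaCreateTextureObject(&texObj, &resDesc, &texDesc, NULL);

// Inside a kernel, fetch with: tex1Dfetch<float>(texObj, id)

cudaDestroyTextureObject(texObj);
```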

    So no data are copied. Moreover, there is no special "texture memory": the tex* operations simply cache their reads through the texture cache, which is also accessible via the LDG operation on Kepler+ GPUs.
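
    A sketch of that LDG path: on compute capability 3.5+ the same read-only, texture-cached loads can be issued without any texture at all, using the __ldg() intrinsic (or a const __restrict__ pointer that the compiler can prove read-only). The hypothetical kernel below is the equivalent of vecAdd above:

```cuda
// Reads a and b through the texture (read-only data) cache via __ldg
// instead of tex1Dfetch; no binding step is required.
__global__ void vecAddLdg(const float* __restrict__ a,
                          const float* __restrict__ b,
                          float* c, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n)
        c[id] = __ldg(&a[id]) + __ldg(&b[id]);
}
```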

    Thanks, your answer is correct. I also found some more information on it here:
    http://stackoverflow.com/questions/12340265/what-is-the-size-of-my-cuda-texture-memory/12341233#12341233