tex1D and tex1Dfetch to C

I have CUDA code that I want to convert to single-threaded C code, but I have run into the following problem:

//force_table is a texture float4

float rsqrtfr2=rsqrtf(r2);
float4 fi = tex1D(force_table, rsqrtfr2);

How can I transform this bit of code to C?

Thank you.

In the general case, it will have some dependence on how the texture is set up for access, which you haven’t shown.

You could start with the documentation:


#define FORCE_TABLE_SIZE 4096
texture<float4, 1, cudaReadModeElementType> force_table;

void cuda_bind_force_table(const float4 *t, const float4 *et) {
    static __thread cudaArray *ct;
    if ( ! ct ) {
      // allocate the backing array only once per thread
      cudaMallocArray(&ct, &force_table.channelDesc, FORCE_TABLE_SIZE, 1);
      cuda_errcheck("allocating force table");
    }
    cudaMemcpyToArray(ct, 0, 0, t, FORCE_TABLE_SIZE*sizeof(float4), cudaMemcpyHostToDevice);
    // cudaMemcpy(ct, t, FORCE_TABLE_SIZE*sizeof(float4), cudaMemcpyHostToDevice);
    cuda_errcheck("memcpy to force table");
    force_table.normalized = true;                      // coordinates are normalized
    force_table.addressMode[0] = cudaAddressModeClamp;  // clamp out-of-range coordinates
    force_table.addressMode[1] = cudaAddressModeClamp;
    force_table.filterMode = cudaFilterModeLinear;      // linear interpolation between entries

    cudaBindTextureToArray(force_table, ct);
    cuda_errcheck("binding force table to texture");
}


txbob thank you for your time.
The documentation gave me some help but I don’t know how to transform the tex1D function to C as it isn’t clear to me how it works.

Thank you.

The interesting bits here are: (1) the value returned for out-of-bounds accesses is determined by the clamp-to-edge mode, and (2) the filter mode is set to linear.

You can easily mimic this in your C emulation on the CPU. If the input to the table lookup is too small, you return the value for the smallest supported input. If the input to the table lookup is too big, you return the value for the largest supported input. That is the clamp-to-edge behavior. Since the inputs apparently don’t always fall exactly on table elements, you would use linear interpolation when the input falls between two consecutive table entries.

The built-in hardware interpolation of the GPU’s texture unit uses a 1.8 fixed-point format for the interpolation weight, as described in the documentation. If you need bit-wise identical results, you would have to emulate that. However, since the hardware interpolation is a low-accuracy interpolation, there is probably no harm in using a more accurate linear interpolation based on regular ‘float’ (single-precision floating-point) operations. Obviously, this may cause final results to differ slightly between CPU and GPU implementations.