Output of 2D texture memory is zero

moein.mfh · March 28, 2021, 8:46pm

Hi,
I’m trying to bind a pointer in device global memory to a texture so I can do 2D interpolation. However, when I read from the texture, everything is zero.
Here is my code.

#include <cuda_runtime.h>
#include "device_launch_parameters.h"
#include <stdio.h>
#include "cuda.h"

texture<float, 2, cudaReadModeElementType> tex;

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char* file, int line, bool abort = true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

__global__ void kernel_ArrivalTimeCalculation(float* Device_ConvArrivalTime) {
    int TID = threadIdx.y * blockDim.x + threadIdx.x;
    int BlockOFFset = blockDim.x * blockDim.y * blockIdx.x;
    int GID_RowBased = BlockOFFset + TID;
    int RowOFFset = blockDim.x * blockDim.y * gridDim.x * blockIdx.y;
    int GID = RowOFFset + BlockOFFset + TID;
    Device_ConvArrivalTime[GID_RowBased] = (float)(GID/2);
}

__global__ void kernel_ArrivalTimeCalculation_Show() {
    int TID = threadIdx.y * blockDim.x + threadIdx.x;
    int BlockOFFset = blockDim.x * blockDim.y * blockIdx.x;
    int GID_RowBased = BlockOFFset + TID;
    int Px_man = (GID_RowBased) % (256);
    int Pz_man = (GID_RowBased) / (256);
    float value;
    value = tex2D(tex, (float)(Px_man/3.0), (float)(Pz_man/3.0));
    printf("The value in the texture memory: %.6f \n", value);
}

int main()
{
    float* Device_ConvArrivalTime;  // Device pointer
    int m = 512*96;
    int n = 256;
    size_t pitch, tex_ofs;
    gpuErrchk(cudaMallocPitch((void**)&Device_ConvArrivalTime, &pitch, n * sizeof(float), m));
    tex.normalized = false;
    tex.filterMode = cudaFilterModeLinear;
    gpuErrchk(cudaBindTexture2D(&tex_ofs, &tex, Device_ConvArrivalTime, &tex.channelDesc, n, m, pitch));
    dim3 block(1024, 1);
    dim3 grid((m * n / block.x), 1);
    kernel_ArrivalTimeCalculation<<<grid, block>>>(Device_ConvArrivalTime);
    cudaDeviceSynchronize();
    kernel_ArrivalTimeCalculation_Show<<<1, 256>>>();
    cudaDeviceSynchronize();
    gpuErrchk(cudaFree(Device_ConvArrivalTime));
    cudaDeviceReset();
    return 0;
}

So, the problem is that I get 0 printed. What could be wrong here?

Moein.

njuffa · March 28, 2021, 9:02pm

Textures are designed for read-only access. So you need to store data in a data object, bind the texture to that data object, then retrieve data from texture.

Here, the data underlying the texture tex is stored in Device_ConvArrivalTime. You need to initialize that data before you read it out via texture. I don’t see any code that performs that initialization.

There may be other bugs in your code, I did not study it in detail.

moein.mfh · March 29, 2021, 8:31am

Hi,
“Device_ConvArrivalTime” is initialized in “kernel_ArrivalTimeCalculation” and then read out via the texture in “kernel_ArrivalTimeCalculation_Show”. Is there something wrong?

njuffa · March 29, 2021, 8:44am

Seems I got confused between kernel_ArrivalTimeCalculation and kernel_ArrivalTimeCalculation_Show when perusing the code …

You might want to debug in two steps:

(1) After kernel_ArrivalTimeCalculation, read back the data in Device_ConvArrivalTime without using texture to make sure it is as you expect.

(2) When you read back the texture, check carefully that (a) the indexes are in range so you don’t hit a clamp-to-border case, (b) the texture indexing matches the indexing mode selected (normalized vs. unnormalized), and (c) you hit the middle of each texel.

FWIW, the division by three in your tex2D calls looks unusual / suspicious to me. There should be several examples of tex2D usage in these forums. I know I have posted a few over the years. It may be best to start with a known-good example and extend it.

Note that old-style texture references are currently deprecated and will likely disappear with the next major CUDA release. If you are starting with textures now, it’s probably best to work with texture objects from the very start. For a quick introduction, check out this post:

https://developer.nvidia.com/blog/cuda-pro-tip-kepler-texture-objects-improve-performance-and-flexibility/
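
Step (1) above, reading the data back without going through the texture, could be sketched roughly as follows. This is only an illustration under assumptions: the names `fillPattern` and `countBadElements`, the test pattern, and the small sizes are hypothetical and not taken from the posted code. The idea is to copy the pitched buffer to the host with `cudaMemcpy2D` and compare it against the values the kernel was supposed to write.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical fill kernel: writes value = row * 1000 + col into a pitched buffer.
__global__ void fillPattern(float* base, size_t pitch, int width, int height)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height) {
        // The pitch is given in bytes, so step rows through a char pointer.
        float* rowPtr = (float*)((char*)base + (size_t)row * pitch);
        rowPtr[col] = (float)(row * 1000 + col);
    }
}

// Copies the buffer back with cudaMemcpy2D (no texture involved) and counts
// elements that differ from the expected pattern. Returns -1 on a CUDA error.
int countBadElements(int width, int height)
{
    float* d_buf = nullptr;
    size_t pitch = 0;
    if (cudaMallocPitch((void**)&d_buf, &pitch, width * sizeof(float), height) != cudaSuccess)
        return -1;

    dim3 block(32, 8);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    fillPattern<<<grid, block>>>(d_buf, pitch, width, height);

    std::vector<float> host((size_t)width * height);
    cudaError_t err = cudaMemcpy2D(host.data(), width * sizeof(float), // dst, dst pitch
                                   d_buf, pitch,                       // src, src pitch
                                   width * sizeof(float), height,      // bytes per row, rows
                                   cudaMemcpyDeviceToHost);
    cudaFree(d_buf);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int r = 0; r < height; r++)
        for (int c = 0; c < width; c++)
            if (host[(size_t)r * width + c] != (float)(r * 1000 + c)) bad++;
    return bad;
}

int main()
{
    int bad = countBadElements(256, 16);  // small stand-in sizes
    printf("elements differing from the expected pattern: %d\n", bad);
    return bad == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

If the read-back data already looks wrong at this stage, the problem is in the kernel that writes the buffer rather than in the texture setup.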

moein.mfh · March 29, 2021, 9:02am

Yes, I’m just starting to work with texture memory. I have seen this post before. This is the code given in the post:

float *buffer;
cudaMalloc(&buffer, N*sizeof(float));
// create texture object
cudaResourceDesc resDesc;
memset(&resDesc, 0, sizeof(resDesc));
resDesc.resType = cudaResourceTypeLinear;
resDesc.res.linear.devPtr = buffer;
resDesc.res.linear.desc.f = cudaChannelFormatKindFloat;
resDesc.res.linear.desc.x = 32; // bits per channel
resDesc.res.linear.sizeInBytes = N*sizeof(float);
cudaTextureDesc texDesc;
memset(&texDesc, 0, sizeof(texDesc));
texDesc.readMode = cudaReadModeElementType;
// create texture object: we only have to do this once!
cudaTextureObject_t tex=0;
cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);

buffer is 1D, but my “Device_ConvArrivalTime” is 2D with m rows and n columns. Could you please tell me what changes I need to make to this texture object code? I have already used texture objects for the 1D case and it works fine, but I have never tried the 2D case.

njuffa · March 29, 2021, 9:06am

I have not used textures in years; I would have to consult the documentation just like you. There may be a worked example among the sample apps that ship with CUDA.

moein.mfh · March 29, 2021, 9:06am

Is it possible not to use the middle of the texel? I ask because I want to use the texture coordinates for normal addressing, and if I want to do interpolation, then I would use the middle as well.

njuffa · March 29, 2021, 9:09am

For unnormalized texture coordinates, you would add 0.5 to hit the middle of each texel. This is orthogonal to whatever you do with interpolation. There are worked examples of 2D texture interpolation using texture references in these forums.

Here is some code I just scraped from my hard disk:

#include <stdlib.h>
#include <stdio.h>

// Macro to catch CUDA errors in CUDA runtime calls
#define CUDA_SAFE_CALL(call)                                          \
do {                                                                  \
    cudaError_t err = call;                                           \
    if (cudaSuccess != err) {                                         \
        fprintf (stderr, "Cuda error in file '%s' in line %i : %s.\n",\
                 __FILE__, __LINE__, cudaGetErrorString(err) );       \
        exit(EXIT_FAILURE);                                           \
    }                                                                 \
} while (0)
// Macro to catch CUDA errors in kernel launches
#define CHECK_LAUNCH_ERROR()                                          \
do {                                                                  \
    /* Check synchronous errors, i.e. pre-launch */                   \
    cudaError_t err = cudaGetLastError();                             \
    if (cudaSuccess != err) {                                         \
        fprintf (stderr, "Cuda error in file '%s' in line %i : %s.\n",\
                 __FILE__, __LINE__, cudaGetErrorString(err) );       \
        exit(EXIT_FAILURE);                                           \
    }                                                                 \
    /* Check asynchronous errors, i.e. kernel failed (ULF) */         \
    err = cudaDeviceSynchronize();                                    \
    if (cudaSuccess != err) {                                         \
        fprintf (stderr, "Cuda error in file '%s' in line %i : %s.\n",\
                 __FILE__, __LINE__, cudaGetErrorString( err) );      \
        exit(EXIT_FAILURE);                                           \
    }                                                                 \
} while (0)

texture<unsigned char, 2, cudaReadModeNormalizedFloat> tex;

__global__ void kernel (int m, int n, float shift_x, float shift_y) 
{
    float val;
    for (int row = 0; row < m; row++) {
        for (int col = 0; col < n; col++) {
            val = 255.0 * tex2D (tex, col+0.5f+shift_x, row+0.5f+shift_y);
            printf ("%12.5f  ", val);
        }
        printf ("\n");
    }
}

int main (void)
{
    int m = 4; // height = #rows
    int n = 3; // width  = #columns
    size_t pitch, tex_ofs;
    unsigned char arr[4][3]= {{10,20,30},{40,50,60},{70,80,90},{100,110,120}};
    unsigned char *arr_d = 0;

    CUDA_SAFE_CALL(cudaMallocPitch((void**)&arr_d,&pitch,n*sizeof(*arr_d),m));
    CUDA_SAFE_CALL(cudaMemcpy2D(arr_d, pitch, arr, n*sizeof(arr[0][0]),
                                n*sizeof(arr[0][0]),m,cudaMemcpyHostToDevice));
    tex.normalized = false;
    tex.filterMode = cudaFilterModeLinear;
    CUDA_SAFE_CALL (cudaBindTexture2D (&tex_ofs, &tex, arr_d, &tex.channelDesc,
                                       n, m, pitch));
    if (tex_ofs !=0) {
        printf ("tex_ofs = %zu\n", tex_ofs);
        return EXIT_FAILURE;
    }
    printf ("reading array straight\n");
    kernel<<<1,1>>>(m, n, 0.0f, 0.0f);
    CHECK_LAUNCH_ERROR();
    CUDA_SAFE_CALL (cudaDeviceSynchronize());
    printf ("reading array shifted 0.5 in x-direction\n");
    kernel<<<1,1>>>(m, n, 0.5f, 0.0f);
    CHECK_LAUNCH_ERROR();
    CUDA_SAFE_CALL (cudaDeviceSynchronize());
    printf ("reading array shifted 0.5 in y-direction\n");
    kernel<<<1,1>>>(m, n, 0.0, -0.5f);
    CUDA_SAFE_CALL (cudaDeviceSynchronize());
    CUDA_SAFE_CALL (cudaFree (arr_d));
    return EXIT_SUCCESS;
}
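
To see why the +0.5 offset in the code above returns exact texel values, and why a further shift of 0.5 returns averages, it may help to emulate the filtering on the host. The sketch below follows the linear filtering formula from the CUDA programming guide for unnormalized coordinates: xB = x - 0.5, i = floor(xB), alpha = frac(xB), and likewise for y. The function name `bilinearRef` is hypothetical, and clamping the texel indices at the edges is an assumption meant to mimic clamp addressing. Real hardware stores the weights in 9-bit fixed point with 8 fractional bits, so device results can differ slightly from this floating-point reference. The code is plain host code and uses no CUDA calls.

```cuda
#include <cmath>
#include <cstdio>

// Clamp a texel index into [0, n-1], so edge texels are repeated.
static int clampIndex(int i, int n) { return i < 0 ? 0 : (i >= n ? n - 1 : i); }

// Host-side reference for a bilinear fetch with unnormalized coordinates.
// data is row-major with `width` columns and `height` rows.
float bilinearRef(const float* data, int width, int height, float x, float y)
{
    float xB = x - 0.5f, yB = y - 0.5f;      // shift so texel centers land on integers
    float fi = floorf(xB), fj = floorf(yB);
    float alpha = xB - fi, beta = yB - fj;   // fractional parts are the weights
    int i0 = clampIndex((int)fi, width),  i1 = clampIndex((int)fi + 1, width);
    int j0 = clampIndex((int)fj, height), j1 = clampIndex((int)fj + 1, height);
    return (1 - alpha) * (1 - beta) * data[j0 * width + i0]
         +      alpha  * (1 - beta) * data[j0 * width + i1]
         + (1 - alpha) *      beta  * data[j1 * width + i0]
         +      alpha  *      beta  * data[j1 * width + i1];
}

int main()
{
    // Same 4x3 values as in the example above, stored as float.
    const float arr[4 * 3] = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120};
    printf("%g\n", bilinearRef(arr, 3, 4, 0.5f, 0.5f)); // texel center: prints 10
    printf("%g\n", bilinearRef(arr, 3, 4, 1.0f, 0.5f)); // halfway in x: prints 15
    printf("%g\n", bilinearRef(arr, 3, 4, 0.5f, 1.0f)); // halfway in y: prints 25
    return 0;
}
```

At a texel center both weights are zero, so the fetch returns a single texel; halfway between two centers the weight is 0.5 and the result is the mean of the two neighbors.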

moein.mfh · March 30, 2021, 10:50am

Thank you for posting this. I also found this example, which explains how to do it with a 2D texture object: https://stackoverflow.com/questions/54098747/cuda-how-to-create-2d-texture-object

I used this example and modified my code accordingly.

#include <cuda_runtime.h>
#include "device_launch_parameters.h"
#include <stdio.h>
#include "cuda.h"

//texture<float, 2, cudaReadModeElementType> tex;

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }

inline void gpuAssert(cudaError_t code, const char* file, int line, bool abort = true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

__global__ void kernel_ArrivalTimeCalculation(float* Device_ConvArrivalTime, int transmit, int size) {
    int TID = threadIdx.y * blockDim.x + threadIdx.x;
    int BlockOFFset = blockDim.x * blockDim.y * blockIdx.x;
    int GID_RowBased = BlockOFFset + TID;
    int RowOFFset = blockDim.x * blockDim.y * gridDim.x * blockIdx.y;
    int GID = RowOFFset + BlockOFFset + TID;
    // just to load the Device_ConvArrivalTime
    Device_ConvArrivalTime[transmit * size + GID_RowBased] = (float)(transmit*20+GID/3)/10000;
}

__global__ void kernel_ArrivalTimeCalculation_Show(float* Device_ConvArrivalTime, int transmit, int size,
                                                   cudaTextureObject_t tex, int NumOFPixelX, int NumOFPixelZ, int NumOfSensor) {
    int TID = threadIdx.y * blockDim.x + threadIdx.x;
    int BlockOFFset = blockDim.x * blockDim.y * blockIdx.x;
    int GID_RowBased = BlockOFFset + TID;
    int Px_man = (GID_RowBased) % (NumOFPixelX);
    int Pz_man = (GID_RowBased) / (NumOFPixelX);
    float value_Tex, value_Global;
    value_Tex = tex2D<float>(tex, Px_man + 0.5f, transmit*NumOFPixelZ + Pz_man + 0.5f);
    value_Global = Device_ConvArrivalTime[transmit * size + GID_RowBased];
    printf("transmit: %d, value_Tex: %.6f, value_Global: %.6f \n", transmit, value_Tex, value_Global);
}

int main() {
    float* Device_ConvArrivalTime;  // Device pointer
    cudaTextureObject_t tex;
    int NumOfSensor = 96;
    int NumOFPixelZ = 512;
    int NumOFPixelX = 256;
    int num_rows = NumOfSensor * NumOFPixelZ;
    int num_cols = NumOFPixelX;
    int devNo = 0;
    cudaDeviceProp iProp;
    cudaGetDeviceProperties(&iProp, devNo);
    if (num_cols % iProp.texturePitchAlignment != 0) {
        printf("Improper number of columns. It should be a multiple of %zu \n", iProp.texturePitchAlignment);
    }
    size_t pitch;
    gpuErrchk(cudaMallocPitch((void**)&Device_ConvArrivalTime, &pitch, num_cols * sizeof(float), num_rows));
    struct cudaResourceDesc resDesc;
    memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType = cudaResourceTypePitch2D;
    resDesc.res.pitch2D.devPtr = Device_ConvArrivalTime;
    resDesc.res.pitch2D.width = num_cols;
    resDesc.res.pitch2D.height = num_rows;
    resDesc.res.pitch2D.desc = cudaCreateChannelDesc<float>();
    resDesc.res.pitch2D.pitchInBytes = pitch;
    //resDesc.res.pitch2D.pitchInBytes = num_cols * sizeof(float);
    struct cudaTextureDesc texDesc;
    memset(&texDesc, 0, sizeof(texDesc));
    gpuErrchk(cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL));

    dim3 block(1024, 1);
    int size = NumOFPixelZ * NumOFPixelX;
    dim3 grid((size / block.x), 1);
    for (int transmit = 0; transmit < NumOfSensor; transmit++) { //NumOfSensor
        kernel_ArrivalTimeCalculation<<<grid, block>>>(Device_ConvArrivalTime, transmit, size);
    }
    cudaDeviceSynchronize();

    for (int transmit = 0; transmit < 2; transmit++) { //NumOfSensor
        kernel_ArrivalTimeCalculation_Show<<<1, 20>>>(Device_ConvArrivalTime, transmit, size, tex, NumOFPixelX, NumOFPixelZ, NumOfSensor);
    }
    cudaDeviceSynchronize();

    gpuErrchk(cudaFree(Device_ConvArrivalTime));
    cudaDeviceReset();

    return 0;
}

Here is an image of the output:

So, for transmit=0 it works fine, but not for transmit=1. I double-checked all the indexing, but could not find which part is wrong. Any ideas?

moein.mfh · March 30, 2021, 11:55am

I think the problem is with cudaMallocPitch. I treat Device_ConvArrivalTime as 1D memory, while the texture is 2D. So maybe Device_ConvArrivalTime is not mapped correctly to the texture, which causes this error?
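
Whether the pitch is the actual cause here cannot be settled from the thread alone, but the hypothesis can be tested. Memory from cudaMallocPitch is laid out in rows that are `pitch` bytes apart, so treating it as a dense 1D array is only equivalent when the pitch equals the row width in bytes. The following is a minimal sketch of pitch-aware addressing combined with a 2D texture object. The names `pitchedElem`, `fillPitched`, `countMismatches` and `runCheck`, as well as the sizes, are hypothetical and chosen only for illustration.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Address element (row, col) of a pitched allocation. The pitch is in BYTES,
// so the row offset is applied to a char pointer before indexing the column.
__host__ __device__ inline float* pitchedElem(float* base, size_t pitch, int row, int col)
{
    return (float*)((char*)base + (size_t)row * pitch) + col;
}

__global__ void fillPitched(float* base, size_t pitch, int width, int height)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height)
        *pitchedElem(base, pitch, row, col) = (float)(row * width + col);
}

// Fetch every texel at its center and compare with the pitch-aware global read.
__global__ void countMismatches(cudaTextureObject_t tex, float* base, size_t pitch,
                                int width, int height, int* mismatches)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height) {
        float viaTex = tex2D<float>(tex, col + 0.5f, row + 0.5f);
        if (viaTex != *pitchedElem(base, pitch, row, col)) atomicAdd(mismatches, 1);
    }
}

// Returns the number of mismatches, or -1 if a CUDA call fails.
// (Cleanup on the early-return paths is omitted to keep the sketch short.)
int runCheck(int width, int height)
{
    float* d_buf = nullptr; int* d_bad = nullptr; size_t pitch = 0;
    if (cudaMallocPitch((void**)&d_buf, &pitch, width * sizeof(float), height) != cudaSuccess) return -1;
    if (cudaMalloc((void**)&d_bad, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMemset(d_bad, 0, sizeof(int)) != cudaSuccess) return -1;

    cudaResourceDesc resDesc;
    memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType = cudaResourceTypePitch2D;
    resDesc.res.pitch2D.devPtr = d_buf;
    resDesc.res.pitch2D.width = width;          // in elements
    resDesc.res.pitch2D.height = height;        // in rows
    resDesc.res.pitch2D.pitchInBytes = pitch;   // as returned by cudaMallocPitch
    resDesc.res.pitch2D.desc = cudaCreateChannelDesc<float>();

    cudaTextureDesc texDesc;
    memset(&texDesc, 0, sizeof(texDesc));
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModePoint;   // no interpolation for this check
    texDesc.readMode = cudaReadModeElementType;
    texDesc.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL) != cudaSuccess) return -1;

    dim3 block(32, 8);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    fillPitched<<<grid, block>>>(d_buf, pitch, width, height);
    countMismatches<<<grid, block>>>(tex, d_buf, pitch, width, height, d_bad);

    int bad = -1;
    if (cudaMemcpy(&bad, d_bad, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) bad = -1;
    cudaDestroyTextureObject(tex);
    cudaFree(d_bad);
    cudaFree(d_buf);
    return bad;
}

int main()
{
    // 250 floats per row is 1000 bytes, a width for which the returned pitch
    // is likely (though not guaranteed) to be larger than the row itself.
    printf("mismatches between texture and global reads: %d\n", runCheck(250, 8));
    return 0;
}
```

Printing the `pitch` returned in the original program and comparing it with `num_cols * sizeof(float)` is a quick way to check whether the 1D indexing and the 2D texture layout can diverge at all for the sizes used there.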
