Undefined tex1Dfetch in kernel

moein.mfh · March 10, 2021, 10:40pm

Hi,
When i try to read from texture memory instead of global memory, my code compiles, but fails in the kernel. The host 1D array is called “Dir” and has a size of “ArrayByteSize_Dir = sizeof(int) * 12582912”.

Here is what i do:

i define the texture before main : texture <int, 1, cudaReadModeElementType> tex;
define the device memory for the 1D array i want to read in my kernel:
gpuErrchk(cudaMalloc((void**)&Device_Dir, ArrayByteSize_Dir));
gpuErrchk(cudaMemcpy(Device_Dir, Dir, ArrayByteSize_Dir, cudaMemcpyHostToDevice));
gpuErrchk(cudaBindTexture(NULL, tex, Device_Dir, ArrayByteSize_Dir));
and finally in my kernel, i read using: tex1Dfetch(tex, Index)

In my kernel, if i use Device_Dir[Index] it works fine (assuming that Device_Dir is passed to the kernel). So, is there anything I’m missing here?
I have a Geforce GTX 950M on my laptop if it matters.
I checked smaller Dir as well. it does not work!!

njuffa · March 10, 2021, 10:52pm

Tip: Instead of descriptions and snippets of code, post a minimal self-contained program that demonstrates the issue. This allows others to compile and run the code.

Note that words like “doesn’t work”, “fail”, “crash” etc. don’t convey enough relevant information. Failed how? Was there an error status or error message? If so, what was it? Was the result different from the expected result? If so, what was observed and what was expected?

moein.mfh · March 10, 2021, 11:18pm

here is the code. I did my best to extract what is needed from my main code. here you can find the Dir array: S_Dir.txt (24 MB) .

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "cuda.h"
#include <iostream>
#include <iomanip>
#include <stdio.h>
#include <cstdio>
using namespace std;
texture <int, 1, cudaReadModeElementType> tex;

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }

inline void gpuAssert(cudaError_t code, const char* file, int line, bool abort = true)
{
	if (code != cudaSuccess)
	{
		fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
		if (abort) exit(code);
	}
}
__global__ void  kernel() {

	printf(" (tex1Dfetch(tex, IndexingReceive): %d \n", (tex1Dfetch(tex, 100)));
}
int main (){

int* Dir;  
int* Device_Dir;
int Dir_Size = 512 * 256;  
int ArrayByteSize_Dir = sizeof(int) * (96* Dir_Size);|
cudaMallocHost((int**)&Dir, ArrayByteSize_Dir); // pinned memory

ifstream fin("S_Dir.txt");
	if (!fin) {
		cout << "Unable to open the database file S_Dir.txt" << endl;
		cout << "make sure the file exists" << endl;
		exit(1);
	}

	string line, coor;
	int NStart_receive,receive;
	for (int receive = 0; receive < 96; receive++) { 
		NStart_receive = receive * (512*256);
		std::getline(fin, line);
		istringstream iss(line);
		for (int pixel = 0; pixel < (512*256); pixel++) { 
			std::getline(iss, coor, ',');
			Dir[NStart_receive+ pixel] = atoi(coor.c_str());
		}
	}
	cout << "Loading the Dir ... Done. " << endl;
	fin.close();


gpuErrchk(cudaMalloc((void**)&Device_Dir, ArrayByteSize_Dir));
	gpuErrchk(cudaMemcpy(Device_Dir, Dir, ArrayByteSize_Dir, cudaMemcpyHostToDevice));
	gpuErrchk(cudaBindTexture(NULL, tex, Device_Dir, ArrayByteSize_Dir));


kernel << <1, 1 >> > ();
		gpuErrchk(cudaPeekAtLastError());

	gpuErrchk(cudaFree(Device_Dir));
	gpuErrchk(cudaUnbindTexture(tex));

	return 0;
}

i hope this helps.

The error is “an illegal memory access was encountered” and then “TUI_CUDA.exe (process 4812) exited with code 700.”

Robert_Crovella · March 10, 2021, 11:38pm

It may have been unsuccessful.

Please use the available tools to format your code. this thread has a description. (I have already fixed it.)
Error 700 means a kernel made an illegal memory access. That is unlikely to come about via texturing, due to the characteristics of the texture engine.
It may not be obvious to you, but the presence or absence of the error in your code should not depend on the data you are loading from the file. Therefore eliminating this will help for simplicity and focus.
As will be evident below, texture references are deprecated. It’s not the source of any problem here, but you may want to be aware of it for future code maintainability. The suggested solution is to use texture objects instead.

When I run this reduced version of your code, I get no errors:

$ cat t31.cu
#include <iostream>
#include <iomanip>
#include <stdio.h>
#include <cstdio>
using namespace std;
texture <int, 1, cudaReadModeElementType> tex;

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }

inline void gpuAssert(cudaError_t code, const char* file, int line, bool abort = true)
{
        if (code != cudaSuccess)
        {
                fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
                if (abort) exit(code);
        }
}
__global__ void  kernel() {

        printf(" (tex1Dfetch(tex, IndexingReceive): %d \n", (tex1Dfetch(tex, 100)));
}
int main (){

int* Dir;
int* Device_Dir;
int Dir_Size = 512 * 256;
int ArrayByteSize_Dir = sizeof(int) * (96* Dir_Size);
cudaMallocHost((int**)&Dir, ArrayByteSize_Dir); // pinned memory
for (int i = 0; i < 96*Dir_Size; i++) Dir[i] = i;
gpuErrchk(cudaMalloc((void**)&Device_Dir, ArrayByteSize_Dir));
        gpuErrchk(cudaMemcpy(Device_Dir, Dir, ArrayByteSize_Dir, cudaMemcpyHostToDevice));
        gpuErrchk(cudaBindTexture(NULL, tex, Device_Dir, ArrayByteSize_Dir));


kernel << <1, 1 >> > ();
                gpuErrchk(cudaPeekAtLastError());

        gpuErrchk(cudaFree(Device_Dir));
        gpuErrchk(cudaUnbindTexture(tex));

        return 0;
}
$ nvcc -o t31 t31.cu
t31.cu: In function ‘int main()’:
t31.cu:32:71: warning: ‘cudaError_t cudaBindTexture(size_t*, const texture<T, dim, readMode>&, const void*, size_t) [with T = int; int dim = 1; cudaTextureReadMode readMode = (cudaTextureReadMode)0; cudaError_t = cudaError; size_t = long unsigned int]’ is deprecated [-Wdeprecated-declarations]
  gpuErrchk(cudaBindTexture(NULL, tex, Device_Dir, ArrayByteSize_Dir));
                                                                       ^
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:1339:53: note: declared here
 static __CUDA_DEPRECATED __inline__ __host__ cudaError_t cudaBindTexture(
                                                     ^~~~~~~~~~~~~~~
t31.cu:39:34: warning: ‘cudaError_t cudaUnbindTexture(const texture<T, dim, readMode>&) [with T = int; int dim = 1; cudaTextureReadMode readMode = (cudaTextureReadMode)0; cudaError_t = cudaError]’ is deprecated [-Wdeprecated-declarations]
  gpuErrchk(cudaUnbindTexture(tex));
                                  ^
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:1661:53: note: declared here
 static __CUDA_DEPRECATED __inline__ __host__ cudaError_t cudaUnbindTexture(
                                                     ^~~~~~~~~~~~~~~~~
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h: In instantiation of ‘cudaError_t cudaBindTexture(size_t*, const texture<T, dim, readMode>&, const void*, size_t) [with T = int; int dim = 1; cudaTextureReadMode readMode = (cudaTextureReadMode)0; cudaError_t = cudaError; size_t = long unsigned int]’:
t31.cu:32:71:   required from here
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:1346:23: warning: ‘cudaError_t cudaBindTexture(size_t*, const texture<T, dim, readMode>&, const void*, const cudaChannelFormatDesc&, size_t) [with T = int; int dim = 1; cudaTextureReadMode readMode = (cudaTextureReadMode)0; cudaError_t = cudaError; size_t = long unsigned int]’ is deprecated [-Wdeprecated-declarations]
   return cudaBindTexture(offset, tex, devPtr, tex.channelDesc, size);
        ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:1293:53: note: declared here
 static __CUDA_DEPRECATED __inline__ __host__ cudaError_t cudaBindTexture(
                                                     ^~~~~~~~~~~~~~~
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h: In instantiation of ‘cudaError_t cudaUnbindTexture(const texture<T, dim, readMode>&) [with T = int; int dim = 1; cudaTextureReadMode readMode = (cudaTextureReadMode)0; cudaError_t = cudaError]’:
t31.cu:39:34:   required from here
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:1665:27: warning: ‘cudaError_t cudaUnbindTexture(const textureReference*)’ is deprecated [-Wdeprecated-declarations]
   return ::cudaUnbindTexture(&tex);
        ~~~~~~~~~~~~~~~~~~~^~~~~~
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime_api.h:8060:46: note: declared here
 extern __CUDA_DEPRECATED __host__ cudaError_t CUDARTAPI cudaUnbindTexture(const struct textureReference *texref);
                                              ^~~~~~~~~~~~~~~~~
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h: In instantiation of ‘cudaError_t cudaBindTexture(size_t*, const texture<T, dim, readMode>&, const void*, const cudaChannelFormatDesc&, size_t) [with T = int; int dim = 1; cudaTextureReadMode readMode = (cudaTextureReadMode)0; cudaError_t = cudaError; size_t = long unsigned int]’:
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:1346:23:   required from ‘cudaError_t cudaBindTexture(size_t*, const texture<T, dim, readMode>&, const void*, size_t) [with T = int; int dim = 1; cudaTextureReadMode readMode = (cudaTextureReadMode)0; cudaError_t = cudaError; size_t = long unsigned int]’
t31.cu:32:71:   required from here
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:1301:25: warning: ‘cudaError_t cudaBindTexture(size_t*, const textureReference*, const void*, const cudaChannelFormatDesc*, size_t)’ is deprecated [-Wdeprecated-declarations]
   return ::cudaBindTexture(offset, &tex, devPtr, &desc, size);
        ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime_api.h:7897:46: note: declared here
 extern __CUDA_DEPRECATED __host__ cudaError_t CUDARTAPI cudaBindTexture(size_t *offset, const struct textureReference *texref, const void *devPtr, const struct cudaChannelFormatDesc *desc, size_t size __dv(UINT_MAX));
                                              ^~~~~~~~~~~~~~~
$ cuda-memcheck ./t31
========= CUDA-MEMCHECK
 (tex1Dfetch(tex, IndexingReceive): 100
========= ERROR SUMMARY: 0 errors
$

Therefore I suspect the code you have shown is not what is actually causing the error. You might wish to use the method described here to start the debug.

moein.mfh · March 11, 2021, 10:19am

Yes, you are right. this code works fine. However, in my actual code, i have a Kernel which takes multiple Device pointers, all stored in Global memory. my kernel is like this:

cudaStream_t* streams = new cudaStream_t[96];  
for (int transmit = 0; transmit < 96; transmit++) {
	cudaStreamCreate(&streams[transmit]);
	NStart_Transmit = transmit * (96* 2048);
	cudaMemcpyAsync(&Device_RfData[NStart_Transmit], &RfData[NStart_Transmit], BYTES_PER_STREAM, cudaMemcpyHostToDevice, streams[transmit]);
	kernel_Reconstruction2 << <grid, block, 0, streams[transmit] >> > (Device_SystemSetup, Device_MediumZ, Device_MediumX, Device_TRansducerCorrZ, Device_TRansducerCorrX,
	&Device_RfData[NStart_Transmit], Dir_Size, ReconstructionSoundSpeed, Device_ReconstructedImage_GPU, transmit,NStart_Transmit, NumberOfPixels);
	gpuErrchk(cudaPeekAtLastError());
}

could the streaming be the problem?

moein.mfh · March 11, 2021, 12:25pm

I changed my code to texture object and i do not get the error anymore thanks to your suggestion. The program also works fine and the results are correct. However, the prcessing time increased compared to using the Global memory to read from.
Here is how i make the texture object:

// create texture object
gpuErrchk(cudaMalloc((void**)&Device_Dir, ArrayByteSize_Dir));
gpuErrchk(cudaMemcpy(Device_Dir, Dir, ArrayByteSize_Dir, udaMemcpyHostToDevice));
cudaResourceDesc resDesc;
memset(&resDesc, 0, sizeof(resDesc));
resDesc.resType = cudaResourceTypeLinear;
resDesc.res.linear.devPtr = Device_Dir;
resDesc.res.linear.desc.f = cudaChannelFormatKindSigned;
resDesc.res.linear.desc.x = 32; // bits per channel
resDesc.res.linear.sizeInBytes = ArrayByteSize_Dir;
cudaTextureDesc texDesc;
memset(&texDesc, 0, sizeof(texDesc));
texDesc.readMode = cudaReadModeElementType;
cudaTextureObject_t tex = 0;
cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);

is there anything i’m missing here? was it not suppose to be faster compared to loading from global memory if the address of loadings are not aligned?
Dir is a 1D array containing 0 and 1. So, if there is also anyway to transfer bool, please also let me know.

Robert_Crovella · March 11, 2021, 2:02pm

I guess you are comparing working code to code that is doing illegal activity. I don’t find such comparisons sensible and wouldn’t be able to comment on them. If your code is working correctly, I doubt there is any problem with your texturing object handling.

I often do not respond to questions asking for help when they don’t provide a complete test case for me to work with.

Regarding a bool, AFAIK in most C++ implementations including CUDA, a bool is a byte. Therefore you could transfer a bool using uchar texture type, such as the example here.

Robert_Crovella · March 11, 2021, 2:24pm

I believe my suggestion about how to “start the debug” is the way to answer that question.

Topic		Replies	Views
Output of 2D texture memory is zero CUDA Programming and Performance	9	990	March 30, 2021
Using Textures CUDA Programming and Performance	10	21821	March 29, 2007
CUDA vs DX execution times DX GPGPU code --> CUDA = slower CUDA Programming and Performance	15	13319	January 30, 2008
Using texture memory over iterations causes incorrect read/write of some lines CUDA Programming and Performance cuda	2	514	September 2, 2020
Basic CUDA texture issue CUDA Programming and Performance	5	10202	June 25, 2009
Reading R8G8B8A8 texture using tex2D() causes strange result. CUDA Programming and Performance	27	2865	April 28, 2018
Texture Linear Filter doesn't work with uchar! normalizedFloat does not work with uchar, why CUDA Programming and Performance	7	3002	December 16, 2009
OpenGL interop: Reading from and writing to surface CUDA Programming and Performance	8	3337	December 16, 2015
CUDA Texture Memory Example for Beginners CUDA Programming and Performance	6	4667	July 10, 2023
Problems when reading from 2D Texture CUDA Programming and Performance	7	4494	March 5, 2011

Undefined tex1Dfetch in kernel

Related topics