Undefined tex1Dfetch in kernel

Hi,
When i try to read from texture memory instead of global memory, my code compiles, but fails in the kernel. The host 1D array is called “Dir” and has a size of “ArrayByteSize_Dir = sizeof(int) * 12582912”.

Here is what i do:

  • i define the texture before main : texture <int, 1, cudaReadModeElementType> tex;

  • define the device memory for the 1D array i want to read in my kernel:
    gpuErrchk(cudaMalloc((void**)&Device_Dir, ArrayByteSize_Dir));
    gpuErrchk(cudaMemcpy(Device_Dir, Dir, ArrayByteSize_Dir, cudaMemcpyHostToDevice));
    gpuErrchk(cudaBindTexture(NULL, tex, Device_Dir, ArrayByteSize_Dir));

  • and finally in my kernel, i read using: tex1Dfetch(tex, Index)

In my kernel, if i use Device_Dir[Index] it works fine (assuming that Device_Dir is passed to the kernel). So, is there anything I’m missing here?
I have a Geforce GTX 950M on my laptop if it matters.
I checked smaller Dir as well. it does not work!!

Tip: Instead of descriptions and snippets of code, post a minimal self-contained program that demonstrates the issue. This allows others to compile and run the code.

Note that words like “doesn’t work”, “fail”, “crash” etc. don’t convey enough relevant information. Failed how? Was there an error status or error message? If so, what was it? Was the result different from the expected result? If so, what was observed and what was expected?

here is the code. I did my best to extract what is needed from my main code. here you can find the Dir array: S_Dir.txt (24 MB) .

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "cuda.h"
#include <iostream>
#include <iomanip>
#include <stdio.h>
#include <cstdio>
using namespace std;
texture <int, 1, cudaReadModeElementType> tex;

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }

inline void gpuAssert(cudaError_t code, const char* file, int line, bool abort = true)
{
	if (code != cudaSuccess)
	{
		fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
		if (abort) exit(code);
	}
}
__global__ void  kernel() {

	printf(" (tex1Dfetch(tex, IndexingReceive): %d \n", (tex1Dfetch(tex, 100)));
}
int main (){

int* Dir;  
int* Device_Dir;
int Dir_Size = 512 * 256;  
int ArrayByteSize_Dir = sizeof(int) * (96* Dir_Size);|
cudaMallocHost((int**)&Dir, ArrayByteSize_Dir); // pinned memory

ifstream fin("S_Dir.txt");
	if (!fin) {
		cout << "Unable to open the database file S_Dir.txt" << endl;
		cout << "make sure the file exists" << endl;
		exit(1);
	}

	string line, coor;
	int NStart_receive,receive;
	for (int receive = 0; receive < 96; receive++) { 
		NStart_receive = receive * (512*256);
		std::getline(fin, line);
		istringstream iss(line);
		for (int pixel = 0; pixel < (512*256); pixel++) { 
			std::getline(iss, coor, ',');
			Dir[NStart_receive+ pixel] = atoi(coor.c_str());
		}
	}
	cout << "Loading the Dir ... Done. " << endl;
	fin.close();


gpuErrchk(cudaMalloc((void**)&Device_Dir, ArrayByteSize_Dir));
	gpuErrchk(cudaMemcpy(Device_Dir, Dir, ArrayByteSize_Dir, cudaMemcpyHostToDevice));
	gpuErrchk(cudaBindTexture(NULL, tex, Device_Dir, ArrayByteSize_Dir));


kernel << <1, 1 >> > ();
		gpuErrchk(cudaPeekAtLastError());

	gpuErrchk(cudaFree(Device_Dir));
	gpuErrchk(cudaUnbindTexture(tex));

	return 0;
}

i hope this helps.

The error is “an illegal memory access was encountered” and then “TUI_CUDA.exe (process 4812) exited with code 700.”

It may have been unsuccessful.

  1. Please use the available tools to format your code. this thread has a description. (I have already fixed it.)
  2. Error 700 means a kernel made an illegal memory access. That is unlikely to come about via texturing, due to the characteristics of the texture engine.
  3. It may not be obvious to you, but the presence or absence of the error in your code should not depend on the data you are loading from the file. Therefore eliminating this will help for simplicity and focus.
  4. As will be evident below, texture references are deprecated. It’s not the source of any problem here, but you may want to be aware of it for future code maintainability. The suggested solution is to use texture objects instead.

When I run this reduced version of your code, I get no errors:

$ cat t31.cu
#include <iostream>
#include <iomanip>
#include <stdio.h>
#include <cstdio>
using namespace std;
texture <int, 1, cudaReadModeElementType> tex;

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }

inline void gpuAssert(cudaError_t code, const char* file, int line, bool abort = true)
{
        if (code != cudaSuccess)
        {
                fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
                if (abort) exit(code);
        }
}
__global__ void  kernel() {

        printf(" (tex1Dfetch(tex, IndexingReceive): %d \n", (tex1Dfetch(tex, 100)));
}
int main (){

int* Dir;
int* Device_Dir;
int Dir_Size = 512 * 256;
int ArrayByteSize_Dir = sizeof(int) * (96* Dir_Size);
cudaMallocHost((int**)&Dir, ArrayByteSize_Dir); // pinned memory
for (int i = 0; i < 96*Dir_Size; i++) Dir[i] = i;
gpuErrchk(cudaMalloc((void**)&Device_Dir, ArrayByteSize_Dir));
        gpuErrchk(cudaMemcpy(Device_Dir, Dir, ArrayByteSize_Dir, cudaMemcpyHostToDevice));
        gpuErrchk(cudaBindTexture(NULL, tex, Device_Dir, ArrayByteSize_Dir));


kernel << <1, 1 >> > ();
                gpuErrchk(cudaPeekAtLastError());

        gpuErrchk(cudaFree(Device_Dir));
        gpuErrchk(cudaUnbindTexture(tex));

        return 0;
}
$ nvcc -o t31 t31.cu
t31.cu: In function ‘int main()’:
t31.cu:32:71: warning: ‘cudaError_t cudaBindTexture(size_t*, const texture<T, dim, readMode>&, const void*, size_t) [with T = int; int dim = 1; cudaTextureReadMode readMode = (cudaTextureReadMode)0; cudaError_t = cudaError; size_t = long unsigned int]’ is deprecated [-Wdeprecated-declarations]
  gpuErrchk(cudaBindTexture(NULL, tex, Device_Dir, ArrayByteSize_Dir));
                                                                       ^
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:1339:53: note: declared here
 static __CUDA_DEPRECATED __inline__ __host__ cudaError_t cudaBindTexture(
                                                     ^~~~~~~~~~~~~~~
t31.cu:39:34: warning: ‘cudaError_t cudaUnbindTexture(const texture<T, dim, readMode>&) [with T = int; int dim = 1; cudaTextureReadMode readMode = (cudaTextureReadMode)0; cudaError_t = cudaError]’ is deprecated [-Wdeprecated-declarations]
  gpuErrchk(cudaUnbindTexture(tex));
                                  ^
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:1661:53: note: declared here
 static __CUDA_DEPRECATED __inline__ __host__ cudaError_t cudaUnbindTexture(
                                                     ^~~~~~~~~~~~~~~~~
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h: In instantiation of ‘cudaError_t cudaBindTexture(size_t*, const texture<T, dim, readMode>&, const void*, size_t) [with T = int; int dim = 1; cudaTextureReadMode readMode = (cudaTextureReadMode)0; cudaError_t = cudaError; size_t = long unsigned int]’:
t31.cu:32:71:   required from here
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:1346:23: warning: ‘cudaError_t cudaBindTexture(size_t*, const texture<T, dim, readMode>&, const void*, const cudaChannelFormatDesc&, size_t) [with T = int; int dim = 1; cudaTextureReadMode readMode = (cudaTextureReadMode)0; cudaError_t = cudaError; size_t = long unsigned int]’ is deprecated [-Wdeprecated-declarations]
   return cudaBindTexture(offset, tex, devPtr, tex.channelDesc, size);
        ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:1293:53: note: declared here
 static __CUDA_DEPRECATED __inline__ __host__ cudaError_t cudaBindTexture(
                                                     ^~~~~~~~~~~~~~~
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h: In instantiation of ‘cudaError_t cudaUnbindTexture(const texture<T, dim, readMode>&) [with T = int; int dim = 1; cudaTextureReadMode readMode = (cudaTextureReadMode)0; cudaError_t = cudaError]’:
t31.cu:39:34:   required from here
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:1665:27: warning: ‘cudaError_t cudaUnbindTexture(const textureReference*)’ is deprecated [-Wdeprecated-declarations]
   return ::cudaUnbindTexture(&tex);
        ~~~~~~~~~~~~~~~~~~~^~~~~~
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime_api.h:8060:46: note: declared here
 extern __CUDA_DEPRECATED __host__ cudaError_t CUDARTAPI cudaUnbindTexture(const struct textureReference *texref);
                                              ^~~~~~~~~~~~~~~~~
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h: In instantiation of ‘cudaError_t cudaBindTexture(size_t*, const texture<T, dim, readMode>&, const void*, const cudaChannelFormatDesc&, size_t) [with T = int; int dim = 1; cudaTextureReadMode readMode = (cudaTextureReadMode)0; cudaError_t = cudaError; size_t = long unsigned int]’:
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:1346:23:   required from ‘cudaError_t cudaBindTexture(size_t*, const texture<T, dim, readMode>&, const void*, size_t) [with T = int; int dim = 1; cudaTextureReadMode readMode = (cudaTextureReadMode)0; cudaError_t = cudaError; size_t = long unsigned int]’
t31.cu:32:71:   required from here
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:1301:25: warning: ‘cudaError_t cudaBindTexture(size_t*, const textureReference*, const void*, const cudaChannelFormatDesc*, size_t)’ is deprecated [-Wdeprecated-declarations]
   return ::cudaBindTexture(offset, &tex, devPtr, &desc, size);
        ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime_api.h:7897:46: note: declared here
 extern __CUDA_DEPRECATED __host__ cudaError_t CUDARTAPI cudaBindTexture(size_t *offset, const struct textureReference *texref, const void *devPtr, const struct cudaChannelFormatDesc *desc, size_t size __dv(UINT_MAX));
                                              ^~~~~~~~~~~~~~~
$ cuda-memcheck ./t31
========= CUDA-MEMCHECK
 (tex1Dfetch(tex, IndexingReceive): 100
========= ERROR SUMMARY: 0 errors
$

Therefore I suspect the code you have shown is not what is actually causing the error. You might wish to use the method described here to start the debug.

Yes, you are right. this code works fine. However, in my actual code, i have a Kernel which takes multiple Device pointers, all stored in Global memory. my kernel is like this:

cudaStream_t* streams = new cudaStream_t[96];  
for (int transmit = 0; transmit < 96; transmit++) {
	cudaStreamCreate(&streams[transmit]);
	NStart_Transmit = transmit * (96* 2048);
	cudaMemcpyAsync(&Device_RfData[NStart_Transmit], &RfData[NStart_Transmit], BYTES_PER_STREAM, cudaMemcpyHostToDevice, streams[transmit]);
	kernel_Reconstruction2 << <grid, block, 0, streams[transmit] >> > (Device_SystemSetup, Device_MediumZ, Device_MediumX, Device_TRansducerCorrZ, Device_TRansducerCorrX,
	&Device_RfData[NStart_Transmit], Dir_Size, ReconstructionSoundSpeed, Device_ReconstructedImage_GPU, transmit,NStart_Transmit, NumberOfPixels);
	gpuErrchk(cudaPeekAtLastError());
}

could the streaming be the problem?

I changed my code to texture object and i do not get the error anymore thanks to your suggestion. The program also works fine and the results are correct. However, the prcessing time increased compared to using the Global memory to read from.
Here is how i make the texture object:

// create texture object
gpuErrchk(cudaMalloc((void**)&Device_Dir, ArrayByteSize_Dir));
gpuErrchk(cudaMemcpy(Device_Dir, Dir, ArrayByteSize_Dir, udaMemcpyHostToDevice));
cudaResourceDesc resDesc;
memset(&resDesc, 0, sizeof(resDesc));
resDesc.resType = cudaResourceTypeLinear;
resDesc.res.linear.devPtr = Device_Dir;
resDesc.res.linear.desc.f = cudaChannelFormatKindSigned;
resDesc.res.linear.desc.x = 32; // bits per channel
resDesc.res.linear.sizeInBytes = ArrayByteSize_Dir;
cudaTextureDesc texDesc;
memset(&texDesc, 0, sizeof(texDesc));
texDesc.readMode = cudaReadModeElementType;
cudaTextureObject_t tex = 0;
cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);

is there anything i’m missing here? was it not suppose to be faster compared to loading from global memory if the address of loadings are not aligned?
Dir is a 1D array containing 0 and 1. So, if there is also anyway to transfer bool, please also let me know.

I guess you are comparing working code to code that is doing illegal activity. I don’t find such comparisons sensible and wouldn’t be able to comment on them. If your code is working correctly, I doubt there is any problem with your texturing object handling.

I often do not respond to questions asking for help when they don’t provide a complete test case for me to work with.

Regarding a bool, AFAIK in most C++ implementations including CUDA, a bool is a byte. Therefore you could transfer a bool using uchar texture type, such as the example here.

I believe my suggestion about how to “start the debug” is the way to answer that question.