I am new to CUDA (programming in C) and I am trying to migrate texture references to texture objects in one of our legacy code bases.
I need to access these texture objects in multiple kernels, through many levels of indirection (nested function calls). Passing the objects as arguments through every level would mean rewriting a lot of code. I referred to “CUDA Programming: TEXTURE OBJECT IN CUDA | Bindless Texture in CUDA” for creating texture objects.
Is there a way to declare these texture objects as global variables in device code and update them from the host, by allocating new device memory and attaching that memory to the texture object? With texture references we can allocate new memory using cuMemAlloc and attach it to the existing texture reference using cuTexRefSetAddress. Is there a way to do something similar?
The general methodology to provide global-scope, device-accessible variables is to use the __device__ decorator in front of a variable declaration (at global scope, i.e. module scope, not function scope).
Of course this sort of variable requires special handling. Accessing it from host code requires use of APIs like cudaMemcpyFromSymbol and cudaMemcpyToSymbol.
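As a minimal sketch of that handling (using a plain int rather than a texture object; the names counter, bump and round_trip are made up for illustration), a host-to-device-and-back round trip through a __device__ variable could look like this:

```cuda
#include <stdio.h>

__device__ int counter;              // hypothetical global-scope device variable

__global__ void bump()
{
  counter += 1;                      // device code uses the variable directly
}

// Copy a value into the device variable, run the kernel, and read it back.
int round_trip(int start)
{
  int result = -1;
  cudaMemcpyToSymbol(counter, &start, sizeof(start));
  bump<<<1, 1>>>();
  // Blocking copy on the default stream, so it runs after the kernel finishes.
  cudaMemcpyFromSymbol(&result, counter, sizeof(result));
  return result;
}

int main()
{
  printf("%d\n", round_trip(41));    // expected to print 42
  return 0;
}
```

Note that the symbol itself (counter), not its address, is passed to the two copy functions.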
Did you try anything like that, perhaps with a texture object variable declared at __device__ global scope?
Something like this, which is a lightly modified version of what you see here, with the changes I indicated above:
# cat t175.cu
#include <stdio.h>
#include <stdint.h>
#include <string.h> // memset
typedef uint8_t mt; // use an integer type
__device__ cudaTextureObject_t tex; // global-scope handle, set from the host
__global__ void kernel()
{
  int x = threadIdx.x;
  int y = threadIdx.y;
  mt val = tex2D<mt>(tex, x, y); // read through the global handle, no kernel argument needed
  printf("%d, ", val);
}
int main(int argc, char **argv)
{
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, 0);
  printf("texturePitchAlignment: %zu\n", prop.texturePitchAlignment); // size_t, so %zu
  cudaTextureObject_t mytex;
  const int num_rows = 4;
  const int num_cols = prop.texturePitchAlignment*1; // should be able to use a different multiplier here
  const int ts = num_cols*num_rows;
  const int ds = ts*sizeof(mt);
  mt dataIn[ts]; // variable-length array: a GCC extension in C++, not accepted by MSVC
  for (int i = 0; i < ts; i++) dataIn[i] = i;
  mt* dataDev = 0;
  cudaMalloc((void**)&dataDev, ds);
  cudaMemcpy(dataDev, dataIn, ds, cudaMemcpyHostToDevice);
  // describe the device buffer as a pitched 2D resource
  struct cudaResourceDesc resDesc;
  memset(&resDesc, 0, sizeof(resDesc));
  resDesc.resType = cudaResourceTypePitch2D;
  resDesc.res.pitch2D.devPtr = dataDev;
  resDesc.res.pitch2D.width = num_cols;
  resDesc.res.pitch2D.height = num_rows;
  resDesc.res.pitch2D.desc = cudaCreateChannelDesc<mt>();
  resDesc.res.pitch2D.pitchInBytes = num_cols*sizeof(mt);
  struct cudaTextureDesc texDesc;
  memset(&texDesc, 0, sizeof(texDesc));
  cudaCreateTextureObject(&mytex, &resDesc, &texDesc, NULL);
  // publish the host-side handle in the __device__ variable
  cudaMemcpyToSymbol(tex, &mytex, sizeof(cudaTextureObject_t));
  dim3 threads(4, 4);
  kernel<<<1, threads>>>();
  cudaDeviceSynchronize();
  printf("\n");
  return 0;
}
# nvcc -o t175 t175.cu
# compute-sanitizer ./t175
========= COMPUTE-SANITIZER
texturePitchAlignment: 32
0, 1, 2, 3, 32, 33, 34, 35, 64, 65, 66, 67, 96, 97, 98, 99,
========= ERROR SUMMARY: 0 errors
#
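The same pattern should also cover the "allocate new memory and attach it" part of the question, which is what cuTexRefSetAddress did for texture references. The following is only a sketch under assumptions: it uses linear memory and tex1Dfetch instead of the pitched 2D resource above, omits error checking, and the helper names bindLinear, bindAndRead and readFirst are invented for illustration. The idea is to create a new texture object over the new allocation and copy its handle into the same __device__ variable:

```cuda
#include <stdio.h>
#include <string.h>

__device__ cudaTextureObject_t tex;   // global-scope handle read by the kernel

__global__ void readFirst(float *out)
{
  *out = tex1Dfetch<float>(tex, 0);   // fetch element 0 through the current handle
}

// Hypothetical helper: build a texture object over linear device memory
// and publish its handle in the __device__ variable tex.
cudaTextureObject_t bindLinear(float *devPtr, size_t bytes)
{
  struct cudaResourceDesc resDesc;
  memset(&resDesc, 0, sizeof(resDesc));
  resDesc.resType = cudaResourceTypeLinear;
  resDesc.res.linear.devPtr = devPtr;
  resDesc.res.linear.desc = cudaCreateChannelDesc<float>();
  resDesc.res.linear.sizeInBytes = bytes;
  struct cudaTextureDesc texDesc;
  memset(&texDesc, 0, sizeof(texDesc));
  cudaTextureObject_t obj = 0;
  cudaCreateTextureObject(&obj, &resDesc, &texDesc, NULL);
  cudaMemcpyToSymbol(tex, &obj, sizeof(obj));
  return obj;
}

// Allocate a fresh buffer holding value, bind it, and return what the
// kernel reads through tex.
float bindAndRead(float value)
{
  float *buf = 0, *out = 0, result = 0.0f;
  cudaMalloc((void**)&buf, sizeof(float));
  cudaMalloc((void**)&out, sizeof(float));
  cudaMemcpy(buf, &value, sizeof(float), cudaMemcpyHostToDevice);
  cudaTextureObject_t obj = bindLinear(buf, sizeof(float));
  readFirst<<<1, 1>>>(out);
  cudaMemcpy(&result, out, sizeof(float), cudaMemcpyDeviceToHost); // waits for the kernel
  cudaDestroyTextureObject(obj);      // the old handle is no longer needed
  cudaFree(buf);
  cudaFree(out);
  return result;
}

int main()
{
  float a = bindAndRead(1.5f);        // first allocation bound to tex
  float b = bindAndRead(2.5f);        // new allocation bound to the same variable
  printf("%g %g\n", a, b);
  return 0;
}
```

Unlike a texture reference, the object is not updated in place: each rebind creates a new object, and the previous one is destroyed once no kernel is still using it.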
Thank you, I will try this out. Looking for exactly something like this.
Hi @Robert_Crovella ,
I tried the sample code above in a new VS2019 project, and even after including “texture_indirect_functions.h” I get the error “E0020 identifier “tex2D” is undefined”. Am I missing something here? I get the same error in my actual project as well (though in my actual code I am calling tex1Dfetch in a __device__ function).
It’s a project or system setup issue. It’s not specific to the question you have asked - as you yourself have demonstrated.
Try compiling one of the sample projects that use tex2D, such as simple texture. If you can compile/run that project, then study the differences between that project and yours. If you can’t compile that project, then your system setup is broken.
Also beware that newer VS versions mix “intellisense” errors with actual errors. Be sure to actually try to compile the code. Intellisense will often flag “errors” that don’t prevent you from compiling and running CUDA codes.
The “simple texture” sample did work correctly, so it was some issue with my setup.
Thank you!
Is there a way to use the driver API to do something similar to cudaMemcpyFromSymbol and cudaMemcpyToSymbol?
Can “cuMemcpyHtoD” be used?
For cudaMemcpyToSymbol, for example:
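// look up the device address of the __device__ variable in a loaded module
CUdeviceptr constDevPtr;
cuModuleGetGlobal(&constDevPtr, NULL, module, device_symbol); // device_symbol: the variable's name as a string
// copy the host value to that address
cuMemcpyHtoD(constDevPtr, &host_variable, sizeof(host_variable));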
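Putting that together with a texture object, a driver API version might look roughly like the sketch below. It rests on several assumptions that are not from this thread: the device code lives in a hypothetical file kernels.ptx (built with nvcc -ptx), it declares the variable and kernel with extern "C" so their names are predictable, and the names tex, readTex and bindLinearFloat are made up. The reverse direction (the cudaMemcpyFromSymbol counterpart) would use cuMemcpyDtoH on the same address.

```cuda
// Assumed device code in kernels.cu (hypothetical), compiled to kernels.ptx:
//   extern "C" __device__ cudaTextureObject_t tex;
//   extern "C" __global__ void readTex(float *out, int n)
//   { int i = threadIdx.x; if (i < n) out[i] = tex1Dfetch<float>(tex, i); }
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHECK(call) do { CUresult r_ = (call); if (r_ != CUDA_SUCCESS) { \
  const char *s_ = NULL; cuGetErrorString(r_, &s_); \
  fprintf(stderr, "%s failed: %s\n", #call, s_ ? s_ : "unknown"); exit(1); } } while (0)

// Hypothetical helper: create a texture object over linear float data and
// store its handle in the module's __device__ variable named symbol.
static CUtexObject bindLinearFloat(CUmodule module, const char *symbol,
                                   CUdeviceptr data, size_t bytes)
{
  CUDA_RESOURCE_DESC resDesc;
  memset(&resDesc, 0, sizeof(resDesc));
  resDesc.resType = CU_RESOURCE_TYPE_LINEAR;
  resDesc.res.linear.devPtr = data;
  resDesc.res.linear.format = CU_AD_FORMAT_FLOAT;
  resDesc.res.linear.numChannels = 1;
  resDesc.res.linear.sizeInBytes = bytes;
  CUDA_TEXTURE_DESC texDesc;
  memset(&texDesc, 0, sizeof(texDesc));
  CUtexObject obj = 0;
  CHECK(cuTexObjectCreate(&obj, &resDesc, &texDesc, NULL));
  CUdeviceptr symAddr;
  size_t symBytes = 0;
  CHECK(cuModuleGetGlobal(&symAddr, &symBytes, module, symbol));
  if (symBytes != sizeof(obj)) { fprintf(stderr, "unexpected symbol size\n"); exit(1); }
  CHECK(cuMemcpyHtoD(symAddr, &obj, sizeof(obj)));   // like cudaMemcpyToSymbol
  return obj;
}

int main(void)
{
  enum { N = 4 };
  float host[N] = { 10.0f, 11.0f, 12.0f, 13.0f }, result[N];
  int n = N;
  CUdevice dev; CUcontext ctx; CUmodule module; CUfunction fn;
  CUdeviceptr data, out;
  CHECK(cuInit(0));
  CHECK(cuDeviceGet(&dev, 0));
  CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
  CHECK(cuCtxSetCurrent(ctx));
  CHECK(cuModuleLoad(&module, "kernels.ptx"));        // hypothetical file name
  CHECK(cuModuleGetFunction(&fn, module, "readTex"));
  CHECK(cuMemAlloc(&data, sizeof(host)));
  CHECK(cuMemAlloc(&out, sizeof(host)));
  CHECK(cuMemcpyHtoD(data, host, sizeof(host)));
  CUtexObject obj = bindLinearFloat(module, "tex", data, sizeof(host));
  void *args[] = { &out, &n };
  CHECK(cuLaunchKernel(fn, 1, 1, 1, N, 1, 1, 0, NULL, args, NULL));
  CHECK(cuCtxSynchronize());
  CHECK(cuMemcpyDtoH(result, out, sizeof(result)));
  for (int i = 0; i < N; i++) printf("%g ", result[i]);
  printf("\n");
  CHECK(cuTexObjectDestroy(obj));
  CHECK(cuMemFree(data));
  CHECK(cuMemFree(out));
  CHECK(cuModuleUnload(module));
  CHECK(cuDevicePrimaryCtxRelease(dev));
  return 0;
}
```

To point the kernels at a new allocation later, the same helper can be called again with the new CUdeviceptr, destroying the previous CUtexObject once it is no longer in use.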