Hey guys, I have the following problem.
I’m trying to address multiple textures through a 1D layered texture. The developer guide says that this kind of texture has to be allocated with cudaMalloc3DArray, which I did. The problem is that the layers are not returning the expected values; only the first two or three are. Because the 1D arrays are on the device, I need to copy them into a cudaArray, and I suspect that this copy is not being done correctly.
The code snippet is:
// K is the number of layers
// width is the width of each layer (number of elements)
ShowError( cudaMalloc( (void **) &R_d, sizeof(float) * width * K ) );
ShowError( cudaMalloc3DArray(&R_d_copy, &channelDesc, make_cudaExtent(width, 0, K), cudaArrayLayered));
cudaMemcpy3DParms myparms = {0};
myparms.srcPtr = make_cudaPitchedPtr(R_d, width, width, 0);
myparms.dstArray = R_d_copy;
myparms.extent = make_cudaExtent(width, 0, K);
myparms.kind = cudaMemcpyDeviceToDevice;
ShowError( cudaMemcpy3D( &myparms));
// Bind to the texture...
ProjData.normalized = false;
ProjData.filterMode = cudaFilterModeLinear;
ProjData.addressMode[0] = cudaAddressModeWrap;
ShowError(cudaBindTextureToArray( ProjData, R_d_copy, channelDesc));
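Two things in the copy setup above may be worth checking. This is a minimal sketch, not a confirmed fix, and it rests on two assumptions: that the pitch argument of make_cudaPitchedPtr is given in bytes (so width * sizeof(float) rather than width), and that a copy into a 1D layered array wants a height of 1 in both the pitched pointer and the copy extent, even though the allocation itself uses a height of 0. The CHECK macro, the values of width and K, the test data, and the read-back step are all made up for illustration and are not part of the original code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (stands in for ShowError): abort on any CUDA error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

int main()
{
    const size_t width = 8;  // elements per layer (illustrative value)
    const size_t K     = 4;  // number of layers (illustrative value)
    const size_t bytes = sizeof(float) * width * K;

    // Test data: layer k holds k*100 + x, stored one layer after another.
    std::vector<float> host(width * K);
    for (size_t k = 0; k < K; ++k)
        for (size_t x = 0; x < width; ++x)
            host[k * width + x] = static_cast<float>(k * 100 + x);

    float *R_d = nullptr;
    CHECK(cudaMalloc(&R_d, bytes));
    CHECK(cudaMemcpy(R_d, host.data(), bytes, cudaMemcpyHostToDevice));

    // Allocation: a 1D layered array uses height 0 and depth = layer count.
    cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
    cudaArray_t R_d_copy = nullptr;
    CHECK(cudaMalloc3DArray(&R_d_copy, &channelDesc,
                            make_cudaExtent(width, 0, K), cudaArrayLayered));

    // Copy: pitch in BYTES, and (assumed) a height of 1 for the 1D layers.
    cudaMemcpy3DParms toArray = {};
    toArray.srcPtr   = make_cudaPitchedPtr(R_d, width * sizeof(float), width, 1);
    toArray.dstArray = R_d_copy;
    toArray.extent   = make_cudaExtent(width, 1, K);
    toArray.kind     = cudaMemcpyDeviceToDevice;
    CHECK(cudaMemcpy3D(&toArray));

    // Read the array back to the host so every layer can be compared.
    std::vector<float> back(width * K, -1.0f);
    cudaMemcpy3DParms fromArray = {};
    fromArray.srcArray = R_d_copy;
    fromArray.dstPtr   = make_cudaPitchedPtr(back.data(), width * sizeof(float), width, 1);
    fromArray.extent   = make_cudaExtent(width, 1, K);
    fromArray.kind     = cudaMemcpyDeviceToHost;
    CHECK(cudaMemcpy3D(&fromArray));

    // Count elements that did not survive the round trip.
    size_t mismatches = 0;
    for (size_t i = 0; i < width * K; ++i)
        if (back[i] != host[i]) ++mismatches;
    std::printf("mismatching elements: %zu of %zu\n", mismatches, width * K);

    CHECK(cudaFreeArray(R_d_copy));
    CHECK(cudaFree(R_d));
    return mismatches == 0 ? 0 : 1;
}
```

Reading the array back like this checks the copy on its own, before any texture binding or filtering is involved, which should show whether the wrong layers come from the copy or from the fetch in the kernel.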
The code for the texture and how the kernel accesses it:
texture<float, cudaTextureType1DLayered, cudaReadModeElementType> ProjData;
__global__ void CSSkernel(double Xp, int NumProjs, int ProjWidth, int imgRows, int imgCols, float *Img)
{
    int X = blockIdx.x * blockDim.x + threadIdx.x;
    int Y = blockIdx.y * blockDim.y + threadIdx.y;
    int i = 0, k = 0;
    float rad = 0;
    float t = 0;
    float ImgVal = 0;
    float RowsDivBy2 = (float)(imgRows) / 2;
    float ColsDivBy2 = (float)(imgCols) / 2;
    for (i = 0; i < NumProjs; i++)
    {
        rad = (float)(thetha_d[i] * pi / 180);
        t = (((float)X - ColsDivBy2) * cos(rad) - ((float)Y - RowsDivBy2) * sin(rad)) + Xp - 1;
        ImgVal += (float)tex1DLayered(ProjData, t, i);
    }
    if ((Y < imgRows && X < imgCols)) { Img[X * imgRows + Y] = (float)((pi / (float)NumProjs) * ImgVal); }
}
So I’m asking if anyone can see the problem!
Thanks in advance
Bruno Faria.