I use texture memory like this:
texture<int, 2, cudaReadModeElementType> texType;
texType.addressMode[0] = cudaAddressModeClamp;
texType.addressMode[1] = cudaAddressModeClamp;
texType.filterMode = cudaFilterModePoint;
texType.normalized = false;
cudaBindTextureToArray (texType, typeCuArray, channelDesc_V);
SiteType temp2 = (SiteType)tex2D(texType, index%widthTex, index/widthTex);
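To compare the texture path with typeDev on the host, here is a minimal C++ sketch of what the fetch above should return. It assumes the linear copy fills the widthTex x widthTex array row by row, and that the texture behaves as configured: point filtering reads texel (floor(x), floor(y)), and clamp addressing pins out-of-range coordinates to the border texel. `fetchPointClamp` and `fetchSite` are hypothetical helper names, not CUDA API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Host-side model (an assumption, not CUDA's implementation) of tex2D() on a
// width x height int texture configured with cudaFilterModePoint,
// cudaAddressModeClamp and normalized = false.
int fetchPointClamp(const std::vector<int>& texels, int width, int height,
                    float x, float y) {
    // Point sampling reads the texel at (floor(x), floor(y)).
    int i = static_cast<int>(std::floor(x));
    int j = static_cast<int>(std::floor(y));
    // Clamp addressing keeps out-of-range coordinates on the border texel.
    i = std::min(std::max(i, 0), width - 1);
    j = std::min(std::max(j, 0), height - 1);
    // Row-major layout: row j starts at j * width.
    return texels[static_cast<std::size_t>(j) * width + i];
}

// Hypothetical helper using the kernel's coordinates:
// column = index % widthTex, row = index / widthTex.
int fetchSite(const std::vector<int>& texels, int widthTex, int index) {
    // An index past the last texel would be silently clamped, not rejected.
    assert(index >= 0 && index < widthTex * widthTex);
    return fetchPointClamp(texels, widthTex, widthTex,
                           static_cast<float>(index % widthTex),
                           static_cast<float>(index / widthTex));
}
```

Under this model, element `index` of the linear data comes back unchanged for every index below widthTex*widthTex. If nSites is not a perfect square, threads with index >= nSites read whatever lies in the unfilled tail of the array.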
and then:
X[tdx] = temp2;
where X is shared memory:
extern __shared__ char shared_memory[];
int *K=(int *)shared_memory;
VecR *B=(VecR *)&K[ntd];
SiteType *X=(SiteType *)&B[ntd];
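The pointer carving above places three arrays of ntd elements back to back in the dynamic shared buffer. The C++ sketch below computes the byte offsets of K, B and X and the total the launch has to reserve. It assumes VecR is a plain struct of three floats (the member names are hypothetical) and that ntd equals the threads-per-block value n_threads used at launch. `sharedLayout` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Assumed definition: VecR as a plain struct of three floats (member names hypothetical).
struct VecR { float x, y, z; };

typedef enum { WATER_TYPE = 0, CHOLINE_TYPE = 1, PHOSPHATE_TYPE = 2,
               GLYCEROL_TYPE = 3, ESTER_TYPE = 4, TAIL_TYPE = 5,
               SOLUTE_TYPE = 99 } SiteType;

// Byte offsets of the three sub-arrays inside extern __shared__ char shared_memory[].
struct SharedLayout {
    std::size_t offK, offB, offX, total;
};

// Hypothetical helper mirroring the kernel's carving:
// K = int[ntd], then B = VecR[ntd], then X = SiteType[ntd].
SharedLayout sharedLayout(std::size_t ntd) {
    SharedLayout L;
    L.offK = 0;
    L.offB = L.offK + ntd * sizeof(int);
    L.offX = L.offB + ntd * sizeof(VecR);
    L.total = L.offX + ntd * sizeof(SiteType);
    // Each sub-array must start on a boundary suitable for its element type.
    assert(L.offB % alignof(VecR) == 0);
    assert(L.offX % alignof(SiteType) == 0);
    return L;
}
```

`sharedLayout(n_threads).total` matches a per-thread size of 3*sizeof(float)+sizeof(int)+sizeof(SiteType) as long as VecR has no padding. If ntd in the kernel is larger than the thread count used to size the allocation, X extends past the reserved bytes.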
and the shared memory size is passed to the kernel in the launch configuration, together with the number of blocks and the number of threads per block:
size_t sharedDim=(3*sizeof(float)+sizeof(int)+sizeof(SiteType))*n_threads;
because VecR is a structure of 3 floats. SiteType is an enum type:
typedef enum { WATER_TYPE = 0,
CHOLINE_TYPE = 1,
PHOSPHATE_TYPE = 2,
GLYCEROL_TYPE = 3,
ESTER_TYPE = 4,
TAIL_TYPE = 5,
SOLUTE_TYPE = 99
} SiteType;
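The texture is declared as texture<int, 2, cudaReadModeElementType>, while the array is filled with sizeof(SiteType)*nSites bytes taken from typeDev. That only lines up if SiteType has the same size as int, which is typical but implementation-defined for an enum. It also requires channelDesc_V (not shown in the post) to describe a single 32-bit signed channel, as cudaCreateChannelDesc<int>() does. A minimal host-side C++ sketch of the size condition follows; `asIntTexels` and `toSiteType` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

typedef enum { WATER_TYPE = 0, CHOLINE_TYPE = 1, PHOSPHATE_TYPE = 2,
               GLYCEROL_TYPE = 3, ESTER_TYPE = 4, TAIL_TYPE = 5,
               SOLUTE_TYPE = 99 } SiteType;

// The texture reads int texels, but the bytes come from a SiteType array,
// so the two types must have the same size.
static_assert(sizeof(SiteType) == sizeof(int),
              "SiteType must be int-sized to be read through an int texture");

// Host-side model of the byte copy into the array followed by int fetches:
// copy sizeof(SiteType) * n bytes, then reinterpret them as n ints.
std::vector<int> asIntTexels(const std::vector<SiteType>& sites) {
    std::vector<int> texels(sites.size());
    if (!sites.empty()) {
        std::memcpy(texels.data(), sites.data(), sizeof(SiteType) * sites.size());
    }
    return texels;
}

// Hypothetical check applied to a fetched value before the (SiteType) cast:
// anything outside 0..5 or 99 cannot be a valid SiteType.
SiteType toSiteType(int raw) {
    assert((raw >= WATER_TYPE && raw <= TAIL_TYPE) || raw == SOLUTE_TYPE);
    return static_cast<SiteType>(raw);
}
```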
and typeCuArray:
cudaArray *typeCuArray;
cudaMallocArray (&typeCuArray, &channelDesc_V, widthTex, widthTex);
cudaMemcpyToArray(typeCuArray, 0, 0, typeDev, sizeof(SiteType)*(nSites), cudaMemcpyDeviceToDevice);
[b]and typeDev is an array of nSites elements of type SiteType.
The results I get using shared and texture memory this way differ from those of the version of the program that reads the values directly from typeDev in device memory, and some of them look suspicious.[/b]
[b]Does anybody have any idea why using shared and texture memory this way is wrong?
Thank you in advance for your answers.[/b]