I notice you have both nbData and nb_data being used, but maybe that is intentional.
Anyway, I don’t see a problem with the code you have posted:
$ cat t760.cu
#include <stdio.h>
#include <vector_types.h>
#define cudaCheckErrors(msg) \
do { \
cudaError_t __err = cudaGetLastError(); \
if (__err != cudaSuccess) { \
fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
msg, cudaGetErrorString(__err), \
__FILE__, __LINE__); \
fprintf(stderr, "*** FAILED - ABORTING\n"); \
exit(1); \
} \
} while (0)
texture<float4, cudaTextureType1D, cudaReadModeElementType> t_data;
__global__ void mykernel(){
int id = threadIdx.x+blockDim.x*blockIdx.x;
const float4 f0 = tex1Dfetch( t_data, id );
const float4 f1 = tex1Dfetch( t_data, id + 1 );
const float4 f2 = tex1Dfetch( t_data, id + 2 );
printf("f0.x = %f\n", f0.x);
printf("f0.y = %f\n", f0.y);
printf("f0.z = %f\n", f0.z);
printf("f0.w = %f\n", f0.w);
printf("f1.x = %f\n", f1.x);
printf("f1.y = %f\n", f1.y);
printf("f1.z = %f\n", f1.z);
printf("f1.w = %f\n", f1.w);
printf("f2.x = %f\n", f2.x);
printf("f2.y = %f\n", f2.y);
printf("f2.z = %f\n", f2.z);
printf("f2.w = %f\n", f2.w);
}
int main(){
float4 *devData;
float *hostData;
const int nbData = 1000;
hostData = (float *)malloc(nbData*sizeof(float4));
for (int i = 0; i < nbData*4; i++) hostData[i] = i;
cudaMalloc( &devData, nbData * sizeof(float4) ) ;
cudaMemcpy( devData, hostData, nbData * sizeof(float4), cudaMemcpyHostToDevice ) ;
cudaBindTexture( 0, t_data, devData, cudaCreateChannelDesc<float4>(), nbData * sizeof(float4) ) ;
mykernel<<<1,1>>>();
cudaDeviceSynchronize();
cudaCheckErrors("some error");
return 0;
}
$ nvcc -o t760 t760.cu
$ cuda-memcheck ./t760
========= CUDA-MEMCHECK
f0.x = 0.000000
f0.y = 1.000000
f0.z = 2.000000
f0.w = 3.000000
f1.x = 4.000000
f1.y = 5.000000
f1.z = 6.000000
f1.w = 7.000000
f2.x = 8.000000
f2.y = 9.000000
f2.z = 10.000000
f2.w = 11.000000
========= ERROR SUMMARY: 0 errors
$
In the future, I’d suggest providing a short, complete code (DEFINITION: a code that someone else could copy, paste, compile, and run, and see the issue, without having to add anything or change anything), just as I have done, to demonstrate your problem. Don’t dump your entire project here or give a link to a github repo. Reduce your code down to demonstrate just the issue. The reason for this is (at least) threefold:
-
If you desire help from someone else, then respect their time, and minimize the amount of work they have to do to help you. Why should I have the burden to create a complete code around the little snippets you have shown?
-
Frequently (as I suspect is the case here), when you distill your problem down into snippets that you haven’t actually tested yourself, you eliminate (or overlook) the actual issue in creating your snippets. This is quite understandable since you don’t actually know what the problem is, or you wouldn’t be asking for help. By providing a complete code that demonstrates the issue, that you have tested yourself, you minimize the possibility of falling into this trap. Furthermore, since you have usually demonstrated the issue in a single step, it makes the help process efficient; it’s frequently the case that in a single response, the person who is trying to help you can communicate the problem, and a proposed fix. Instead we now have this situation where we are going back and forth because your code actually works.
-
Very often in the process of creating this minimal reproducer code, you discover the actual problem yourself, and don’t need to ask for help.
I frequently just skip past questions where people have not demonstrated the effort to provide a complete code. In my view they are not serious about wanting help, and are not serious about respecting other people’s time. Occasionally, I break that pattern, as I am doing here.