Newbie problem with a small test program

Hi,

I wrote my first CUDA test program and it doesn’t work. I hope someone can help me!

I wrote the following kernel to “initialize” an array in GPU memory:


__global__ void VekIni(float3* Vektor)
{
    int i = threadIdx.x;
    Vektor[i].x = 1.1f;
    Vektor[i].y = 1.2f;
    Vektor[i].z = 1.3f;
}

And I launch the kernel with:


float3* cudaVektor2;
cudaMalloc((void**)&cudaVektor2, N * sizeof(float3));
VekIni<<<1 , N>>>(cudaVektor2);

Then I copy the array back to host memory (the array “Vektor2” is also initialized; “N” is some integer):

cudaMemcpy(Vektor2, cudaVektor2, N * sizeof(float3), cudaMemcpyDeviceToHost);

And then I try to print the x-component to the console:

for(i=0;i<N;i++)
{
    printf("Vektor2[%d].x = %f \n", i, Vektor2[i].x);
}

But I only get zeros (“0”) and not the “1.1” that the printf call should show…
Thanks for any help!

What is the value of N?
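
The value of N matters because a launch whose block size exceeds the device limit does not crash the program: the kernel simply never runs, and the later cudaMemcpy returns whatever the buffer happened to contain. Below is a minimal sketch of how such a failure could be detected, reusing the kernel from the question. The helper name launchOk and the value N = 2048 are assumptions made for illustration and are not from the original post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel from the question: one thread per element, single block.
__global__ void VekIni(float3* Vektor)
{
    int i = threadIdx.x;
    Vektor[i].x = 1.1f;
    Vektor[i].y = 1.2f;
    Vektor[i].z = 1.3f;
}

// Hypothetical helper: launches VekIni with n threads in one block and
// reports whether both the launch and the kernel execution succeeded.
bool launchOk(float3* devVektor, int n)
{
    VekIni<<<1, n>>>(devVektor);
    cudaError_t err = cudaGetLastError();   // catches invalid launch configurations
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();      // catches errors raised while the kernel runs
    if (err != cudaSuccess) {
        fprintf(stderr, "VekIni launch failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    const int N = 2048;  // assumed value, chosen to exceed the per-block thread limit
    float3* cudaVektor2 = NULL;
    if (cudaMalloc((void**)&cudaVektor2, N * sizeof(float3)) != cudaSuccess)
        return 1;

    bool ok = launchOk(cudaVektor2, N);
    printf("launch with N = %d %s\n", N, ok ? "succeeded" : "failed");

    cudaFree(cudaVektor2);
    return 0;
}
```

Checking cudaGetLastError() right after the launch is what turns the silent "all zeros" result into an explicit error message.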

Thank you so much :)

… It was too high!

But how do I know what the maximum value is that I can choose for N? Is it always 512 or 256 or so?

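CUDA Programming Guide, Appendix A.1.1. For all compute capability 1.x cards, including the GTX 2xx series, the maximum is 512 threads per block. (GTX 2xx cards raise the limit on active threads per multiprocessor to 1024, but not the per-block limit.)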
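
Rather than hard-coding the limit, it can be read at runtime, and arrays larger than one block can be covered by launching several blocks. The following is a minimal sketch under a few assumptions: the kernel name VekIniMulti, the helper blocksFor, the size N = 5000, and the cap of 256 threads per block are all illustrative choices, not part of the original program.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Variant of the kernel that works across several blocks. The element
// count n is passed in so surplus threads in the last block do nothing.
__global__ void VekIniMulti(float3* Vektor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        Vektor[i].x = 1.1f;
        Vektor[i].y = 1.2f;
        Vektor[i].z = 1.3f;
    }
}

// Number of blocks needed to cover n elements (integer division, rounded up).
int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main()
{
    const int N = 5000;  // assumed size, deliberately above the per-block limit

    // Ask the runtime for the limits of device 0.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess)
        return 1;
    printf("maxThreadsPerBlock on device 0: %d\n", prop.maxThreadsPerBlock);

    // Use 256 threads per block unless the device allows fewer.
    int threads = prop.maxThreadsPerBlock < 256 ? prop.maxThreadsPerBlock : 256;
    int blocks = blocksFor(N, threads);

    float3* Vektor2 = (float3*)malloc(N * sizeof(float3));
    float3* cudaVektor2 = NULL;
    if (Vektor2 == NULL ||
        cudaMalloc((void**)&cudaVektor2, N * sizeof(float3)) != cudaSuccess)
        return 1;

    VekIniMulti<<<blocks, threads>>>(cudaVektor2, N);
    if (cudaGetLastError() != cudaSuccess)
        return 1;

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(Vektor2, cudaVektor2, N * sizeof(float3), cudaMemcpyDeviceToHost);
    printf("Vektor2[%d].x = %f\n", N - 1, Vektor2[N - 1].x);

    cudaFree(cudaVektor2);
    free(Vektor2);
    return 0;
}
```

With this layout N is no longer tied to the per-block limit; only the number of blocks grows with the array size.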