Hello,

I wrote the following (very simple) kernel:

```
__global__ void MyKernel(float *out)
{
    // Launched as <<<gridDim.x, blockDim.x>>>

    // Orientation
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    // Compute results
    out[id] = id;
}
```

#define N_VECTORS 5

#define N_SAMPLES 7

Then I ran the kernel with:

cudaMalloc((void**)&devOut, N_VECTORS * N_SAMPLES * sizeof(float));

MyKernel<<<N_VECTORS,N_SAMPLES>>>(devOut);

I also ran it with:

MyKernel<<<N_SAMPLES,N_VECTORS>>>(devOut);
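For reference, here is a minimal sketch of how the two launches can be compared on the host. The helper name `runLaunch`, the use of `std::vector`, and the simplified error handling are assumptions added for illustration and are not part of my original code.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define N_VECTORS 5
#define N_SAMPLES 7

__global__ void MyKernel(float *out)
{
    // Global thread index across the 1D grid
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    // Each thread writes its own index
    out[id] = id;
}

// Hypothetical helper (not in the original post): launches MyKernel with the
// given configuration and copies the device buffer back to the host.
std::vector<float> runLaunch(int blocks, int threadsPerBlock)
{
    const int n = blocks * threadsPerBlock;
    std::vector<float> hostOut(n, -1.0f);  // -1 marks "not written"
    float *devOut = nullptr;

    if (cudaMalloc((void**)&devOut, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return hostOut;
    }

    MyKernel<<<blocks, threadsPerBlock>>>(devOut);

    // A blocking cudaMemcpy on the default stream waits for the kernel to finish
    cudaError_t err = cudaMemcpy(hostOut.data(), devOut, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }

    cudaFree(devOut);
    return hostOut;
}
```

Calling `runLaunch(N_VECTORS, N_SAMPLES)` and `runLaunch(N_SAMPLES, N_VECTORS)` and printing the two vectors side by side is how the two configurations can be compared.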

Can you tell me why I’m getting the same output in both cases?

Best regards,

Z.V