 # Help me!

I want to know: how can I determine the value of threadIdx?

Thanks…

If I use a GeForce 9400 GT, I want to know:

What value does threadIdx have?
What is BLOCK_SIZE?
What is blockIdx?

Thanks

Sorry, my written English is bad.

Hi Pingkung, I'm also really new to CUDA, and I must say I also had a lot of trouble understanding how these indices work. I'm not really sure I understand everything, but I can already write some kernels that work.

1. Think of your data as a vector, like V(x1, x2, x3, …, xn). It is possible to convert a matrix into vector form, so you lose nothing by working only with vectors.

2. Depending on the work you want CUDA to do, you must specify the execution configuration for the kernel: mykernel<<<nBlocks, blockSize>>>(parameters).

3. The configuration information is: a) the size of the grid (the number of blocks within the grid) and b) the number of threads within a block. So if you want the kernel to increment the elements of a 64-element vector, you can specify mykernel<<<1,64>>>(parameters) or mykernel<<<8,8>>>(parameters). (In the first case you have one block in the grid, and that block has 64 threads.)

4. Now you need to work through this vector. For this purpose (say with mykernel<<<8,8>>>(parameters)) you need to calculate an index i so that you can write V[i] = V[i] + 1; in your kernel.

5. i = __mul24(blockIdx.x, blockDim.x) + threadIdx.x; where blockIdx, blockDim, and threadIdx are all built-in variables of the CUDA runtime.

Take a look at this piece of code. I hope it helps you better understand these indices.

```
// incrementArray.cu
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <cuda.h>

void incrementArrayOnHost(float *a, int N)
{
    int i;
    for (i = 0; i < N; i++) a[i] = a[i] + 1.f;
}

__global__ void incrementArrayOnDevice(float *a, int N)
{
    int idx = blockIdx.x*blockDim.x + threadIdx.x;
    if (idx < N)
        a[idx] = a[idx] + 1.f;
}

int main(void)
{
    float *a_h, *b_h;   // pointers to host memory
    float *a_d;         // pointer to device memory
    int i, N = 10;
    size_t size = N*sizeof(float);

    // allocate arrays on host
    a_h = (float *)malloc(size);
    b_h = (float *)malloc(size);

    // allocate array on device
    cudaMalloc((void **) &a_d, size);

    // initialization of host data
    for (i = 0; i < N; i++)
        a_h[i] = (float)i;

    // copy data from host to device
    cudaMemcpy(a_d, a_h, sizeof(float)*N, cudaMemcpyHostToDevice);

    // do calculation on host
    printf("Result before computation on host\n");
    for (i = 0; i < N; i++)
        printf("a_h[%d]=%f\n", i, a_h[i]);

    incrementArrayOnHost(a_h, N);

    printf("\n\n\n......computation on host\n");
    printf("Result of computation on host\n");
    for (i = 0; i < N; i++)
        printf("a_h[%d]=%f\n", i, a_h[i]);

    // do calculation on device:
    // Part 1 of 2. Compute execution configuration
    int blockSize = 4;
    int nBlocks = N/blockSize + (N%blockSize == 0 ? 0 : 1);

    // Part 2 of 2. Call incrementArrayOnDevice kernel
    incrementArrayOnDevice <<< nBlocks, blockSize >>> (a_d, N);

    // retrieve result from device and store in b_h
    cudaMemcpy(b_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);

    // check results
    for (i = 0; i < N; i++)
        assert(a_h[i] == b_h[i]);

    // cleanup
    free(a_h);
    free(b_h);
    cudaFree(a_d);

    getchar(); // keep the console window open
    return 0;
}
```

There is a sample code here:

http://www.hbeongpgpu.com/samplecudacode.htm