Help me!

I want to know:

What is threadIdx?
What can I determine threadIdx from?


threadIdx is a built-in structure from which you can get a thread's index within its block.

If I use a GeForce 9400 GT, I want to know:

what value does threadIdx have?
what about blockIdx?


Sorry, my written English is bad.

Hi Pingkung, I'm also really new to CUDA, and I have to say I also had a lot of trouble understanding how these indices work. I'm not really sure I understand everything yet, but I can already write some kernels that work.

  1. Think of your data as a vector, like V(x1,x2,x3,…,xn). It is possible to convert a matrix into vector form, so working with vectors is not a limitation.

  2. Depending on the work you want CUDA to do, you should specify the execution configuration for the kernel: mykernel<<<nBlocks, blockSize>>>(parameters).

  3. The configuration information is: a) the size of the grid (the number of blocks within the grid) and b) the number of threads within a block. So if you want the kernel to increment the elements of a 64-element vector, you can specify mykernel<<<1,64>>>(parameters) or mykernel<<<8,8>>>(parameters) (in the first case you have one block in the grid, and this block has 64 threads).

  4. Now you need to work through this vector. For this purpose (with mykernel<<<8,8>>>(parameters)) you need to calculate an index i so you can write V[i] = V[i] + 1; in your kernel.

  5. i = __mul24(blockIdx.x, blockDim.x) + threadIdx.x; (the same as blockIdx.x*blockDim.x + threadIdx.x), where all of these variables are built into the CUDA runtime.

Take a look at this piece of code; I hope it helps you understand these indices better.


#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <cuda.h>

void incrementArrayOnHost(float *a, int N)
{
  int i;
  for (i = 0; i < N; i++) a[i] = a[i] + 1.f;
}

__global__ void incrementArrayOnDevice(float *a, int N)
{
  int idx = blockIdx.x*blockDim.x + threadIdx.x;
  if (idx < N)
    a[idx] = a[idx] + 1.f;
}

int main(void)
{
  float *a_h, *b_h;  // pointers to host memory
  float *a_d;        // pointer to device memory
  int i, N = 10;
  size_t size = N*sizeof(float);

  // allocate arrays on host
  a_h = (float *)malloc(size);
  b_h = (float *)malloc(size);

  // allocate array on device
  cudaMalloc((void **) &a_d, size);

  // initialization of host data
  for (i = 0; i < N; i++)
    a_h[i] = (float)i;

  // copy data from host to device
  cudaMemcpy(a_d, a_h, sizeof(float)*N, cudaMemcpyHostToDevice);

  // do calculation on host
  printf("Result before computation on host\n");
  for (i = 0; i < N; i++)
    printf("%f\n", a_h[i]);

  incrementArrayOnHost(a_h, N);

  printf("\n\n\n......computation on host\n");
  printf("Result of computation on host\n");
  for (i = 0; i < N; i++)
    printf("%f\n", a_h[i]);

  // do calculation on device:
  // Part 1 of 2. Compute execution configuration
  int blockSize = 4;
  int nBlocks = N/blockSize + (N%blockSize == 0 ? 0 : 1);

  // Part 2 of 2. Call incrementArrayOnDevice kernel
  incrementArrayOnDevice <<< nBlocks, blockSize >>> (a_d, N);

  // Retrieve result from device and store in b_h
  cudaMemcpy(b_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);

  // check results
  for (i = 0; i < N; i++)
    assert(a_h[i] == b_h[i]);

  // cleanup
  free(a_h);
  free(b_h);
  cudaFree(a_d);

  return 0;
}
Hope this will help you understand better.



Thank you.