I just want to describe our problem in a few words:
We want to run a kernel in parallel with CUDA. The kernel still runs fine, but much too slowly, in fact slower than on our CPU. We use a GeForce GTX 560 Ti to run the kernel, so there should be plenty of potential to be faster than our AMD dual-core CPU.
We use the following configuration to experiment with the maximum number of threads our GPU can manage:
cuLaunchKernel(process,      // kernel to launch
               1,            // gridDimX - width of grid in blocks
               1,            // gridDimY - height of grid in blocks
               1,            // gridDimZ - depth of grid in blocks
               1,            // blockDimX - X dimension of each thread block (e.g. WORK_SIZE) - total number of active threads
               1,            // blockDimY - Y dimension of each thread block
               1,            // blockDimZ - Z dimension of each thread block
               0,            // sharedMemBytes - dynamic shared-memory size per thread block in bytes
               NULL,         // hStream - stream (default stream)
               kernelParams, // array of pointers to the kernel arguments
               NULL);        // extra - not used
So in our case we raised blockDimX up to 512 to run our application in parallel.
In this forum there are often explanations involving warps, blocks and grids. What do these mean in our case?
How many threads can we run in parallel on our GeForce GTX 560 Ti?
We also timed our application on the CPU (2 s) and on the GPU (152 s), so we suspect it is not actually running in parallel. Could this problem be caused by our configuration of gridDim and blockDim?
Thanks for any replies.