First of all, I'm very new to CUDA and very familiar with C/C++. My main goal is to use CUDA to implement real-time stereo matching of images by running many threads at once. However, since I'm new to CUDA, I thought it would be best to start small, see how CUDA is used, and run a few speed tests. I'm having trouble understanding this code sample from Dr. Dobb's Journal:
// incrementArray.cu
#include <stdio.h>
#include <windows.h>
#include <assert.h>
#include <cuda.h>
#include <stdlib.h>   // malloc, free, system
#include "stopwatch.hpp"
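// handled by the GPU: each thread increments one array element
__global__ void incrementArrayOnDevice(double *a, size_t N)
{
    // global index of this thread across all blocks
    size_t idx = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    // threads past the end of the array do nothing
    if (idx < N) a[idx] += 1;
}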
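int main(void)
{
    Stopwatch timer;
    // Element counts are integers, so use size_t rather than double.
    // 256 * 8388608 = 2^31 elements (16 GiB of doubles); the cast avoids
    // overflowing 32-bit int arithmetic.
    size_t numDoubles = (size_t)256 * 8388608;
    double* cuda_data = NULL;
    double* result_data = (double*)malloc(sizeof(double) * numDoubles);
    cudaError_t err = cudaMalloc((void**)&cuda_data, sizeof(double) * numDoubles);
    if (result_data == NULL || err != cudaSuccess) {
        printf("Allocation failed\n");
        free(result_data);
        return 1;
    }
    // do calculation using cuda:
    // Part 1 of 2. Compute execution configuration
    size_t numThreadsPerBlock = 256;
    // round up so every element is covered by a thread
    size_t numBlocks = (numDoubles + numThreadsPerBlock - 1) / numThreadsPerBlock;
    // Part 2 of 2. Call incrementArrayOnDevice kernel
    // (devices of compute capability below 3.0 allow at most 65535 blocks in x)
    dim3 dimGrid((unsigned int)numBlocks);
    dim3 dimBlock((unsigned int)numThreadsPerBlock);
    timer.start();
    incrementArrayOnDevice <<< dimGrid, dimBlock >>> (cuda_data, numDoubles);
    // Retrieve result from device and store in result_data
    cudaMemcpy(result_data, cuda_data, sizeof(double) * numDoubles, cudaMemcpyDeviceToHost);
    timer.stop();
    printf("Time to calculate using cuda: %i\n", timer.getTime());
    timer.reset();
    // cleanup
    free(result_data);   // allocated with malloc, so release with free
    cudaFree(cuda_data);
    system("pause");
    return 0;
}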
Can somebody please explain what the kernel is actually doing once it's called? How many threads run at the same time once it is executed?
Also, if I ran a for loop numDoubles times on the host, would it be much slower than the CUDA kernel call?
I need to somehow show myself that CUDA is much faster than the CPU would be, but the work being done here is so trivial that I can't see much of a difference.
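For reference, the host-side loop I have in mind for the comparison would be roughly the following. This is only a sketch: `incrementArrayOnHost` and `timeHostIncrementMs` are names I made up for illustration, and I'm assuming `std::chrono` for timing here instead of the `Stopwatch` class used above.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Host-side counterpart of incrementArrayOnDevice: each loop iteration
// does the work that a single GPU thread would do.
void incrementArrayOnHost(double* a, std::size_t n)
{
    for (std::size_t idx = 0; idx < n; ++idx) {
        a[idx] += 1;
    }
}

// Hypothetical helper (not from the article): runs the host loop once over
// the vector and returns the elapsed wall-clock time in milliseconds.
double timeHostIncrementMs(std::vector<double>& data)
{
    auto start = std::chrono::steady_clock::now();
    incrementArrayOnHost(data.data(), data.size());
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

I would call `timeHostIncrementMs` on a `std::vector<double>` with the same number of elements as the GPU run and print the result next to the CUDA time. Note that the CUDA timing above also includes the `cudaMemcpy` back to the host, so the two numbers would not measure exactly the same thing.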
I hope I explained my situation well and that someone can help me.
Thanks very much in advance.