I’m a beginner at CUDA and I have read some of Rob Farber’s articles in the series “CUDA, Supercomputing for the Masses”.
I’m currently trying to expand on his simple CUDA example, which can be found here: Cuda example
That example allocates an array on the host, copies it to the device, performs a simple calculation in a kernel, and copies the result back. My program isn’t much more advanced: it copies a total of 3 arrays to the device, and the idea is to perform some operations between the three and then copy the result back.
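For reference, my host-side code follows the same allocate/copy/launch/copy-back pattern as Farber’s example. Roughly like this (a simplified sketch — the names, sizes, and launch configuration here are placeholders, not my exact code):

```cuda
// Sketch of the host-side flow (assumed names/sizes, not my exact code)
int mSize = 10;
size_t bytes = mSize * sizeof(int);

// host-side buffers
int *host_n1     = (int *)malloc(bytes);
int *host_Matrix = (int *)malloc(bytes);

// device-side buffers
int *dev_n1, *dev_n2, *dev_Matrix;
cudaMalloc((void **)&dev_n1, bytes);
cudaMalloc((void **)&dev_n2, bytes);
cudaMalloc((void **)&dev_Matrix, bytes);

// copy inputs to the device
cudaMemcpy(dev_n1, host_n1, bytes, cudaMemcpyHostToDevice);
// ... same for n2 and Matrix ...

// launch with enough threads to cover mSize elements
int blockSize = 256;
int numBlocks = (mSize + blockSize - 1) / blockSize;
makeMatrix<<<numBlocks, blockSize>>>(MatrixX, MatrixY, dev_n1, dev_n2, dev_Matrix, mSize);

// copy results back
cudaMemcpy(host_Matrix, dev_Matrix, bytes, cudaMemcpyDeviceToHost);
cudaMemcpy(host_n1, dev_n1, bytes, cudaMemcpyDeviceToHost);
```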
My kernel:
__global__ void makeMatrix(int MatrixX, int MatrixY, int *n1, int *n2, int *Matrix, int mSize)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < 10) {
        Matrix[idx] = 1;
        n1[0] = 1;
    }
}
The reason I call the array “Matrix” is that I pretend it is a matrix until I can find a way to make a proper one. :)
The problem is that whenever I do any operation involving n1 or n2, the results I get back are just rubbish, as you can see here:
When I comment out the line that writes to n1, I get the results one would expect from this very simple kernel:
__global__ void makeMatrix(int MatrixX, int MatrixY, int *n1, int *n2, int *Matrix, int mSize)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < 10) {
        Matrix[idx] = 1;
        //n1[0] = 1;
    }
}
Result:
As far as I can tell I’m making some basic mistake, and I hope there is an easy way to fix it. Could it have something to do with how I allocate or copy the memory?
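One thing I realize I have not been doing is checking the return codes of the CUDA runtime calls. If the allocation or the launch is failing, something like this should expose it (a sketch, assuming the standard CUDA runtime API and the hypothetical names from above):

```cuda
// Sketch: check the status of allocations, the launch, and kernel execution
cudaError_t err = cudaMalloc((void **)&dev_n1, bytes);
if (err != cudaSuccess)
    printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));

makeMatrix<<<numBlocks, blockSize>>>(MatrixX, MatrixY, dev_n1, dev_n2, dev_Matrix, mSize);
err = cudaGetLastError();        // catches a bad launch configuration
if (err != cudaSuccess)
    printf("kernel launch failed: %s\n", cudaGetErrorString(err));

err = cudaThreadSynchronize();   // waits for the kernel; catches execution errors
if (err != cudaSuccess)
    printf("kernel execution failed: %s\n", cudaGetErrorString(err));
```

Is this the right way to narrow down whether the problem is in the allocation?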