Can't quite figure out why test isn't working

Cattaneo · July 29, 2020, 8:28pm

I’m trying to get more familiar with CUDA by writing little test routines. Unfortunately I can’t figure out where the problem is. I suspect it’s a very basic question, but I just can’t quite get a grasp on it.

The test program initializes an array of 1000 random real numbers and then generates a random multiplier. The subroutine I have written should then go through the array and multiply each element by the multiplier. Ideally each thread would handle 2 multiplications, so that 500 multiplications get done in one step. Unfortunately that doesn’t appear to be the case and I’m not sure why. Again, I apologise for the basic nature of this question, but the documentation I have isn’t really helping. It compiles just fine and seems to run without a hitch, and the subroutine does get called, since a print statement inside it produces output. So I’m guessing the issue is my misunderstanding of threads/blocks and how to use them as indices.
My code is below:

attributes(global) subroutine globalReferencePass(x, a)
  implicit none
  integer :: i, n
  real :: x(:), a
  n = size(x)
  i = threadIdx%x + blockIdx%x * blockDim%x
  x(i) = x(i) * a
end subroutine globalReferencePass

!initializing everything in main program
x_d1 = x1
a_d = randMulti
call globalReferencePass<<<500,2>>>(x_d1, a_d)
x1 = x_d1

As far as I can tell, inspecting x1 afterwards shows that it isn’t working correctly.
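
For readers who arrive here with the same symptoms: the thread never says what the typo was, so the following is only a sketch of one common pitfall, not the poster's actual fix. In CUDA Fortran, `threadIdx%x` and `blockIdx%x` start at 1 rather than 0, so the usual global index subtracts 1 from the block index, and a bounds check stops any extra threads from writing past the end of the array. Note also that `<<<500,2>>>` launches 500 blocks of 2 threads, with each thread handling one element. The names `scale_mod` and `scaleArray` are hypothetical, and passing `a` by `value` is an assumption that differs from the device scalar used in the original code.

```fortran
module scale_mod
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: multiplies each element of x by a, one element per thread.
  attributes(global) subroutine scaleArray(x, a)
    real :: x(:)
    real, value :: a      ! assumption: scalar passed by value from the host
    integer :: i, n
    n = size(x)
    ! threadIdx and blockIdx are 1-based in CUDA Fortran,
    ! so block 1 must contribute an offset of 0.
    i = threadIdx%x + (blockIdx%x - 1) * blockDim%x
    ! Guard against threads whose index falls outside the array.
    if (i <= n) x(i) = x(i) * a
  end subroutine scaleArray
end module scale_mod

program test_scale
  use cudafor
  use scale_mod
  implicit none
  integer, parameter :: n = 1000
  real :: x(n), expected(n), a
  real, device :: x_d(n)
  integer :: istat

  call random_number(x)
  call random_number(a)
  expected = x * a          ! host reference result

  x_d = x                   ! copy host -> device
  call scaleArray<<<500, 2>>>(x_d, a)
  istat = cudaDeviceSynchronize()
  x = x_d                   ! copy device -> host

  print *, 'max abs error vs host: ', maxval(abs(x - expected))
end program test_scale
```

Comparing against a host-side reference like `expected` is a quick way to see whether every element was actually updated.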

Cattaneo · July 31, 2020, 8:35pm

So I tried moving some things around, but unfortunately it’s still not working. I changed it to 10 blocks of 50 threads each and tried with just 500 entries, but it still doesn’t seem to do anything. It compiles just fine, so I’m not sure where the problem lies.
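
A kernel that indexes out of bounds or is launched with a bad configuration can fail without printing anything, which would match the "compiles fine but does nothing" symptom. The sketch below is an illustration under that assumption, not the fix reported in this thread. It derives the block count from the array length and checks the launch with `cudaGetLastError` and `cudaDeviceSynchronize` from the `cudafor` module. The names `check_mod`, `scaleChecked`, `tPB`, and `nBlocks` are hypothetical.

```fortran
module check_mod
  use cudafor
  implicit none
contains
  ! Hypothetical kernel with an explicit length argument and a bounds check.
  attributes(global) subroutine scaleChecked(x, a, n)
    integer, value :: n
    real, value :: a
    real :: x(n)
    integer :: i
    i = threadIdx%x + (blockIdx%x - 1) * blockDim%x
    if (i <= n) x(i) = x(i) * a
  end subroutine scaleChecked
end module check_mod

program test_launch
  use cudafor
  use check_mod
  implicit none
  integer, parameter :: n = 500, tPB = 50   ! 500 entries, 50 threads per block
  real :: x(n), a
  real, device :: x_d(n)
  integer :: nBlocks, istat

  call random_number(x)
  call random_number(a)
  x_d = x

  ! Ceiling division so that every element is covered by a thread.
  nBlocks = (n + tPB - 1) / tPB
  call scaleChecked<<<nBlocks, tPB>>>(x_d, a, n)

  ! Launch errors are not reported automatically; query them explicitly.
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) print *, 'launch failed: ', trim(cudaGetErrorString(istat))
  ! Errors raised while the kernel runs surface at the next synchronization.
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) print *, 'kernel failed: ', trim(cudaGetErrorString(istat))

  x = x_d
  print *, 'first element after scaling: ', x(1)
end program test_launch
```

With `n = 500` and `tPB = 50` this launches the same 10 blocks of 50 threads described above.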

Cattaneo · August 11, 2020, 7:24pm

I was able to solve the problem. There was a typo.