Help please!

I am trying to accelerate my program with CUDA. The routine listed below should calculate the cost of a route:

On the CPU, the same routine called in a loop fills the y array with costs. On the GPU, this routine fills the array with -431602080 values. When I reduce the number of iterations of the "for" loop to 20, I sometimes get correct costs in some elements of y, but other elements contain values like -35659499650496332000 and #QNAN0. Could somebody explain what is happening?
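One common cause of a uniform garbage value in every element of y is that the kernel's results never reached the host: an unchecked launch or cudaMemcpy failed silently, so you are looking at uninitialized host memory, and #QNAN0 likewise points at uninitialized or out-of-bounds reads on the device. The original routine was not included in the post, so the skeleton below is a generic error-checked launch pattern, not your code: costKernel, the problem size n, and the one-route-per-thread layout are all illustrative assumptions.

```cuda
// Generic error-checked CUDA skeleton -- NOT the routine from the post.
// costKernel, n, and the one-route-per-thread layout are assumptions.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

__global__ void costKernel(float *y, int n)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;  // one route per thread
    if (p >= n) return;                             // guard stray threads
    y[p] = 0.0f;  // the actual route-cost computation would go here
}

int main()
{
    const int n = 1024;
    float *h_y = (float *)malloc(n * sizeof(float));
    float *d_y = NULL;

    CHECK(cudaMalloc(&d_y, n * sizeof(float)));

    costKernel<<<(n + 255) / 256, 256>>>(d_y, n);
    CHECK(cudaGetLastError());        // catches launch-configuration errors
    CHECK(cudaDeviceSynchronize());   // catches faults inside the kernel

    // Without this copy, h_y keeps whatever garbage it held before.
    CHECK(cudaMemcpy(h_y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost));

    CHECK(cudaFree(d_y));
    free(h_y);
    return 0;
}
```

If any of these checks fires, the reported error string should narrow down whether the launch, the kernel itself, or the copy is failing.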

Frankly, I do not understand what your code is doing.
What exactly do the x and c arrays hold? What is
x[p*33+q], for example - the index of the target of the q'th road from node p?

This code walks through a route and calculates its cost.

Yes, x[p*33+q] is the index of the node to go to from node q on route p.

c contains the step costs: c[r*33+s] is the cost of the step from node r to node s.
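For reference, the CPU version that produces the correct costs looks roughly like this. It is only a sketch: the name routeCost, the fixed start node 0, and taking exactly 33 steps are my assumptions; only the indexing of x and c is as described above.

```cpp
#include <cassert>

const int N = 33;  // exactly 33 nodes in this version

// Cost of route p: start at an assumed node 0 and take N steps.
// Each step moves from the current node q to its successor
// x[p*N + q] and adds the step cost c[q*N + next].
float routeCost(const int *x, const float *c, int p)
{
    float cost = 0.0f;
    int q = 0;                      // assumed start node
    for (int step = 0; step < N; ++step) {
        int next = x[p * N + q];    // target of the step from q on route p
        cost += c[q * N + next];    // cost of the step from q to next
        q = next;
    }
    return cost;
}
```

A GPU port has to reproduce exactly this indexing; any mismatch in the p/q roles between the two arrays reads past the intended data and produces junk costs.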

So what happens if there are more than 33 nodes in your graph? And did you mean the cost of a step from node r using route s?

There are exactly 33 nodes in this version.

No, I mean the cost of a step from node r to node s, independent of the route.

This routine is part of a population-based search algorithm for a kind of vehicle routing problem. The real version is more complex, but it does not work on the GPU even in this simplified form, and I want to understand why.

There is a population of routes, and each one has to be evaluated. This evaluation usually takes about 50% of the total CPU time, so I thought it would be a smart idea to use the GPU to accelerate these tests.

The problem is solved; this topic can be deleted.