It did indeed work well; I checked the results against the CPU. For the moment, I don't care about the case (NN % THREADS != 0). (Maybe later.)
My system guy needs to check the timeout. Maybe it happened because the card serves both graphics and CUDA. I told you that I need to get a new computer with two cards so that we can check this issue on that system.
How can I get the register/smem usage for the kernel? I will be happy to post this information for you.
I did not run it via the profiler. What can I get with the profiler, and how do I run it?
If I change the k loop to run only NTHREADS / 2 iterations, I get half the time (time/2), but of course the output is then incorrect. What is your idea behind running only NTHREADS / 2?
There is no relationship between my input data arrays. However, as you can see, when we calculate the distances between the points, we calculate (point1-point2)^2 once and (point2-point1)^2 once. So if we build the Matrix from all the lines that we