I have a weird problem, which probably has a logical explanation (as they usually do). My program is based on randomness, so if I use the same seed in my random generator I get the same output.
So if I take a specific version of my code and run it a couple of times with the same seed, I get the same output. Nothing strange so far. Now to the strange part: I had a cudaMalloc left in my code for a variable that I don’t use anymore and that is not passed to the kernel at all. If I comment out that cudaMalloc, the result from my program is no longer the same!
My code divides a huge data file into 32 chunks, with one thread responsible for processing each chunk (in other words, I use 32 threads), so that the chunks are processed in parallel. After each thread is done, some values from the different subsets are combined.
I can tell from the output that the error is at the end, so to speak: I print some values from my code, and all of the output is identical except the last part, which is totally different. The problem occurs in the last thread and probably has something to do with it reading outside the input data. But what I don’t understand is how it can calculate the result correctly when the cudaMalloc is there.
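
To make the setup concrete, here is a minimal sketch of the chunking scheme described above. It is not my actual code: the kernel name `sumChunks`, the per-chunk sum, and the sizes are all hypothetical stand-ins. It assumes the chunk size is rounded up, so the last chunk would run past the end of the data unless its bounds are clamped, which is the kind of out-of-bounds read I suspect.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread processes one chunk of the input.
// The per-chunk work here is just a sum, standing in for the real processing.
__global__ void sumChunks(const float *data, size_t n, float *partial, int numChunks)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= numChunks) return;

    // Chunk size rounded up, so numChunks * chunk may exceed n.
    size_t chunk = (n + numChunks - 1) / numChunks;
    size_t begin = (size_t)t * chunk;
    size_t end   = begin + chunk;

    // Clamp to the real data size. Without these two lines the last
    // thread would read past the end of the input buffer.
    if (begin > n) begin = n;
    if (end > n)   end = n;

    float s = 0.0f;
    for (size_t i = begin; i < end; ++i)
        s += data[i];
    partial[t] = s;
}

int main()
{
    const size_t n = 1000;      // deliberately not a multiple of 32
    const int numChunks = 32;   // one thread per chunk

    float hData[n];
    for (size_t i = 0; i < n; ++i) hData[i] = 1.0f;

    float *dData = nullptr, *dPartial = nullptr;
    cudaMalloc(&dData, n * sizeof(float));
    cudaMalloc(&dPartial, numChunks * sizeof(float));
    cudaMemcpy(dData, hData, n * sizeof(float), cudaMemcpyHostToDevice);

    sumChunks<<<1, numChunks>>>(dData, n, dPartial, numChunks);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Combine the per-chunk results on the host.
    float hPartial[numChunks];
    cudaMemcpy(hPartial, dPartial, numChunks * sizeof(float), cudaMemcpyDeviceToHost);
    float total = 0.0f;
    for (int i = 0; i < numChunks; ++i) total += hPartial[i];
    printf("total = %f\n", total);

    cudaFree(dData);
    cudaFree(dPartial);
    return 0;
}
```

If the clamp is missing, whatever the last thread reads beyond the buffer depends on what happens to sit in device memory after it, so an extra allocation could plausibly change the result. Running the program under NVIDIA's compute-sanitizer tool is one way to check for such out-of-bounds reads.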
Has anyone had similar problems, or can anyone give me a hint about what might be causing this behaviour?