Weird behaviour with cudaMalloc

I have a weird problem, which probably has a logical explanation (as it usually does). My program is based on randomness, so if I use the same seed in my random number generator, I get the same output.

So if I take a specific version of my code and run it a couple of times with the same seed, I get the same output. Nothing strange so far; now to the strange part. I had a cudaMalloc left in my code for a variable that I don’t use anymore and that is not passed to the kernel at all. If I comment out that cudaMalloc, the result from my program is no longer the same!

My code divides a huge data file into 32 chunks, and one thread is responsible for processing each chunk (in other words, I use 32 threads), so the chunks are processed in parallel. After every thread is done, some values from the different subsets are combined. I can tell from the output that the error appears at the end, so to speak: I print some values from my code, and all of them are identical except the last one, which is totally different. The problem occurs in the last thread and probably has something to do with it reading outside the input data. But what I don’t understand is how it can compute the correct result when the cudaMalloc is there.
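If the last thread really is reading past the end of the input, a common culprit is how each chunk’s boundaries are computed when the data size is not a multiple of 32. Below is a minimal sketch in plain C++ host code, assuming each chunk covers a contiguous half-open range `[begin, end)`; `chunk_range` is a hypothetical helper, not taken from the original code. It rounds the chunk size up so the remainder is not dropped, then clamps both bounds to `n` so the last chunk stops at the end of the data. The arithmetic itself is simple enough to port into a kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper: returns the half-open range [begin, end) of elements
// that chunk `chunk_id` (0 .. num_chunks-1) should process out of `n` total.
// Ceiling division keeps the remainder; clamping to `n` stops the last chunk
// from running past the end of the input.
std::pair<std::size_t, std::size_t> chunk_range(std::size_t n,
                                                std::size_t num_chunks,
                                                std::size_t chunk_id) {
    assert(num_chunks > 0 && chunk_id < num_chunks);
    const std::size_t chunk = (n + num_chunks - 1) / num_chunks;  // ceil(n / num_chunks)
    const std::size_t begin = std::min(chunk_id * chunk, n);
    const std::size_t end = std::min(begin + chunk, n);
    return {begin, end};
}
```

For example, with `n = 1000` and 32 chunks the chunk size rounds up to 32, so chunk 31 gets `[992, 1000)` rather than `[992, 1024)`, which would read 24 elements past the end.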

Has anyone had a similar problem, or can anyone give me a hint about what might be causing this behaviour?

You probably have a memory addressing bug somewhere in your kernel that makes the code read from an out-of-bounds memory area. Having the “redundant” malloc there probably puts some “safe” memory in the right place, so that the out-of-bounds accesses land on reserved memory, or so that there is enough separation between the input and output data in memory that one never overwrites the other.
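One way to test this diagnosis without depending on allocator layout is to replay a thread’s indexing on the host, where `std::vector::at` throws `std::out_of_range` instead of silently returning whatever lies next to the buffer. This is a minimal sketch, assuming the per-thread loop can be copied into host code and that the input is an array of `double`; `sum_chunk_checked` and `reads_out_of_bounds` are hypothetical names, and the summation simply stands in for whatever each thread actually computes.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical host-side replay of one thread's loop over [begin, end).
// data.at(i) throws std::out_of_range on a bad index, whereas data[i]
// (like a raw device pointer) would silently read past the end.
double sum_chunk_checked(const std::vector<double>& data,
                         std::size_t begin, std::size_t end) {
    assert(begin <= end);
    double total = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
        total += data.at(i);
    }
    return total;
}

// Returns true if replaying [begin, end) touches an index past the end of data.
bool reads_out_of_bounds(const std::vector<double>& data,
                         std::size_t begin, std::size_t end) {
    try {
        sum_chunk_checked(data, begin, end);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

On a 1000-element vector, `reads_out_of_bounds(data, 992, 1024)` returns `true`, which is the kind of overrun an extra allocation might happen to hide on the device.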

It might be time to break out something like valgrind or ocelot and see what the code is doing with memory.

Thanks for the answer. Yes, I think you are right about the memory problems. I’ll check out valgrind or ocelot and see if they can help me.