Hi everyone. I’m new to the boards. I’ve been looking through old threads for similar problems to my own but I haven’t found anything that seems to directly relate to the problems I’m having. Sorry if this has been covered before and I missed it.
I’m currently working on a small program that finds the distance between vectors in 100-dimensional space using a simple Euclidean distance calculation (i.e. the square root of the sum of squared differences). It operates on a very large array of values (1,000,000 × 100 floats at this point). The strange behaviour I’m getting is this: when I run the program with 1,000,000 vectors and use cudaMemcpy to move the results from the device to the host, everything output to the screen is 0. When I try it with 100,000 vectors, the results match the CPU code for roughly the first 30,000 entries but then the same value repeats over and over (i.e. if the results are correct for 30,000 values, entries 30,001 onward are all equal to the 30,000th value).
The GPU has 500 MB of memory, and the configuration above needs a minimum of 400 MB (1,000,000 vectors × 100 floats × 4 bytes) just for the input data, so I’m wondering whether I’m trying to load too much into the GPU’s memory.
My questions are threefold:
- Has anyone else experienced this problem of getting all zeroes, or a single repeating value, as output when using very large amounts of memory on the GPU?
- Is it possible that using too much memory could be responsible for the behaviour I’m seeing?
- I plan to scale this up to 1,000,000,000 vectors in the future, which will definitely exceed the GPU’s memory. Am I right in thinking that I will need to process the data in batches, with multiple kernel calls and copies to move each batch in and out?
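To make the last question concrete, this is the batching pattern I have in mind, with the CUDA calls stubbed out as comments (all names here are made up for illustration; the loop itself runs on the host):

```c
#include <stddef.h>

#define DIMS 100  /* dimensionality of each vector */

/* Sketch: process a data set too large for GPU memory in fixed-size
 * batches. Each iteration would copy one batch of vectors to the
 * device, launch the kernel on it, and copy that batch's results
 * back, reusing the same device buffers. Returns the number of
 * vectors processed so the caller can verify full coverage. */
static size_t process_in_batches(size_t total_vectors, size_t batch_vectors)
{
    size_t processed = 0;
    for (size_t start = 0; start < total_vectors; start += batch_vectors) {
        size_t count = total_vectors - start;
        if (count > batch_vectors)
            count = batch_vectors;  /* final, possibly short, batch */

        /* cudaMemcpy(d_in, h_in + start * DIMS,
         *            count * DIMS * sizeof(float),
         *            cudaMemcpyHostToDevice);
         * distanceKernel<<<blocks, threads>>>(d_in, d_out, count);
         * cudaMemcpy(h_out + start, d_out,
         *            count * sizeof(float),
         *            cudaMemcpyDeviceToHost);
         */
        processed += count;
    }
    return processed;
}
```

Is this the right general approach, or is there a better-established pattern for streaming data through the GPU in chunks?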
Thanks in advance!