NOWHERE in the code is the summation variable (in the example, dev_c[0]) EXPLICITLY initialized to 0.
The OBVIOUS question is: where is it being initialized to zero, and where is such zero initialization covered in the CUDA documentation?
Indeed, when I run dot the first time it works as expected, but if I run it a second time on the same dev_c, the new answer is added to the old one. So, to get the correct answer the second time, I use cudaMemset(dev_c, 0, sizeof(int)), but this is a time-expensive operation. Please show me a faster way, perhaps within the code of dot itself.
It needs to be initialized to zero. Training decks such as that one occasionally have oversights like that.
As you’ve already suggested, if you add something like:
cudaMemset(dev_c, 0, sizeof(int));
to the code on slide 60, prior to the kernel launch, that should address the oversight. I don’t recommend trying to do it in the kernel itself, as it introduces another race condition.
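For reference, a minimal sketch of the host-side pattern being suggested. The slide-60 code is not reproduced in this thread, so the names dot, dev_a, dev_b, dev_c, blocks, and threads are assumptions, not the actual slide code:

```
// Hypothetical host-side pattern; names are assumed, not from slide 60.
int *dev_c;
cudaMalloc(&dev_c, sizeof(int));

// ... each time the dot product is computed:
cudaMemset(dev_c, 0, sizeof(int));          // zero the accumulator first
dot<<<blocks, threads>>>(dev_a, dev_b, dev_c);
```

Zeroing inside the kernel (e.g. having thread 0 write *dev_c = 0) would race against other blocks that may already be issuing atomicAdd calls, which is why the host-side memset before launch is the safe placement.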
It’s not covered in the documentation (e.g. docs.nvidia.com) because the CUDA documentation primarily addresses the language, not specific algorithms or implementations.
I think if you study any of the reduction examples in the CUDA sample codes, you will find appropriate initializations, as needed.
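One pattern from those samples sidesteps the device-side accumulator entirely: each block writes its own partial sum to a distinct slot, and the final (tiny) sum over the partials is done on the host. Because each slot is assigned rather than accumulated into, nothing needs to be zeroed between calls. A hedged sketch, with all names assumed (this is the general reduction pattern, not the slide-60 code):

```
// Sketch: per-block partial sums; partial[] has one int per block.
// Launch with blockDim.x = 256 and sum partial[] on the host afterward.
__global__ void dot(const int *a, const int *b, int *partial, int n) {
    __shared__ int cache[256];
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    int temp = 0;
    // grid-stride loop over the input vectors
    for (int i = tid; i < n; i += blockDim.x * gridDim.x)
        temp += a[i] * b[i];
    cache[threadIdx.x] = temp;
    __syncthreads();
    // standard in-block tree reduction in shared memory
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            cache[threadIdx.x] += cache[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        partial[blockIdx.x] = cache[0];   // overwrite: no accumulation, no memset needed
}
```

Since partial[blockIdx.x] is overwritten on every call, stale contents from a previous invocation are harmless, which removes the need for the cudaMemset between calls.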
It is not “occasionally.” Show me one example of the dot product that currently exists on the web where there is explicit initialization of the summation variable to zero. This “oversight” isn’t restricted to NVIDIA employees.
I suspect that cudaMalloc is doing the initialization to zero while NVIDIA pretends that it is a coincidence. Prove that it is not a coincidence by showing me one example where cudaMalloc produces nonzero values.
It also strikes me that if cudaMalloc is initializing data to zero, then your previously stated observation could not make sense:
After all, the exact same cudaMalloc operation is being invoked for dev_c the first time you run the code as well as the second. What’s different, of course, is the previous state of the memory. If the previous state of the memory matters, then cudaMalloc is not modifying the memory state.
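This is straightforward to test directly. The sketch below (my own names, not from the thread) writes a nonzero value, frees the allocation, then reallocates and reads back. cudaMalloc makes no guarantee about the contents of the returned memory, so the old value frequently survives, but the allocator is free to behave otherwise:

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int *p = nullptr;
    int h = 1234;

    cudaMalloc(&p, sizeof(int));
    cudaMemcpy(p, &h, sizeof(int), cudaMemcpyHostToDevice);   // write nonzero
    cudaFree(p);

    h = 0;
    cudaMalloc(&p, sizeof(int));                              // reallocate
    cudaMemcpy(&h, p, sizeof(int), cudaMemcpyDeviceToHost);   // read back
    printf("value after fresh cudaMalloc: %d\n", h);          // often 1234, not 0
    cudaFree(p);
    return 0;
}
```

A freshly booted GPU often does hand back zeroed pages on the first allocation, which can give the appearance of deliberate zero initialization; the test above distinguishes that coincidence from a guarantee.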
Yes, I was confused here. When you said “the second time I ran dot” I was thinking you ran the program dot a second time. I assume now what you meant was you called the dot function twice in the same code.
Not correct. I’m not sure why you think that. However I can see that you’re not happy with my responses. So I’ll stop responding now. It’s OK if we disagree about things, you don’t have to take my word for anything.