Other than the CUDA manual, are there any publications (e.g. conferences/journals) that describe the CUDA architecture? I see some slides from a workshop at SC’06. Is there anything else?
Also, I have heard it said that CUDA will eventually support double precision. Is there a document that describes this? And will new hardware be necessary (beyond the 8800 GTX), or will existing drivers be upgraded to support it?
This is so true. I gave a poster presentation at the American Physical Society for HOOMD and at least 1/2 the people that stopped by to talk had as their first question “But this is just single precision, right?” I would then spend the next 5 min trying to convince them (with quantitative data) that single precision is plenty good enough for Molecular Dynamics. Most of them remained unconvinced because in their minds, double precision is a magic bullet that solves everything.
I look forward to the double precision hardware so that I can implement HOOMD on it just to make this 1/2 of the physics community happy, even as I run my own research simulations in single precision ;)
Even knowing better, I made this mistake as well. An early version of my code used pseudo-double precision to accumulate a sum of a large number of floats. After 8 months, I finally tried single precision Kahan summation (thanks Simon Green for the suggestion to someone else!), and found it was enormously faster and worked just as well for my application. Being smart with single precision can be a lot more productive than being dumb with double precision. :)
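For anyone who hasn't seen it, Kahan (compensated) summation just carries a small correction term alongside the running sum. A minimal single-precision sketch in plain C++ (illustrative only, not my actual kernel code; per-thread CUDA code would use the same arithmetic):

```cpp
#include <vector>

// Kahan (compensated) summation in single precision: c accumulates the
// low-order bits that are lost each time a small addend meets a large sum.
float kahan_sum(const std::vector<float>& xs) {
    float sum = 0.0f;
    float c = 0.0f;             // running compensation (lost low-order bits)
    for (float x : xs) {
        float y = x - c;        // apply the correction to the new addend
        float t = sum + y;      // low-order bits of y are lost here...
        c = (t - sum) - y;      // ...and recovered here
        sum = t;
    }
    return sum;
}
```

Summing 0.1f ten million times this way stays within rounding of the true 1e6, while a naive `sum += x` loop in single precision drifts by tens of thousands.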
mfactica, I apologize for being tedious, but are you saying that double precision will be natively supported in GeForce cards (as opposed to emulated)? If it is native, could you help me understand the distinction that Kirk is making between the ‘HPC space’ and the ‘consumer space’ (4th and 5th paragraphs on the second page of the interview):
“Consumers don’t need double precision,” he said. “It’s mainly going to be used in the HPC space to improve computational speed. As a result, we’re going to need to make the Tesla customers pay for that silicon on every chip until there is demand for it in the consumer space. That demand is kind of hard to imagine at the moment though.
“I can’t predict the future because I don’t know, but I would imagine that double precision will be supported across all products.”
Just a quick question: did you implement this summation in a reduction-type kernel? I have a kernel that does a big reduction at the end, and my results could use some more accuracy, so I was thinking about using Kahan, but I have the impression it is non-trivial to implement in a reduction. I haven't looked into the details yet; it's on the TODO list that just keeps growing faster and faster ;)
Okay, if anyone is interested, I can give a small example of how the algorithm works. It basically keeps a q-array next to your sum-array and performs a Kahan-like summation step when combining partials, taking two q's in the first step instead of one.
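To make that concrete, here is one way the combine step just described could look, sketched sequentially in C++ (the name `KahanPair` and the loop structure are my own; a real CUDA kernel would do this per-thread in shared memory): each partial sum carries its own q, and merging two partials is a single Kahan-like update that folds in both q's at once.

```cpp
#include <vector>

// A partial sum plus its compensation term (the "q" described above).
struct KahanPair {
    float sum;
    float q;
};

// Merge two compensated partials: one Kahan-style step that consumes
// both q's instead of one.
KahanPair combine(const KahanPair& a, const KahanPair& b) {
    float y = b.sum - (a.q + b.q);  // addend corrected by both compensations
    float t = a.sum + y;            // low-order bits of y may be lost here...
    float q = (t - a.sum) - y;      // ...and are captured in the new q
    return KahanPair{t, q};
}

// Sequential stand-in for the tree reduction a kernel would perform:
// at each pass, partials one stride apart are merged pairwise.
float compensated_reduce(std::vector<KahanPair> v) {
    for (std::size_t stride = 1; stride < v.size(); stride *= 2)
        for (std::size_t i = 0; i + stride < v.size(); i += 2 * stride)
            v[i] = combine(v[i], v[i + stride]);
    return v[0].sum;
}
```

The nice part is that the combine is associative enough in practice to drop into the standard strided shared-memory reduction pattern: each thread keeps a `(sum, q)` pair instead of a bare float.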