I’m working on a CUDA project and got around a 20x optimization. But I still want to get more benefit from CUDA :rolleyes:
Currently, my H->D bandwidth is only 1.9 GB/s (pinned), so I want to buy better hardware to speed it up. I once read a topic in this forum where someone got 5.7 GB/s (pinned) H->D bandwidth on an Intel X58, but I know that on Tesla it’s easy to get around 70+ GB/s (it’s too expensive for me). So my questions are:
What’s the best solution for maximizing bandwidth, other than Tesla?
Why can Tesla reach 70+ GB/s bandwidth? Can someone point me to a detailed technical article about this?
$450? Sounds great. I have discussed it with my boss, and he prefers a more powerful GPU, so maybe GTX295(s) will be on their way :rolleyes:
In my project, disk I/O occupies more than 40% of the runtime, GPU computation occupies around 10%, and the rest is hard to tune.
Why do I prefer dual GPUs over a more powerful single GPU? Please correct me if my understanding is wrong: CUDA suggests using a CPU-GPU pair to maximize performance, which means one GPU will hold almost all the resources of one CPU. In my testing, dual CPU + single GPU == single CPU + single GPU. BTW, I read a topic in this forum where someone already used a 9800GX2 to double performance. So…
Using dual GPUs will only double performance if you are almost completely compute bound with very little I/O. My program, for instance, which is otherwise capable of running the entire calculation on the GPU, requires a lot of communication between the two GPUs in dual-GPU mode. Because of the extreme slowness of this communication, 2 GPUs are only 1.4x faster than 1.
If your app is 40% disk I/O, you will get a better return on your investment by putting that money into a RAID array or a fast SSD, or otherwise investing your time in optimizing the I/O portion of your code. Going dual-GPU is not a magic bullet, and it actually opens up a lot of new difficulties in programming.
I have already ordered 8 hard drives (RAID 0, x4 or x8) to improve my disk I/O. My plan is to use two processes to double performance. Correct me if I’m wrong: in CUDA, one GPU will hold all the resources of one CPU, so I can’t get any benefit from running two processes on one GPU on my machine. So I think dual GPU is the best solution. What do you think?
If only 10% of your time is spent on the GPU, then forget waiting for the GTX295 and start figuring out how to speed up all the other stuff. Two cards won’t help much. Two processes that each use only 10% of the GPU should share one GPU pretty well.
An SSD will be much faster than RAID 0 if you don’t need a lot of storage. Of course, simply adding RAM will speed up disk I/O as well, since the OS will use it for caching.
I guess I should have clarified. I meant inter-GPU I/O in that post. If an algorithm is memory bound on each GPU, but requires very little inter-GPU communication then you will of course still get good scaling.
All the individual calculations scale nicely. The problem is that the overall iterative steps require that the data updated on each GPU be communicated to the others before the next iteration. Since it is all iterative, one step after the other, there are no opportunities for overlapping memory transfers and computation. HOOMD on 2 GPUs is about 1.3 to 1.4 times faster than on a single GPU, assuming you have very good H <-> D transfer rates.