I have read the CUDA C Programming Guide and found an interesting statement: "On integrated systems where device memory and host memory are physically the same, any copy between host and device memory is superfluous and mapped page-locked memory should be used instead."
So I have several questions: Does this mean that I can overcome the transfer speed limit imposed by PCI Express? I want to transfer 1 MB of data from the CPU to the GPU and then back from the GPU to the CPU several thousand times, so the PCI Express speed is the bottleneck. Are there commercial products like this?
With integrated graphics, the GPU uses your system memory (i.e. RAM) instead of having its own dedicated memory. Moving data from host memory to device memory is therefore superfluous, since you would just be pushing the data around within the same RAM. Leaving virtual memory aside, you can think of all your system memory (i.e. ~3.5 GB on a 32-bit system) as sitting in one address space shared by the CPU and GPU, so there’s no distinction between device and host memory.
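For what it's worth, here is a minimal sketch of the mapped page-locked ("zero-copy") pattern the guide refers to. It assumes device 0 and a toolkit that has cudaDeviceSynchronize; the kernel `scale` and the helper `zero_copy_demo` are made-up names for illustration only. Note that the host pointer and the device pointer are not guaranteed to be numerically equal, which is why the code asks for the device-side alias explicitly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles each element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Returns true if the kernel updated the mapped host buffer as expected.
bool zero_copy_demo(int n) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    printf("integrated: %d, canMapHostMemory: %d\n",
           prop.integrated, prop.canMapHostMemory);
    if (!prop.canMapHostMemory) return false;

    // Ask for host-memory mapping before any other call creates the context.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Page-locked host buffer that is also mapped into the device address space.
    float *h_data = nullptr;
    if (cudaHostAlloc((void **)&h_data, n * sizeof(float),
                      cudaHostAllocMapped) != cudaSuccess) return false;
    for (int i = 0; i < n; ++i) h_data[i] = (float)i;

    // Device-side alias of the same physical memory; no cudaMemcpy involved.
    float *d_alias = nullptr;
    bool ok = cudaHostGetDevicePointer((void **)&d_alias, h_data, 0) == cudaSuccess;

    if (ok) {
        scale<<<(n + 255) / 256, 256>>>(d_alias, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaDeviceSynchronize() == cudaSuccess;
    }

    // After synchronizing, the host reads the results directly from h_data.
    for (int i = 0; ok && i < n; ++i) ok = (h_data[i] == 2.0f * i);

    cudaFreeHost(h_data);
    return ok;
}

int main() {
    int n = 1 << 18;  // 2^18 floats = 1 MiB, the buffer size from the question
    printf("zero-copy %s\n", zero_copy_demo(n) ? "ok" : "failed");
    return 0;
}
```

The same code also runs on a discrete GPU that reports canMapHostMemory, but there every kernel access to the buffer goes over PCI Express, so it only removes the copy on hardware where `integrated` is 1.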
With that said, integrated graphics are no miracle solution. You’ll be cannibalizing your system memory, and they’re slow. But to answer your last question: yes, there are a lot of commercial integrated graphics products.
Thanks for your reply!
The commercial integrated products I am looking for are the high-performance ones, such as a system that integrates a Tesla with a high-performance CPU.
Due to the much slower throughput of system memory compared to dedicated graphics memory (where you can use GDDR5 and wider buses), the only GPUs that share memory with the CPU are entry level devices, much slower than a Tesla. Although they have the advantage of not requiring the PCI-E bus to move data between the GPU and CPU, everything else about an integrated graphics device is worse. There is no Tesla that directly shares the memory bus of the CPU, because a CPU memory bus could not supply a Tesla with data fast enough to keep the CUDA cores busy.
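Before ruling out a discrete card, it may be worth measuring what the PCI-E round trip actually costs for a 1 MB buffer. The following is only a rough sketch under some assumptions: it uses page-locked host memory, plain blocking cudaMemcpy calls, and CUDA events for timing; the helper name `round_trip_ms` and the iteration count are arbitrary choices for illustration.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Average milliseconds per host->device->host round trip, or -1 on error.
float round_trip_ms(size_t bytes, int iterations) {
    void *h_buf = nullptr, *d_buf = nullptr;
    // Page-locked host memory generally gives the best copy throughput.
    if (cudaMallocHost(&h_buf, bytes) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) {
        cudaFreeHost(h_buf);
        return -1.0f;
    }
    memset(h_buf, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < iterations; ++i) {
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);
    }
    cudaEventRecord(stop, 0);
    cudaError_t err = cudaEventSynchronize(stop);

    float total_ms = 0.0f;
    if (err == cudaSuccess) err = cudaEventElapsedTime(&total_ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return err == cudaSuccess ? total_ms / iterations : -1.0f;
}

int main() {
    const size_t bytes = 1 << 20;  // 1 MiB, as in the question
    float ms = round_trip_ms(bytes, 1000);
    if (ms <= 0.0f) {
        printf("measurement failed\n");
        return 1;
    }
    // Two copies per round trip, so 2 * bytes moved in ms milliseconds.
    double gb_per_s = (2.0 * bytes / 1e9) / (ms / 1e3);
    printf("%.3f ms per round trip, about %.2f GB/s effective\n", ms, gb_per_s);
    return 0;
}
```

Multiplying the per-round-trip time by the number of iterations you need gives a quick estimate of how much of the total runtime the bus accounts for, which you can then compare against the kernel time.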
And just to add to what Seibert said, the only “integrated systems” which are widely available and have zero copy and the other features you are interested in are just discontinued Apple products (like the “late 2010” MacBook Air and “mid 2010” Mac Mini). The other large family of products that were in the marketplace were Intel Diamondville Atom systems with the first versions of the “Ion” chipset, and mobile GeForce 9300/9400 chipsets for the Core 2 family of processors. The later “Ion 2” chipset for Pineview Atom processors, and anything for Core i3/i5 processors, are effectively only discrete GPUs which don’t share a common memory controller with the host CPU.