I’m in the process of developing a user-mode block driver for Windows that works similarly to BeyondRAID (I wrote a paper on the technique long before BeyondRAID came around). It combines RAID 0, RAID 5, and RAID 6 to maximize hard drive reliability at the block level, rather than relying on something like RAID-Z, which requires a full file system implementation.
The goal is to put RAID within reach of consumers building home media servers and the like. I also want to do this in software, so that hard drive controllers can be mixed and matched rather than having to buy a high-port-count controller, which has become prohibitively expensive over time.
For a home media server to be useful these days, it needs to run on extremely inexpensive hardware; an NVIDIA ION-based motherboard would be an ideal platform for a home RAID. The problem is that achieving even reasonable performance requires either an ASIC or a GPU-based solution. Implementing this as an ASIC and selling a board through a Chinese vendor like DealExtreme would be easy enough, but in reality it’s a far from perfect solution.
Therefore, the GPU is the way to go, if it’s practical. I’ve done very limited GPU computing so far; it has been restricted to OpenGL fragment shaders and the like, which are of course floating-point based.
- Would it be practical, and more importantly beneficial, to write an engine (in CUDA, for example) that offloads XOR block operations from the CPU, effectively making the GPU a RAID coprocessor?
The CPU would handle all the block-level bookkeeping (“what do I store where”), create a job, asynchronously pull the needed blocks from the drives, and push the data to the graphics card several blocks at a time. When the job is done, the GPU would signal the CPU, which would then asynchronously write the result back to the drives.
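For what it’s worth, the core of such a job would be a very small kernel. This is only a sketch under my own assumptions (names, stripe layout, and launch shape are all illustrative, not from any real driver): each thread XORs one 32-bit word across every data block in a stripe to produce the RAID-5-style parity block.

```cuda
#include <cstdint>

// Illustrative RAID-5 parity kernel: `data` holds `numBlocks` contiguous
// data blocks of `wordsPerBlock` 32-bit words each; `parity` receives one
// block. Each thread handles one word position across the whole stripe.
__global__ void xorParity(const uint32_t *data,
                          uint32_t *parity,
                          int numBlocks,
                          int wordsPerBlock)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= wordsPerBlock) return;

    uint32_t p = 0;
    for (int b = 0; b < numBlocks; ++b)
        p ^= data[b * wordsPerBlock + i];   // XOR the i-th word of every block
    parity[i] = p;
}
```

Since XOR parity is symmetric, the same kernel also rebuilds a missing data block if you feed it the surviving blocks plus the parity block.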
The performance of this depends heavily on whether the GPU is suited to running large loops of integer operations over system memory. On ION-based Atom boards, I believe system memory and graphics memory are the same physical memory, and because graphics memory must be accessed in a clock-sensitive manner, the GPU can read it quite smoothly, “DMA style”.
- Can I pass a pointer to system memory to the GPU and operate directly on that?
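From what I understand of CUDA, this is what mapped (“zero-copy”) pinned host memory is for: the GPU dereferences a device-side alias of a host allocation directly over the bus. On an integrated part like ION, where system and video memory are the same DRAM, this should avoid a copy entirely. A hedged sketch (error checking omitted; `xorParity`, `grid`, `threads`, and the sizes are placeholder names I made up):

```cuda
#include <cstdint>

// Must be set before any CUDA context is created on the device.
cudaSetDeviceFlags(cudaDeviceMapHost);

uint32_t *hostBuf, *devAlias;
size_t bytes = numBlocks * wordsPerBlock * sizeof(uint32_t);

// Pinned host allocation that the GPU can map into its address space.
cudaHostAlloc(&hostBuf, bytes, cudaHostAllocMapped);

// Device-visible pointer aliasing the same system memory.
cudaHostGetDevicePointer(&devAlias, hostBuf, 0);

// The kernel now reads system RAM directly -- no cudaMemcpy needed.
xorParity<<<grid, threads>>>(devAlias, devParity, numBlocks, wordsPerBlock);
```

Whether this is fast enough for a discrete card (where every access crosses PCIe) as opposed to an integrated one is something you’d have to benchmark.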
- Are there CPU/GPU synchronization tools? Can the GPU signal the CPU when a job is finished?
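As far as I know, CUDA exposes this through streams, events, and stream callbacks rather than a direct GPU-to-CPU interrupt. A sketch of both styles, again with made-up names (`xorParity`, `jobCtx`, launch parameters) standing in for whatever the driver would use:

```cuda
cudaStream_t stream;
cudaStreamCreate(&stream);

// Queue the parity job asynchronously on the stream.
xorParity<<<grid, threads, 0, stream>>>(devAlias, devParity,
                                        numBlocks, wordsPerBlock);

// Style 1: a worker thread blocks until the job completes.
cudaEvent_t done;
cudaEventCreate(&done);
cudaEventRecord(done, stream);
cudaEventSynchronize(done);        // returns once the kernel has finished

// Style 2: the runtime invokes host code when the stream reaches this
// point, so no thread has to sit blocked.
void CUDART_CB onJobDone(cudaStream_t s, cudaError_t status, void *userData);
cudaStreamAddCallback(stream, onJobDone, jobCtx, 0);
```

Either way the “signal” originates from the CUDA runtime observing stream completion, not from the GPU raising an interrupt into your code, which seems like a reasonable fit for a job-queue design like the one described above.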
Thanks in advance,
- Darren