Can what I need to do be done easily with CUDA?

I have a project that I have been working on for years in P3 (Processing language), and I am hitting a bottleneck transferring data from RAM to the GPU. I want to rewrite my code so that it can scale up and run on something like a Tesla. The basis of the application is a large 3D dataset that I want to represent in real time as a 3D grid of polyhedra. I am starting with cubes because of their simplicity and want to work up to polyhedra with more faces. I have a grid of 100x100x100 cubes. Each cube has 24 possible orientations. All of the cubes are simply visual representations of the data in each cell, in this case a value from 1 to 24. An example of the output from my prototype is [url]https://drive.google.com/open?id=0B2MOWSYunSCFSWVoTF9pS0JWTzg[/url].

My theory is that, since there is only one cube, oriented 24 different ways and stored in a large 1Kx1Kx1K array, all of this should be simple for CUDA and the GPU to handle, without the CPU having to recalculate everything and push it across to the GPU each time.
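To make the "one cube, 24 orientations" idea concrete, here is a minimal host-side C++ sketch of the lookup table I have in mind. It assumes the 24 orientations are the 24 proper rotations of a cube, and the names `Mat3`, `buildOrientationTable`, and `orientationFor` are made up for illustration. The order of the table is arbitrary and would not necessarily match the 1-24 numbering in my data.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// A 3x3 integer rotation matrix, row-major (hypothetical helper type).
using Mat3 = std::array<std::array<int, 3>, 3>;

// Determinant of a 3x3 integer matrix.
int determinant(const Mat3& m) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Enumerate the proper rotations of a cube: every signed permutation
// matrix (a single +1 or -1 in each row and column) with determinant +1.
std::vector<Mat3> buildOrientationTable() {
    const int perms[6][3] = {{0, 1, 2}, {0, 2, 1}, {1, 0, 2},
                             {1, 2, 0}, {2, 0, 1}, {2, 1, 0}};
    std::vector<Mat3> table;
    for (const auto& p : perms) {
        for (int signs = 0; signs < 8; ++signs) {
            Mat3 m{};  // all entries start at zero
            for (int row = 0; row < 3; ++row) {
                m[row][p[row]] = ((signs >> row) & 1) ? -1 : 1;
            }
            // Keep rotations only; determinant -1 would be a reflection.
            if (determinant(m) == 1) table.push_back(m);
        }
    }
    return table;
}

// Look up the rotation for a cell value in the range 1-24.
// Assumption: cell value v maps to table entry v - 1.
const Mat3& orientationFor(const std::vector<Mat3>& table, std::uint8_t cellValue) {
    assert(cellValue >= 1 && cellValue <= table.size());
    return table[cellValue - 1];
}
```

With a table like this, each cell would only need one byte (its value from 1 to 24), and the table itself would be built once and kept on the GPU rather than being recalculated by the CPU.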

The basic flow is this:

  1. Load large dataset into RAM
  2. Transfer 3D Visual Orientations (VO) dataset to GPU?

3G. Have GPU draw the same cube 1M+ times according to VO dataset.
4G. Be able to rotate/zoom/fly through model and select any individual cube.

3C. Have CPU do calculation on large dataset in parallel to GPU doing its work.
4C. Upload changes to the 3D VO dataset to the GPU as needed.
5C. Receive cell selection from GPU and display large dataset metadata about that cell
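As a rough sketch of the bookkeeping behind steps 3G, 4C and 5C, this is the kind of host-side C++ I imagine. It assumes a flat, x-fastest layout for the 100x100x100 grid, so that a draw instance number and a cell index are the same thing, and it assumes changes are sent as (index, value) pairs. `Cell`, `CellChange`, `flatIndex`, `cellFromIndex`, and `collectChanges` are hypothetical names, and the actual upload to the GPU is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t N = 100;  // grid edge length (100x100x100 cubes)

struct Cell { std::size_t x, y, z; };

// Grid coordinates -> position in the flat VO array (x varies fastest).
std::size_t flatIndex(Cell c) {
    assert(c.x < N && c.y < N && c.z < N);
    return (c.z * N + c.y) * N + c.x;
}

// Flat index (for example, the instance number of a selected cube)
// -> grid coordinates, so the CPU can look up metadata for that cell.
Cell cellFromIndex(std::size_t i) {
    assert(i < N * N * N);
    return {i % N, (i / N) % N, i / (N * N)};
}

// One changed cell: where it is and its new value (1-24).
struct CellChange {
    std::uint32_t index;
    std::uint8_t value;
};

// Compare the CPU's current VO data with a mirror of what was last
// uploaded, return only the differences, and bring the mirror up to date.
std::vector<CellChange> collectChanges(const std::vector<std::uint8_t>& current,
                                       std::vector<std::uint8_t>& uploaded) {
    assert(current.size() == uploaded.size());
    std::vector<CellChange> changes;
    for (std::size_t i = 0; i < current.size(); ++i) {
        if (current[i] != uploaded[i]) {
            changes.push_back({static_cast<std::uint32_t>(i), current[i]});
            uploaded[i] = current[i];
        }
    }
    return changes;
}
```

The hope is that only the short list returned by `collectChanges` would have to cross from RAM to the GPU after the initial transfer in step 2, and that a selection coming back from the GPU could be a single index passed to `cellFromIndex`.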

I am hoping that this is something that I could easily do.
If anybody has any insight or direction it would be appreciated.
Does anybody have any code samples or examples of anything even remotely close to this?
Are there libraries that I should be looking at? For example, is there a camera library that lets a user view the objects that were created, rotate around them, etc.?

If all of this looks promising, I will go through all of the tutorials before asking too many more questions, but I do not want to waste my time if this is not even remotely possible, and I do not want to do it if I am going to have to write all of the code from scratch.

THANKS