I would like to use CUDA and NVIDIA cards to build a research prototype of efficient data transfer mechanisms between CPUs and accelerators (in this case NVIDIA cards).
My very first step is to map the video card memory into Linux kernel space or into a user-level address space.
I have a small Linux module that gets the BAR areas from the pci_dev structure for the NVIDIA card. I’ve noticed that there are three different configured areas, one of them 255MB large. However, my test card is supposed to include 768MB of memory (first guess: there is some kind of register to select which memory area on the card is mapped to the PCI range). Anyway, I can map that PCI memory area and read/write whatever I want from/to it. For instance, I can have an application write a signature to different addresses (using cudaMalloc and cudaMemcpy) and then read that signature from the PCI memory I’ve mapped into the kernel address space (problem: it only works when cudaMalloc() returns a low address; guess: if the application requests a large piece of memory, the NVIDIA driver returns device memory outside the mappable range). Another really cool experiment is to write a piece of text to the PCI-mapped memory (say, the first four sentences of Don Quixote) and read it back using cudaMemcpy. In this case, the application has to call cudaMalloc before doing the DMA transfer. I’ve had so much fun doing this :-).
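For anyone who wants to look at the BAR layout without writing a module: Linux also exposes each PCI device's resources through sysfs, in a file named resource under /sys/bus/pci/devices/<BDF>/, one line per region with start, end and flags in hex. Below is a minimal user-level sketch of parsing that file, not the code from my module. The names bar_info, parse_bar_line, bar_size, bar_is_memory and print_bars are made up for this example, and the flag values are copied from linux/ioport.h as an assumption about your kernel headers.

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Flag bits as defined in linux/ioport.h (copied here; assumption). */
#define IORESOURCE_IO  0x00000100ULL
#define IORESOURCE_MEM 0x00000200ULL

/* One line of the sysfs "resource" file: "start end flags", all in hex. */
struct bar_info {
    uint64_t start;
    uint64_t end;
    uint64_t flags;
};

/* Parse one line of the resource file. Returns 0 on success, -1 if malformed. */
int parse_bar_line(const char *line, struct bar_info *out)
{
    int n = sscanf(line, "%" SCNx64 " %" SCNx64 " %" SCNx64,
                   &out->start, &out->end, &out->flags);
    return (n == 3) ? 0 : -1;
}

/* Size of the region in bytes; an all-zero entry means the BAR is unused. */
uint64_t bar_size(const struct bar_info *bar)
{
    if (bar->start == 0 && bar->end == 0)
        return 0;
    return bar->end - bar->start + 1;
}

/* Non-zero if the region is memory-mapped (as opposed to I/O ports). */
int bar_is_memory(const struct bar_info *bar)
{
    return (bar->flags & IORESOURCE_MEM) != 0;
}

/* Print every populated region found in an already opened resource file. */
void print_bars(FILE *fp)
{
    char line[128];
    int index = 0;

    while (fgets(line, sizeof(line), fp) != NULL) {
        struct bar_info bar;
        if (parse_bar_line(line, &bar) == 0 && bar_size(&bar) > 0) {
            printf("region %d: start=0x%" PRIx64 " size=%" PRIu64 " MB %s\n",
                   index, bar.start, bar_size(&bar) >> 20,
                   bar_is_memory(&bar) ? "(mem)" : "(io)");
        }
        index++;
    }
}
```

To use it, fopen() the resource file of the card (the bus/device/function part of the path depends on the machine, lspci will tell you) and pass the FILE pointer to print_bars(). The sizes it reports should match what the module sees through the pci_dev structure.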
A couple of really ugly hacks come to mind that would let me avoid calls to cudaMemcpy, but they would only work for certain benchmarks. Without thinking too much about it, I’ve also considered some ways to find out how to switch the memory range that is mapped to the PCI bus (if this is possible). I am completely sure that I can have a lot of fun with this… by 2012 I might even have something usable. Anyway, I would prefer that the NVIDIA guys give me some help.
Is there any Linux kernel-level API? (OK, I’ve done an objdump of nvidia.ko, so I already know the answer is no.) Would it be possible to have a Linux kernel-level API?
It would be nice to be able to: (1) Allocate video memory and get both the PCI address and the device address. If there is a register to switch the video memory mapped to the PCI bus, I would also need some sort of call to switch from one memory range to another. (2) Call a given kernel on the video card. Right now I do not really need this feature, but as soon as I move on to GPU kernel scheduling (I already have an ugly user-level hack to do this) I will need to call GPU kernels from the Linux kernel level.