I just read an older paper by Gregory Diamos on Ocelot, titled “The design and implementation of Ocelot’s Dynamic Binary Translator from PTX to multi-core x86”.
Section III-A describes how to extract the PTX binary information. When I tried to verify it, I realized that the way the binary is registered and stored in CUDA 2.3 (that's what I have on my Linux box) is quite different from what the paper describes.
For example, there is only a single constructor that registers all kernels (as opposed to one constructor per kernel), the extern variable “fatBinary” no longer exists, and so on.
So I assume that NVIDIA changed its internal binary representation and its APIs a bit in some CUDA version.
So, my questions to Greg (and to anyone else who knows) are:
0. Is my assumption right? I hope I am making sense here… If not, don't read the next 2 questions.
1. Is Ocelot tied to a particular CUDA version, or does it have compatibility issues with certain CUDA versions?
2. Is Ocelot updated every time NVIDIA decides to change its binary layout?