I’ve read about the ILP capabilities of Kepler and Maxwell (Volkov’s talk, for example: http://www.cs.berkeley.edu/~volkov/volkov10-GTC.pdf).
But I can’t find any references to ILP in the documentation or the architecture white papers.
As far as I can see, each CUDA core has an integer ALU and a floating-point ALU (only one of which can be active at a time) and can do (F)MAD, but that’s it. ILP, however, would seem to need more than that: at the very least, multiple ALUs in each core would have to be active at the same time, yet I can’t find any sign of such hardware in those cores (and ILP is evidently something that does happen on Kepler and Maxwell). FMAD is a kind of ILP, but is that all the ILP that is possible?
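To make the question concrete: as far as I understand Volkov’s slides, ILP there means independent instructions inside a single thread, as opposed to a chain where each instruction waits for the previous result. Below is a minimal sketch of that difference in plain host C++ (not CUDA, so it can be compiled anywhere). The function names `dot_single_chain` and `dot_four_chains` are hypothetical, and whether the hardware actually overlaps the independent chains is my assumption, not something I found in the documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One dependency chain: every multiply-add needs the previous value of acc,
// so the next one cannot start until the current one has finished.
float dot_single_chain(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

// Four independent chains: the four multiply-adds in the loop body do not
// depend on each other, so they could in principle be in flight together.
float dot_four_chains(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }
    // Leftover elements when n is not a multiple of 4.
    for (; i < n; ++i) {
        acc0 += a[i] * b[i];
    }
    // Combine the partial sums only at the very end.
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Both functions compute the same dot product (up to floating-point rounding order); only the dependency structure differs. My question is which hardware feature lets the second form run faster on a GPU if each core has just one active ALU.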
Any explanations regarding that would be greatly appreciated.