Occupancy estimation at runtime?


While working on my own project, I was wondering: would it make any sense to code the occupancy estimation formula from the Occupancy Calculator XLS and use it to search for the best block size at runtime, before launching my kernel?
I'm currently using the Occupancy Calculator spreadsheet, but this question just popped up.

Hm, yeah, at first glance it sounds like a good idea. That particular formula probably isn't a perfect arbiter of block size (higher occupancy doesn't always mean a faster kernel), but something along those lines ought to work.
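A rough sketch of that idea, under a deliberately simplified model: for each candidate block size (multiples of the warp size), count how many blocks fit on one multiprocessor given the warp, block, register and shared memory limits, turn that into occupancy (resident warps / maximum warps), and keep the best. The names SmLimits, activeBlocksPerSm, occupancy and bestBlockSize are hypothetical, the hardware limits are not queried but filled in by hand, and the model leaves out some of the spreadsheet's finer rounding rules (for example rounding warps per block before register allocation), so treat its output as an estimate, not as what the XLS would report.

```cuda
#include <algorithm>
#include <cassert>

// Per-multiprocessor limits, filled in by hand for the target device
// (e.g. copied from the Occupancy Calculator). Hypothetical struct.
struct SmLimits {
    int warpSize;           // threads per warp
    int maxThreadsPerBlock; // largest legal block size
    int maxBlocksPerSm;     // max resident blocks per multiprocessor
    int maxWarpsPerSm;      // max resident warps per multiprocessor
    int regsPerSm;          // 32-bit registers per multiprocessor
    int regAllocUnit;       // registers are allocated per block in multiples of this
    int smemPerSm;          // shared memory per multiprocessor, bytes
    int smemAllocUnit;      // shared memory allocation granularity, bytes
};

static int ceilDiv(int a, int b) { return (a + b - 1) / b; }
static int roundUp(int a, int unit) { return ceilDiv(a, unit) * unit; }

// Blocks of blockSize threads that can be resident on one multiprocessor
// (simplified model: the tightest of the warp, block, register and smem limits).
int activeBlocksPerSm(const SmLimits& sm, int blockSize,
                      int regsPerThread, int smemPerBlock) {
    assert(blockSize > 0 && blockSize <= sm.maxThreadsPerBlock);
    int warpsPerBlock = ceilDiv(blockSize, sm.warpSize);

    int blocks = std::min(sm.maxBlocksPerSm, sm.maxWarpsPerSm / warpsPerBlock);

    if (regsPerThread > 0) {
        int regsPerBlock = roundUp(regsPerThread * warpsPerBlock * sm.warpSize,
                                   sm.regAllocUnit);
        blocks = std::min(blocks, sm.regsPerSm / regsPerBlock);
    }
    if (smemPerBlock > 0) {
        blocks = std::min(blocks,
                          sm.smemPerSm / roundUp(smemPerBlock, sm.smemAllocUnit));
    }
    return blocks;
}

// Occupancy = resident warps / maximum resident warps per multiprocessor.
double occupancy(const SmLimits& sm, int blockSize,
                 int regsPerThread, int smemPerBlock) {
    int warps = activeBlocksPerSm(sm, blockSize, regsPerThread, smemPerBlock)
                * ceilDiv(blockSize, sm.warpSize);
    return static_cast<double>(warps) / sm.maxWarpsPerSm;
}

// Try every multiple of the warp size; return the smallest block size
// that reaches the highest occupancy.
int bestBlockSize(const SmLimits& sm, int regsPerThread, int smemPerBlock) {
    int best = sm.warpSize;
    double bestOcc = -1.0;
    for (int bs = sm.warpSize; bs <= sm.maxThreadsPerBlock; bs += sm.warpSize) {
        double occ = occupancy(sm, bs, regsPerThread, smemPerBlock);
        if (occ > bestOcc) {
            bestOcc = occ;
            best = bs;
        }
    }
    return best;
}
```

As a usage example, with limits that I believe match a compute capability 1.3 part (32-thread warps, 512 threads per block, 8 blocks and 32 warps per multiprocessor, 16384 registers allocated in units of 512, 16 KB of shared memory in units of 512 bytes; double-check them against the calculator sheet), this model picks 128 threads for a kernel using 16 registers per thread (full occupancy) and 64 threads for one using 32 registers per thread (occupancy 0.5). Returning the smallest block size that reaches the maximum is an arbitrary tie-break; preferring the largest one is just as defensible.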

Ok… Then I’ll get to it now.

The major problem is that you need the kernel's register usage as an input. That is normally only known at compile time, so you would have to hard-code it as a constant in your code. One could write it into a resource file during the build and read that at runtime, but that's kind of ugly. Is there any way to get a kernel's register requirements at runtime?

You can get register usage etc. via API calls as of CUDA 2.2. I forget what they are called, but I think they're in the execution configuration section of the reference manual.

Thanks, that was fast. I have to admit I only skimmed the guide for changes and obviously overlooked that one, even though I've been waiting for it for quite a while. ;)

To anybody else looking for it: the information is in the cudaFuncAttributes structure, which you can retrieve with cudaFuncGetAttributes. It is explained in section 3.7, Execution Control.
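A minimal sketch of that query, assuming a current CUDA toolkit. The exact C signature of cudaFuncGetAttributes has changed between toolkit versions, so this relies on the C++ template overload from cuda_runtime.h that accepts the kernel function directly. The kernel myKernel and the helper queryKernelResources are hypothetical placeholders; substitute your own kernel.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; replace with the kernel you want to size.
__global__ void myKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Fills *attr with the compiler-determined resource usage of myKernel.
// Returns false if the query fails (e.g. no CUDA device is present).
bool queryKernelResources(cudaFuncAttributes* attr) {
    assert(attr != nullptr);
    cudaError_t err = cudaFuncGetAttributes(attr, myKernel);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                     cudaGetErrorString(err));
        return false;
    }
    std::printf("registers per thread:     %d\n", attr->numRegs);
    std::printf("static smem per block:    %zu bytes\n", attr->sharedSizeBytes);
    std::printf("local memory per thread:  %zu bytes\n", attr->localSizeBytes);
    std::printf("max threads per block:    %d\n", attr->maxThreadsPerBlock);
    return true;
}
```

numRegs and sharedSizeBytes are the per-kernel inputs the spreadsheet asks for (add any dynamic shared memory you pass at launch to sharedSizeBytes), so they can be fed straight into a runtime occupancy estimate instead of a hard-coded constant. Newer toolkits (CUDA 6.5 onward) also provide cudaOccupancyMaxPotentialBlockSize, which performs this block size search for you.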