Automated grid/block calculation in C++ for max occupancy


NVIDIA supplies an occupancy calculator spreadsheet. Has anyone considered translating it into C++ code, so that you could call a function to get the grid and block sizes?
I was imagining a system that automates the process by recording the time taken and the occupancy, then fine-tuning the grid and block sizes.
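
To make the idea concrete, here is a rough sketch of what such a function might look like. It is not a translation of the spreadsheet: it is a simplified model that ignores register and shared-memory allocation granularity, and the names `SmLimits` and `estimateOccupancy` as well as the default limits are placeholders I made up for illustration, not the values for any particular GPU.

```cpp
#include <algorithm>
#include <cassert>

// Per-multiprocessor limits. The defaults are illustrative placeholders;
// real values depend on the device and should be queried at run time.
struct SmLimits {
    int maxThreads = 2048;          // resident threads per multiprocessor
    int maxThreadsPerBlock = 1024;  // largest allowed block
    int maxBlocks = 32;             // resident blocks per multiprocessor
    int registers = 65536;          // 32-bit registers per multiprocessor
    int sharedBytes = 49152;        // shared memory per multiprocessor
    int warpSize = 32;
};

// Estimated occupancy in [0, 1]: resident warps divided by the maximum
// number of resident warps. Simplified: allocation granularity is ignored.
double estimateOccupancy(const SmLimits& sm, int threadsPerBlock,
                         int regsPerThread, int sharedPerBlock) {
    assert(sm.warpSize > 0);
    if (threadsPerBlock <= 0 || threadsPerBlock > sm.maxThreadsPerBlock) {
        return 0.0;  // this block size cannot be launched at all
    }
    const int warpsPerBlock = (threadsPerBlock + sm.warpSize - 1) / sm.warpSize;
    const int maxWarps = sm.maxThreads / sm.warpSize;

    // How many blocks fit under each resource limit.
    const int byWarps = maxWarps / warpsPerBlock;
    const int byRegs = regsPerThread > 0
        ? sm.registers / (regsPerThread * warpsPerBlock * sm.warpSize)
        : sm.maxBlocks;
    const int byShared = sharedPerBlock > 0
        ? sm.sharedBytes / sharedPerBlock
        : sm.maxBlocks;

    // The tightest limit decides how many blocks are resident.
    const int blocks = std::min({sm.maxBlocks, byWarps, byRegs, byShared});
    return static_cast<double>(blocks * warpsPerBlock) / maxWarps;
}
```

With the placeholder limits, `estimateOccupancy(SmLimits{}, 256, 32, 0)` gives 1.0, and raising the register count to 64 per thread drops it to 0.5, because only half as many blocks fit in the register file.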

Is that legal to do? (The spreadsheet carries a copyright notice.)

I run representative benchmarks of each kernel at all possible block sizes and pick the fastest as the tuned block size. The fastest block size is not always the one with the highest occupancy.
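
The benchmark-and-pick approach can be sketched roughly as follows. This is only an outline under some assumptions, not the exact code I use: `pickFastestBlockSize`, `secondsFor`, and the `measure` callback are hypothetical names, and `measure(blockSize)` is assumed to launch the kernel with that block size on a representative input, wait for it to finish, and return the elapsed time in seconds.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <limits>
#include <vector>

// Wall-clock time of one call to run(), in seconds. For a GPU kernel, run()
// must wait for the kernel to finish, otherwise only the launch is timed.
double secondsFor(const std::function<void()>& run) {
    const auto start = std::chrono::steady_clock::now();
    run();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Returns the candidate block size with the smallest measured time,
// or -1 if there are no candidates. Each candidate is measured `repeats`
// times and its best time is kept, to reduce the effect of noise.
int pickFastestBlockSize(const std::vector<int>& candidates,
                         const std::function<double(int)>& measure,
                         int repeats = 3) {
    assert(repeats > 0);
    int best = -1;
    double bestSeconds = std::numeric_limits<double>::infinity();
    for (int blockSize : candidates) {
        double fastest = std::numeric_limits<double>::infinity();
        for (int i = 0; i < repeats; ++i) {
            fastest = std::min(fastest, measure(blockSize));
        }
        if (fastest < bestSeconds) {
            bestSeconds = fastest;
            best = blockSize;
        }
    }
    return best;
}
```

Note that nothing here looks at occupancy: the measured time is the only criterion, which is the point of the approach. The candidate list would typically be every block size the kernel can legally be launched with.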